4

I have quite a complex If statement that I would like to add as a column in my pandas dataframe. In the past I've always used numpy.select for this type of problem, however I wouldn't know how to achieve that with a multi-line if statement.

I was able to get this in Excel:

=IF(sum1=3,IF(AND(col1=col2,col2=col3),0,1),IF(sum1=2,IF(OR(col1=col2,col2=col3,col1=col3),0,1),IF(sum1=1,0,1)))

and write it in Python just as a regular multi-line 'if statement', just want to find out if there is a far cleaner way of presenting this.

if df['sum1'] == 3:
  if df['col1'] == df['col2'] and df['col2'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 2:
  if df['col1'] == df['col2'] or df['col2'] == df['col3'] or df['col1'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 1:
  df['verify_col'] = 0
else:
  df['verify_col'] = 1

Here's some sample data:

df = pd.DataFrame({
    'col1': ['BMW', 'Mercedes Benz', 'Lamborghini', 'Ferrari', null],
    'col2': ['BMW', 'Mercedes Benz', null, null, 'Tesla'],
    'col3': ['BMW', 'Mercedes', 'Lamborghini', null, 'Tesla_'],
    'sum1': [3, 3, 2, 1, 2]
})

I want a column which has the following results:

'verify_col': [0, 1, 0, 0, 1]

It basically checks whether the columns match for those that have values in them and assigns a 1 or a 0 for each row. 1 meaning they are different, 0 meaning zero difference.

1
  • I would be tempted to use .apply for this. Are you familiar with that? It calls a function that you provide on every row, and the result can be assigned to a new column. Commented Oct 25, 2022 at 5:26

3 Answers 3

4

Use numpy.where with chain mask with | for bitwise OR - if no match any conditions is created 1:

m1 = (df['sum1'] == 3)
m2 = (df['col1'] == df['col2']) & (df['col2'] == df['col3'])
m3 = (df['sum1'] == 2)
m4 = (df['col1'] == df['col2']) | (df['col2'] == df['col3']) | (df['col1'] == df['col3'])
m5 = df['sum1'] == 1

df['verify_col'] = np.where((m1 & m2) | (m3 & m4) | m5, 0, 1)

If need None if no match any conditions:

df['verify_col'] = np.select([(m1 & m2) | (m3 & m4) | m5,
                              (m1 & ~m2) | (m3 & ~m4) | ~m5], 
                             [0,1], default=None)


print (df)
            col1           col2         col3  sum1  verify_col
0            BMW            BMW          BMW     3           0
1  Mercedes Benz  Mercedes Benz     Mercedes     3           1
2    Lamborghini            NaN  Lamborghini     2           0
3        Ferrari            NaN          NaN     1           0
4            NaN          Tesla       Tesla_     2           1
Sign up to request clarification or add additional context in comments.

1 Comment

@DeepakTripathi - I have typo, now correct
1

One option is with case_when from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.case_when(
# condition, result
df.sum1.eq(3) & df.col1.eq(df.col2) & df.col2.eq(df.col3), 0,
df.sum1.eq(3), 1,
df.sum1.eq(2) & (df.col1.eq(df.col2) | df.col2.eq(df.col3) | df.col1.eq(df.col3)), 0,
df.sum1.eq(2), 1,
df.sum1.eq(1), 0,
1, # default
column_name='verify_col')
)

            col1           col2         col3  sum1  verify_col
0            BMW            BMW          BMW     3           0
1  Mercedes Benz  Mercedes Benz     Mercedes     3           1
2    Lamborghini           None  Lamborghini     2           0
3        Ferrari           None         None     1           0
4           None          Tesla       Tesla_     2           1

Of course, you can do this with np.select:

conditions = [df.sum1.eq(3) & df.col1.eq(df.col2) & df.col2.eq(df.col3), 
              df.sum1.eq(3), 
              df.sum1.eq(2) & (df.col1.eq(df.col2) | df.col2.eq(df.col3) | 
              df.col1.eq(df.col3)), 
              df.sum1.eq(2), 
              df.sum1.eq(1)]

results = [0,1,0,1,0]

outcome = np.select(conditions, results, default=1)
df.assign(verify_col = outcome)

            col1           col2         col3  sum1  verify_col
0            BMW            BMW          BMW     3           0
1  Mercedes Benz  Mercedes Benz     Mercedes     3           1
2    Lamborghini           None  Lamborghini     2           0
3        Ferrari           None         None     1           0
4           None          Tesla       Tesla_     2           1

Comments

0
df['verify_col'] = (~(((df["col1"] == df["col2"]) | df["col1"].isna() | df["col2"].isna()) & ((df["col2"] == df["col3"]) | df["col2"].isna() | df["col3"].isna()))).astype(int)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.