
I have a pandas dataframe like this:

aa bb cc dd ee
a  a  b  b  foo
a  b  a  a  foo
b  a  a  a  bar
b  b  b  b  bar

I want to add a new column whose value depends on whether any of columns 1 to 4 contains a.

The result would look like this:

aa bb cc dd ee  ff
a  a  b  b  foo a
a  b  a  a  foo a
b  a  a  a  bar a
b  b  b  b  bar b

The logic is: if the value in any of columns 1 to 4 is a, then column ff is a; otherwise it is b.

I can define a function and check each column manually, like this:

def some_function(row):
    if row['aa'] == 'a' or row['bb'] == 'a' or row['cc'] == 'a' or row['dd'] == 'a':
        return 'a'
    return 'b'

But I'm looking for a solution that scales to any number of columns.

Appreciate any help!

  • Perhaps you can just use df.iloc[:,:4].min(1)? Commented Oct 18, 2017 at 11:33

1 Answer

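Use numpy.where with a condition built by eq (the method form of ==) and any, which checks for at least one True per row:

import numpy as np

cols = ['aa', 'bb', 'cc', 'dd']
# True for rows where at least one of the selected columns equals 'a'
df['ff'] = np.where(df[cols].eq('a').any(axis=1), 'a', 'b')
print(df)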
  aa bb cc dd   ee ff
0  a  a  b  b  foo  a
1  a  b  a  a  foo  a
2  b  a  a  a  bar  a
3  b  b  b  b  bar  b
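
To scale this to n columns without typing each name, the column list can be derived from the frame itself. The following is a minimal, self-contained sketch: the sample frame is rebuilt from the question, and the rule "every column except ee takes part in the check" is an assumption made for illustration, not something the question states.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "aa": ["a", "a", "b", "b"],
    "bb": ["a", "b", "a", "b"],
    "cc": ["b", "a", "a", "b"],
    "dd": ["b", "a", "a", "b"],
    "ee": ["foo", "foo", "bar", "bar"],
})

# Assumption: all columns other than 'ee' are checked.
# df.columns[:4] would select the first four columns by position instead.
cols = df.columns.drop("ee")

# 'a' where any checked column equals 'a', otherwise 'b'.
df["ff"] = np.where(df[cols].eq("a").any(axis=1), "a", "b")
print(df["ff"].tolist())
```

Because cols is computed from df.columns, adding more columns to the frame requires no change to the line that builds ff.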

Detail:

print(df[cols].eq('a'))
      aa     bb     cc     dd
0   True   True  False  False
1   True  False   True   True
2  False   True   True   True
3  False  False  False  False

print(df[cols].eq('a').any(axis=1))
0     True
1     True
2     True
3    False
dtype: bool

If you need a custom function:

def some_function(row):
   if row[cols].eq('a').any():
       return 'a'
   return 'b'

df['ff'] = df.apply(some_function, axis=1)
print(df)
  aa bb cc dd   ee ff
0  a  a  b  b  foo  a
1  a  b  a  a  foo  a
2  b  a  a  a  bar  a
3  b  b  b  b  bar  b
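
The row-wise apply version calls a Python function once per row, so on large frames it is usually slower than the numpy.where version. As a quick sanity check that the two approaches agree, here is a small sketch; the sample frame is rebuilt from the question, and the variable names vectorized and row_wise are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "aa": ["a", "a", "b", "b"],
    "bb": ["a", "b", "a", "b"],
    "cc": ["b", "a", "a", "b"],
    "dd": ["b", "a", "a", "b"],
    "ee": ["foo", "foo", "bar", "bar"],
})
cols = ["aa", "bb", "cc", "dd"]


def some_function(row):
    # Row-wise check: does any selected column in this row equal 'a'?
    return "a" if row[cols].eq("a").any() else "b"


# Vectorized result (one pass over the whole frame).
vectorized = np.where(df[cols].eq("a").any(axis=1), "a", "b")

# Row-wise result (one Python call per row).
row_wise = df.apply(some_function, axis=1)

# Compare as plain Python lists to sidestep dtype differences.
same = vectorized.tolist() == row_wise.tolist()
print(same)
```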