64

I have a dataframe df like this:

      A      B       C            D
1   blue    red    square        NaN
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN

and I want to update column D when it meets 3 conditions. Ex:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='square'), ['D'] ] = 'succeed'

It works for the first two conditions, but it doesn't work for the third, thus:

df.ix[ np.logical_and(df.A=='blue', df.B=='red', df.C=='triangle'), ['D'] ] = 'succeed'

has exactly the same result:

      A      B       C            D
1   blue    red    square        succeed
2  orange  yellow  circle        NaN
3  black   grey    circle        NaN
1
  • Use the solution in this answer if the other solutions are slow. Commented Nov 16, 2022 at 20:06

6 Answers 6

81

Using:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'

gives the warning:

/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

A better way of achieving this seems to be:

df.loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square'),'D'] = 'M5'
Sign up to request clarification or add additional context in comments.

Comments

28

You could try this instead:

df[ (df.A=='blue') & (df.B=='red') & (df.C=='square') ]['D'] = 'succeed'

Comments

6

You could try:

df['D'] = np.where((df.A=='blue') & (df.B=='red') & (df.C=='square'), 'succeed')

This answer might provide a detailed answer to the your question: Update row values where certain condition is met in pandas

Comments

4

This format might have been implied in the new answers, but the following bit actually worked for me.

df['D'].loc[(df['A'] == 'blue') & (df['B'] == 'red') & (df['C'] == 'square')] = 'succeed'

Comments

3

The third parameter of logical_and is to assign the array used to store the result.

Currently, the method @TimRich provided might be the best. In pandas 0.13 (in development), there's a new experimental query method. Try it!

Comments

0

The existing solutions are very slow for large dataframes (100k+ rows); an alternative is to try numexpr evaluation with eval() method to build a boolean mask and use this mask to replace values using mask() method.

df['D'] = df['D'].mask(df.eval("A=='blue' and B=='red' and C=='square'"), 'succeed')

As the length of the dataframe increases eval() becomes much faster than the other alternatives. For example, for a frame with 1mil rows, it is 2.2 times faster than loc method outlined in Praveen, Tim and Alex Schwab's answers.

Yet another method is to use numpy.where() method to select values according to a condition.

df['D'] = np.where((df.A=='blue') & (df.B=='red') & (df.C=='square'), 'succeed', pd.NA)

This is similar to theSanjeev's answer; the only difference is to set the "else"-value that is missing in their answer. Speaking of missing values, pd.NA is also faster than np.nan or float('nan') as well.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.