4

I have a df:

  col1 col2 col3
0    1    2    3
1    2    3    1
2    3    3    3
3    4    3    2

I want to add a new column based on the following conditions:

 - if   col1 > col2 > col3   ----->  2
 - elif col1 > col2          ----->  1
 - elif col1 < col2 < col3   -----> -2
 - elif col1 < col2          -----> -1
 - else                      ----->  0

And it should become this:

  col1 col2 col3   new
0    1    2    3   -2
1    2    3    1   -1
2    3    3    3    0
3    4    3    2    2

I followed the method from this post by unutbu, with 1 greater than or less than is fine. But in my case with more than 1 greater than or less than, conditions returns error:

conditions = [
       (df['col1'] > df['col2'] > df['col3']), 
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)


Traceback (most recent call last):

  File "<ipython-input-43-768a4c0ecf9f>", line 2, in <module>
    (df['col1'] > df['col2'] > df['col3']),

  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How should I do this?

1
  • 1
    try using '.eval' method instead to create a single boolean series, e.g., df.eval('col1 > col2 > col3'). This is equivalent to assigning True if, element by element, col1 is greater than col2 and col2 is greater than col3. Otherwise, pandas creates a boolean series for df['col1'] > df['col2'] and then it doesn't know how to evaluate the condition boolean series > df['col3'] Commented Jun 23, 2020 at 2:09

2 Answers 2

3

Change your code to

conditions = [
       (df['col1'] > df['col2']) &  (df['col2'] > df['col3']), 
       (df['col1'] > df['col2']),
       (df['col1'] < df['col2']) & (df['col2'] < df['col3']),
       (df['col1'] < df['col2'])]
choices = [2,1,-2,-1]
df['new'] = np.select(conditions, choices, default=0)
Sign up to request clarification or add additional context in comments.

Comments

0

One option is with case_when from pyjanitor; under the hood it uses pd.Series.mask.

The basic idea is a pairing of condition and expected value; you can pass as many pairings as required, followed by a default value and a target column name:

# pip install pyjanitor
import pandas as pd
import janitor

df.case_when( 
      # condition, value
     'col1>col2>col3', 2,
     'col1>col2', 1,
     'col1<col2<col3', -2,
     'col1<col2', -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2

The code above uses strings for the conditions, which are evaluated by pd.eval on the parent dataframe - note that speed wise, this can be slower for small datasets. A faster option (depending on the data size) would be to avoid the pd.eval option:

df.case_when( 
      # condition, value
     df.col1.gt(df.col2) & df.col2.gt(df.col3), 2,
     df.col1.gt(df.col2), 1,
     df.col1.lt(df.col2) & df.col2.lt(df.col3), -2,
     df.col1.lt(df.col2), -1,
     0, # default
     column_name = 'new')

   col1  col2  col3  new
0     1     2     3   -2
1     2     3     1   -1
2     3     3     3    0
3     4     3     2    2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.