Python Pandas .where with more than 2 possible condition inputs

Question

I am trying to use the pandas DataFrame .where method, but I have more than two possible outcomes (i.e. I need if/elif/else instead of the default if/else behavior).

Please consider the following dataframe:

import numpy as np
import pandas as pd

a1 = np.random.rand(7,2)
a2 = np.random.randint(0,3,(7,1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)

I tried

def test(x):
    if x[2] == 0:
        return 5
    if x[2]==1:
        return 10
    if x[2] ==2:
        return 50

df.where(test)

But I get the error "The truth value of a Series is ambiguous". I suspect this is the right direction, but I am confused about how to achieve it. The documentation says that if the condition is a callable, it is called with the full DataFrame as input. Even so, it seems to treat x[2] as the entire column 2. Is there no way to do this with a vectorized operation? Is it only possible by iterating row by row, with either iterrows or apply?

This is a toy example to keep the question clear; in my real problem I am not just doing a simple .map. If you answer, please keep "test" as a separate function that gets passed in, since that is where my difficulty lies.
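To see what the callable form of DataFrame.where actually receives, here is a minimal sketch (the function name show_input and the 0.5 threshold are illustrative choices, not from the question). It shows that the callable is given the whole DataFrame and has to return a boolean mask, and it reproduces the "ambiguous truth value" error that an `if` on a Series raises. Because where only keeps a value or replaces it with `other`, it cannot choose among three outcomes by itself.

```python
import numpy as np
import pandas as pd

# Same construction as in the question; the seed is only added for repeatability.
np.random.seed(100)
df = pd.DataFrame(np.append(np.random.rand(7, 2),
                            np.random.randint(0, 3, (7, 1)), axis=1))

def show_input(frame):
    # DataFrame.where calls the callable with the whole DataFrame,
    # so frame[2] is all of column 2 (a Series), not one value.
    print(type(frame).__name__, type(frame[2]).__name__)
    # The callable must return a boolean mask shaped like df.
    return frame > 0.5

masked = df.where(show_input)  # cells where the mask is False become NaN

# An `if` needs a single True/False, but comparing a Series gives one per row,
# which is why test() in the question raises ValueError.
error_message = None
try:
    if df[2] == 0:
        pass
except ValueError as err:
    error_message = str(err)
print(error_message)
```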

jezrael · Accepted Answer · 2017-04-19 12:27:29Z

import numpy as np
import pandas as pd

np.random.seed(100)
a1 = np.random.rand(7,2)
a2 = np.random.randint(0,3,(7,1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)
print (df)
          0         1    2
0  0.543405  0.278369  2.0
1  0.424518  0.844776  2.0
2  0.004719  0.121569  0.0
3  0.670749  0.825853  0.0
4  0.136707  0.575093  1.0
5  0.891322  0.209202  1.0
6  0.185328  0.108377  1.0

Solution with map:

d = {0:5,1:10,2:50}
df['d'] = df[2].map(d)
print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

Another solution with numpy.where:

df['d'] = np.where(df[2] == 0, 5,
                   np.where(df[2] == 1, 10, 50))

print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10
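For more than two branches, numpy.select is a common alternative to nesting numpy.where (a swapped-in technique, not part of the original answer). It takes a list of conditions and a matching list of choices, uses the first condition that is True for each row, and falls back to `default`, which plays the role of the final else. A sketch on the same data:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.append(np.random.rand(7, 2),
                            np.random.randint(0, 3, (7, 1)), axis=1))

# One condition per if/elif branch, checked in order; the first True wins.
conditions = [df[2] == 0, df[2] == 1]
choices = [5, 10]

# default is used where no condition is True, like a final else.
df['d'] = np.select(conditions, choices, default=50)
print(df)
```

Adding another branch only means appending one condition and one choice, instead of adding another level of nesting.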

EDIT:

If you need a separate function, you can use apply with axis=1 to process the DataFrame row by row:

def test(x):
    # x is one row of df (a Series indexed by the column labels)
    if x[2] == 0:
        return 5
    if x[2] == 1:
        return 10
    if x[2] == 2:
        return 50

df['d'] = df.apply(test, axis=1)
print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

But if you need a function that stays vectorized, pass it the whole column and use numpy.where inside it:

def test(x):
    return np.where(x == 0, 5, np.where(x == 1, 10, 50))

print (test(df[2]))
[50 50  5  5 10 10 10]

edited Apr 19, 2017 at 12:27

answered Apr 19, 2017 at 12:15

jezrael

4 Comments

jim jarnac Over a year ago

Hi, thank you. Can you show an answer that keeps "test" as a separate function passed to either map or where? That is what will help me with my real-life example.

jim jarnac Over a year ago

OK, thanks. So as I understand it, I have to use either apply or iterrows here; there is no way to get the result with a vectorized operation, as I thought would be possible? The documentation of the where method mentions the possibility of using a callable, which is what I am trying to do here: pandas.pydata.org/pandas-docs/stable/generated/…

jim jarnac Over a year ago

Yes, I think that should do it. I will time it, but I suspect where is faster than apply? In that case, it is what I am looking for.

jezrael Over a year ago

Exactly. np.where is very fast; I think it is faster than map, and certainly faster than apply.
