I am trying to use the pandas DataFrame.where method, but I have more than two possible outcomes (i.e. I need if / elif / else rather than the default if / else behavior).

Please consider the following dataframe:

import numpy as np
import pandas as pd

a1 = np.random.rand(7,2)
a2 = np.random.randint(0,3,(7,1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)

I tried

def test(x):
    if x[2] == 0:
        return 5
    if x[2]==1:
        return 10
    if x[2] ==2:
        return 50

df.where(test)

But I receive the error message "The truth value of a Series is ambiguous". I suspect this is the right direction, but I am confused about how to achieve it. The documentation says that if the condition is a callable, it is computed on the full DataFrame. Accordingly, x[2] inside the function is the entire column 2 rather than a single value. Is there no way to achieve a vectorized operation for this task? Is it only possible to go row by row, whether with iterrows or apply?

This is a toy example for the sake of clarity; in my real problem I am not trying to do a simple .map. If you answer, please keep "test" as a separate function that gets passed in, as this is where my difficulty lies.

1 Answer

np.random.seed(100)
a1 = np.random.rand(7,2)
a2 = np.random.randint(0,3,(7,1))
grid = np.append(a1, a2, axis=1)
df = pd.DataFrame(grid)
print (df)
          0         1    2
0  0.543405  0.278369  2.0
1  0.424518  0.844776  2.0
2  0.004719  0.121569  0.0
3  0.670749  0.825853  0.0
4  0.136707  0.575093  1.0
5  0.891322  0.209202  1.0
6  0.185328  0.108377  1.0

Solution with map:

d = {0:5,1:10,2:50}
df['d'] = df[2].map(d)
print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

Another solution with numpy.where:

df['d'] = np.where(df[2] == 0, 5, 
          np.where(df[2]== 1, 10,  50))

print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10
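
Nested np.where calls get hard to read once there are more than two or three branches. As a possible alternative (not part of the original answer), numpy.select takes a list of conditions and a list of values, and reads more like if / elif / else. A minimal sketch, assuming the same seeded df as above:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
a1 = np.random.rand(7, 2)
a2 = np.random.randint(0, 3, (7, 1))
df = pd.DataFrame(np.append(a1, a2, axis=1))

# Conditions are evaluated in order; the first one that matches wins,
# which mirrors an if / elif chain.
conditions = [df[2] == 0, df[2] == 1]
choices = [5, 10]

# `default` plays the role of the final else branch.
df['d'] = np.select(conditions, choices, default=50)
print(df)
```

Adding another branch is then just one more entry in each list.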

EDIT:

With a separate function, you can use apply with axis=1 to process the DataFrame row by row:

def test(x):
    #print (x)
    if x[2] == 0:
        return 5
    if x[2]==1:
        return 10
    if x[2] ==2:
        return 50

df['d'] = df.apply(test, axis=1)
print (df)
          0         1    2   d
0  0.543405  0.278369  2.0  50
1  0.424518  0.844776  2.0  50
2  0.004719  0.121569  0.0   5
3  0.670749  0.825853  0.0   5
4  0.136707  0.575093  1.0  10
5  0.891322  0.209202  1.0  10
6  0.185328  0.108377  1.0  10

But if you need a vectorized function, have it take the whole column and use np.where:

def test(x):
    return np.where(x == 0, 5, np.where(x== 1, 10,  50))

print (test(df[2]))
[50 50  5  5 10 10 10]
4 Comments

Hi, thank you. Can you show an answer that keeps the function "test" as a separate function that is passed either to map or to where? This is what will help me in my real-life example.
OK, thanks: so I understand I have to use either apply or iterrows here - there is no way to achieve the result using a vectorized operation, as I thought would be possible? The documentation of the where method mentions the possibility of using a callable, which is what I am trying to do here: pandas.pydata.org/pandas-docs/stable/generated/…
Yes, I think that should do it. I will time it, but I suspect where is faster than apply? In that case it is what I am looking for.
Exactly. np.where is very fast; I think it is faster than map and certainly faster than apply.
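
The speed claim in the last comment is easy to check locally. The following is a rough timing sketch, not a rigorous benchmark: the 10,000-row frame, the helper names, and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({2: rng.integers(0, 3, 10_000)})

def row_test(x):
    # Row-wise version from the answer: x is one row at a time.
    if x[2] == 0:
        return 5
    if x[2] == 1:
        return 10
    if x[2] == 2:
        return 50

mapping = {0: 5, 1: 10, 2: 50}

def by_where():
    return np.where(df[2] == 0, 5, np.where(df[2] == 1, 10, 50))

def by_map():
    return df[2].map(mapping)

def by_apply():
    return df.apply(row_test, axis=1)

for name, fn in [("np.where", by_where), ("map", by_map), ("apply", by_apply)]:
    seconds = timeit.timeit(fn, number=3)
    print(f"{name}: {seconds:.4f} s for 3 runs")
```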
