Numpy np.where condition with multiple columns [duplicate]

Question

I have a DataFrame:

import pandas as pd
import numpy as np

data = pd.DataFrame({"col1": [0, 1, 1, 1, 1, 0],
                     "col2": [False, True, False, False, True, False]})

data

I'm trying to create a column col3 that is 1 where col1 == 1 and col2 is True, and 0 otherwise.

Using np.where:

data.assign(col3=np.where(data["col1"]==1 & data["col2"], 1, 0))

   col1   col2  col3
0     0  False     1
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     1

For the first row (index 0), col1 == 0 and col2 == False, but I'm getting col3 as 1.

What am I missing?

The desired output:

   col1   col2  col3
0     0  False     0
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     0

mozway · Accepted Answer · 2023-03-17 12:33:25Z

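You are missing parentheses. In Python, & has higher precedence than ==, so data["col1"]==1 & data["col2"] is evaluated as data["col1"] == (1 & data["col2"]). Wrap the comparison in parentheses: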

data.assign(col3=np.where((data["col1"]==1) & data["col2"], 1, 0))
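
To see the grouping directly, here is a small sketch (not part of the original answer) that uses the standard-library ast module to parse the same expression, with the placeholder names a and b standing in for the two columns, and then evaluates it with the first row's scalar values:

```python
import ast

# Parse the unparenthesized condition; a and b are placeholder names.
tree = ast.parse("a == 1 & b", mode="eval")
node = tree.body

# The top-level node is a comparison whose right-hand side is the
# bitwise-and 1 & b, i.e. & was applied before ==.
print(type(node).__name__)                    # Compare
print(type(node.comparators[0]).__name__)     # BinOp
print(type(node.comparators[0].op).__name__)  # BitAnd

# Scalar values from the first row: col1 = 0, col2 = False.
a, b = 0, False
print(a == 1 & b)    # 0 == (1 & False) -> 0 == 0 -> True
print((a == 1) & b)  # False & False -> False
```

The scalar case reproduces the bug in the question: the first row evaluates to True without the parentheses.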

One way to avoid the precedence issue altogether is to use Series.eq, because a method call binds more tightly than &:

data.assign(col3=np.where(data["col1"].eq(1) & data["col2"], 1, 0))
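
A minimal sketch that rebuilds the question's DataFrame and compares the unparenthesized condition with the two fixes above; the expected 0/1 values are the ones shown in the question:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"col1": [0, 1, 1, 1, 1, 0],
                     "col2": [False, True, False, False, True, False]})

# Unparenthesized: parsed as data["col1"] == (1 & data["col2"]).
buggy = np.where(data["col1"] == 1 & data["col2"], 1, 0)

# Parenthesized comparison, and the method-call form using Series.eq.
fixed = np.where((data["col1"] == 1) & data["col2"], 1, 0)
fixed_eq = np.where(data["col1"].eq(1) & data["col2"], 1, 0)

print(buggy.tolist())     # [1, 1, 0, 0, 1, 1]
print(fixed.tolist())     # [0, 1, 0, 0, 1, 0]
print(fixed_eq.tolist())  # [0, 1, 0, 0, 1, 0]
```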

You can also skip numpy.where and convert the boolean mask to integers with astype:

data.assign(col3=((data["col1"]==1) & data["col2"]).astype(int))

Output:

   col1   col2  col3
0     0  False     0
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     0
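
Because True and False convert to 1 and 0, the astype variant gives the same values as np.where. A short sketch (same data as the question) that checks this and shows one practical difference: astype returns a Series that keeps the DataFrame's index, whereas np.where returns a plain NumPy array:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"col1": [0, 1, 1, 1, 1, 0],
                     "col2": [False, True, False, False, True, False]})

# Boolean mask with the comparison correctly parenthesized.
mask = (data["col1"] == 1) & data["col2"]

via_astype = mask.astype(int)     # pandas Series, same index as data
via_where = np.where(mask, 1, 0)  # NumPy ndarray, no index

print(type(via_astype).__name__, type(via_where).__name__)  # Series ndarray
print((via_astype.to_numpy() == via_where).all())           # True

result = data.assign(col3=via_astype)
print(result["col3"].tolist())  # [0, 1, 0, 0, 1, 0]
```

Since assign aligns a Series on the index, keeping the index matters mostly when the result is reused outside this DataFrame; for the assignment itself both forms work.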
