How to add multiple columns with a condition using np.where()

Question

I know how to use np.where() to add one column based on one condition:

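import pandas as pd
import numpy as np

df = pd.read_csv(file, nrows=5)  # `file` is the path to your CSV
# Compare against the number 100 (not the string '100') when col1 is numeric
df['new_col1'] = np.where(df['col1'] < 100, 1, 2)
df.head()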

output:

   col1  col2  new_col1
0     1     3    1
1     2     4    1
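A self-contained version of the single-column case, assuming col1 is numeric. A small in-memory DataFrame (made-up data matching the output above) stands in for the CSV file:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for pd.read_csv(file, nrows=5)
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# np.where picks 1 where the condition is True and 2 where it is False
df['new_col1'] = np.where(df['col1'] < 100, 1, 2)
print(df)
```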

What if I want to add 2 columns based on the same condition:

df['new_col1'], df['new_col2'] = np.where(df['col1'] < 100, (1, 2), (3, 4))

I want to add new_col1 and new_col2: new_col1 should be 1 where the condition holds and 2 otherwise, and new_col2 should be 3 where it holds and 4 otherwise.

When I tried this code, I received:

ValueError: too many values to unpack (expected 2)

The output should be:

   col1  col2  new_col1 new_col2
0     1     3    1       3
1     2     4    1       3
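The ValueError message means that a sequence with more than two elements was unpacked into two targets. np.where returns a single array with one entry per row, so unpacking its result into two columns fails as soon as the DataFrame has more than two rows. A minimal reproduction on made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical five-row column
df = pd.DataFrame({'col1': [1, 2, 150, 3, 200]})

# np.where returns ONE array of length len(df), not one array per column
result = np.where(df['col1'] < 100, 1, 2)
print(result.shape)  # (5,)

try:
    # Unpacking a length-5 array into two names raises ValueError
    new_col1, new_col2 = result
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```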

np.where returns a single array. Could you elaborate on how you want to generate two values to add instead? – ifly6, commented Jun 16, 2021 at 21:14
Thank you for your reply. What if I want to add 2 columns based on 1 condition? What else do I need to use? – William, commented Jun 16, 2021 at 21:14
I don't understand what you mean by 'add 2 columns by 1 condition'. Could you give an example of this? – ifly6, commented Jun 16, 2021 at 21:15
df['column1'], df['column2'] = np.where(df['contract'] > '0L000099', 1, 2) – William, commented Jun 16, 2021 at 21:16
Just use df['column2'] = df['column1'] after defining column1 with the np.where above? – SeaBean, commented Jun 16, 2021 at 21:21

Andreas · Accepted Answer · 2021-06-16 21:41:22Z


You can reuse the same condition (mask) for each new column:

import numpy as np

# Compute the boolean condition once and reuse it for every new column
mask = df['contract'] > '0L000099'
df['column1'] = np.where(mask, 1, 2)
df['column2'] = np.where(mask, 3, 4)
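A runnable version of the mask-reuse approach. The contract IDs are made up; because they are strings, `>` compares them lexicographically:

```python
import numpy as np
import pandas as pd

# Hypothetical contract IDs (string comparison, character by character)
df = pd.DataFrame({'contract': ['0L000001', '0L000150', '0L000099', '0L000200']})

# Evaluate the condition once and reuse the boolean Series for every new column
mask = df['contract'] > '0L000099'
df['column1'] = np.where(mask, 1, 2)
df['column2'] = np.where(mask, 3, 4)
print(df)
```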

You can also invert the condition with ~:

df['column2'] = np.where(~mask, 1, 2)
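Inverting the mask with ~ has the same effect as swapping the two choice values. A quick check on made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 50, 500])  # hypothetical values
mask = s > 10

# ~mask flips every boolean, so the True/False branches trade places
inverted = np.where(~mask, 1, 2)
swapped = np.where(mask, 2, 1)
print(inverted)  # [1 2 2]
```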

Since your question was updated, here is an updated answer that fills both columns with a single np.where call, although I am not sure it is actually useful:

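import numpy as np
import pandas as pd

df = pd.DataFrame({'test': range(0, 10)})
mask = df['test'] > 3
m_len = len(mask)

# [mask, mask] becomes a (2, m_len) condition; row 0 fills column1, row 1 fills column2
df['column1'], df['column2'] = np.where([mask, mask], [[1]*m_len, [3]*m_len], [[2]*m_len, [4]*m_len])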

   test  column1  column2
0     0        2        4
1     1        2        4
2     2        2        4
3     3        2        4
4     4        1        3
5     5        1        3
6     6        1        3
7     7        1        3
8     8        1        3
9     9        1        3
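If more than two columns depend on the same condition, the first approach generalizes to a loop over a mapping from column name to (value if True, value if False). The `choices` dict is a hypothetical name used for illustration; the data mirrors the example above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'test': range(0, 10)})
mask = df['test'] > 3

# Hypothetical mapping: column name -> (value where mask is True, value where False)
choices = {'column1': (1, 2), 'column2': (3, 4)}

for name, (if_true, if_false) in choices.items():
    df[name] = np.where(mask, if_true, if_false)

print(df)
```

Adding another column then only requires another entry in the mapping.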

edited Jun 16, 2021 at 21:41

answered Jun 16, 2021 at 21:21

Andreas



4 Comments

William Over a year ago

Thank you for your answer. Can you please explain what this part means: [[1]*m_len, [3]*m_len], [[2]*m_len, [4]*m_len]

Andreas Over a year ago

@William, of course. It relates to a NumPy concept called broadcasting, if you want to look it up. NumPy is very efficient because of vectorization, and for that it expects its inputs in compatible shapes. In this case the code supplies, for example, one 1 for each boolean in the condition so that the shapes match exactly. We get that by repeating the value: multiplying a one-element list by the length of the condition series (Python list repetition). Hope that makes sense.
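Broadcasting also means the values do not have to be repeated by hand. Giving the choice values shape (2, 1) lets NumPy stretch them across the length of the condition, so the result has shape (2, n) and unpacks into two columns. A sketch on the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'test': range(0, 10)})
mask = df['test'] > 3  # shape (10,)

# Shape (2, 1): row 0 holds column1's values, row 1 holds column2's values
if_true = np.array([[1], [3]])
if_false = np.array([[2], [4]])

# A (10,) condition broadcasts against (2, 1) choices to give a (2, 10) result
result = np.where(mask.to_numpy(), if_true, if_false)
print(result.shape)  # (2, 10)

# Unpacking a 2-D array iterates over its rows
df['column1'], df['column2'] = result
print(df)
```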

William Over a year ago

Hi friend, can you help me with this question? stackoverflow.com/questions/68476193/…

Andreas Over a year ago

@William, hey William, see my answer under your question. Let me know how it went. Happy coding!
