Condition on Two columns and create third column failed Pandas Python

Question

I have the following data and want to create a new column whose value depends on conditions over the two existing columns.
Dataset:

real,rel
1,0
0,1
1,1
0,1
0,0
0,0
1,1
1,1
0,0
0,1
1,0
1,1
0,1
1,0

The code I tried and the error I received:

>>> import pandas as pd
>>> df = pd.read_csv("test.csv")
>>> df.loc[df["real"]==0 and df["rel"]==0,"out"] = 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The conditions for the out column are:
when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3
Please let me know how to fix this. I have already checked this related question: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

yatu · Accepted Answer · 2019-03-12 13:51:19Z

5

One way is to use np.select. You can start by defining the set of conditions:

c1 = (df.real == 0) & (df.rel == 0) 
c2 = (df.real == 1) & (df.rel == 1) 
c3 = (df.real == 1) & (df.rel == 0) 
c4 = (df.real == 0) & (df.rel == 1)

And then you can select among range(4) according to the result of the conditions:

import numpy as np
df['out'] = np.select([c1,c2,c3,c4], range(4))

    real  rel  out
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2
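
For reference, here is a self-contained sketch of the same approach that builds the sample data inline instead of reading test.csv. The names `conditions` and `choices` and the `default=-1` argument are additions for illustration, not part of the answer. np.select fills rows that match none of the conditions with `default` (0 if omitted), so a sentinel such as -1 keeps unmatched rows from being confused with the first case.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline instead of read from test.csv.
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# One boolean mask per case, in the same order as the choices below.
conditions = [
    (df["real"] == 0) & (df["rel"] == 0),
    (df["real"] == 1) & (df["rel"] == 1),
    (df["real"] == 1) & (df["rel"] == 0),
    (df["real"] == 0) & (df["rel"] == 1),
]
choices = [0, 1, 2, 3]

# default=-1 is an assumption: a sentinel for rows that match no condition.
df["out"] = np.select(conditions, choices, default=-1)
print(df)
```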


perl · Accepted Answer · 2019-03-12 13:55:24Z

4

when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3

These cases can be combined into one statement:

df['out'] = df['rel'] + 2*(df['real'] != df['rel'])
print(df)

Output:

    real  rel  out
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2
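
Why the one-liner works: `df['real'] != df['rel']` is a boolean Series, so multiplying it by 2 gives 2 where the columns differ and 0 where they agree, and adding `rel` then separates the two cases within each group. The sketch below checks the formula against the four required cases. The `expected` table is transcribed from the question, and the variable names are illustrative.

```python
import pandas as pd

# All four (real, rel) combinations with the 'out' value the question asks for.
expected = pd.DataFrame({
    "real": [0, 1, 1, 0],
    "rel":  [0, 1, 0, 1],
    "out":  [0, 1, 2, 3],
})

# (real != rel) is boolean; 2 * boolean gives 0 (equal) or 2 (different).
computed = expected["rel"] + 2 * (expected["real"] != expected["rel"])

print(computed.tolist())
print(computed.tolist() == expected["out"].tolist())
```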

edited Mar 12, 2019 at 13:55

answered Mar 12, 2019 at 13:50


2 Comments

jezrael Over a year ago

Nice out of box solution.

yatu Over a year ago

Nice solution indeed

Ghanshyam Savaliya · Accepted Answer · 2019-03-12 13:55:34Z

3

Combine the conditions with & (element-wise) instead of and, and wrap each comparison in parentheses:

df.loc[(df["real"]==0) & (df["rel"]==0),"out"] = 0
df.loc[(df["real"]==1) & (df["rel"]==1),"out"] = 1
df.loc[(df["real"]==1) & (df["rel"]==0),"out"] = 2
df.loc[(df["real"]==0) & (df["rel"]==1),"out"] = 3
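
The difference from the question's attempt is the operator: Python's `and` asks each Series for a single truth value, which pandas refuses to give, while `&` combines the two boolean Series element by element. The parentheses are needed because `&` binds more tightly than `==`. A minimal sketch on a four-row frame (the data and the names `raised` and `mask` are illustrative, not the question's file):

```python
import pandas as pd

df = pd.DataFrame({"real": [0, 1, 1, 0], "rel": [0, 1, 0, 1]})

# `and` calls bool() on a whole Series, which raises ValueError.
try:
    df["real"] == 0 and df["rel"] == 0
    raised = False
except ValueError:
    raised = True

# `&` is element-wise; each comparison needs its own parentheses.
mask = (df["real"] == 0) & (df["rel"] == 0)
print(raised, mask.tolist())
```

Because `out` does not exist before the first assignment, pandas creates it with missing values for the unmatched rows, so the column typically ends up as float; `df["out"].astype(int)` converts it once every row has been assigned.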


jezrael · Accepted Answer · 2019-03-12 13:50:19Z

1

One possible solution is to create a helper DataFrame and merge:

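# Helper table: one row per (real, rel) pair, with 'new' set to the value requested in the question
df1 = pd.DataFrame({'real': [0, 0, 1, 1], 'rel': [0, 1, 0, 1], 'new': [0, 3, 2, 1]})
print(df1)
   real  rel  new
0     0    0    0
1     0    1    3
2     1    0    2
3     1    1    1

# Left merge on the shared columns (real, rel) keeps every row of df in its original order
df = df.merge(df1, how='left')
print(df)
    real  rel  new
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2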
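
A self-contained variant of the merge approach, as a sketch. The lookup values follow the mapping stated in the question. The names `lookup` and `result`, the explicit `on=` list, and `validate="many_to_one"` are additions for illustration; `validate` makes pandas raise if the lookup table accidentally contains the same (real, rel) pair twice, which would otherwise duplicate rows silently.

```python
import pandas as pd

# Small illustrative frame (not the full data set from the question).
df = pd.DataFrame({"real": [1, 0, 1, 0], "rel": [0, 1, 1, 0]})

# Lookup table: one row per (real, rel) pair, values as requested in the question.
lookup = pd.DataFrame({
    "real": [0, 1, 1, 0],
    "rel":  [0, 1, 0, 1],
    "out":  [0, 1, 2, 3],
})

# A left merge keeps every row of df in its original order.
# validate="many_to_one" raises if lookup has duplicate key pairs.
result = df.merge(lookup, on=["real", "rel"], how="left", validate="many_to_one")
print(result)
```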


Toby Petty · Accepted Answer · 2019-03-12 13:50:52Z

1

You can use numpy.where to conditionally fill columns:

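import numpy as np

df["new_column"] = np.nan  # start with every value missing
df["new_column"] = np.where((df["real"]==0) & (df["rel"]==0), 0, df["new_column"])
df["new_column"] = np.where((df["real"]==1) & (df["rel"]==1), 1, df["new_column"])
df["new_column"] = np.where((df["real"]==1) & (df["rel"]==0), 2, df["new_column"])
df["new_column"] = np.where((df["real"]==0) & (df["rel"]==1), 3, df["new_column"])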
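
As an alternative sketch, the repeated calls can be collapsed into nested np.where calls, which avoids the NaN placeholder and keeps the column as integers. This nesting is not from the answer; the four-row frame and the name `same` are illustrative, and the output values follow the mapping in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"real": [0, 1, 1, 0], "rel": [0, 1, 0, 1]})

# True where both columns hold the same value.
same = df["real"] == df["rel"]

# Outer where splits on "columns agree"; each inner where picks by the value of real.
df["new_column"] = np.where(
    same,
    np.where(df["real"] == 0, 0, 1),  # (0,0) -> 0, (1,1) -> 1
    np.where(df["real"] == 1, 2, 3),  # (1,0) -> 2, (0,1) -> 3
)
print(df)
```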
