3

I have the following data and want to add a new column whose value depends on the two existing columns.
Dataset:

real,rel
1,0
0,1
1,1
0,1
0,0
0,0
1,1
1,1
0,0
0,1
1,0
1,1
0,1
1,0

The code I tried and the error I received:

>>> import pandas as pd
>>> df = pd.read_csv("test.csv")
>>> df.loc[df["real"]==0 and df["rel"]==0,"out"] = 9
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The out column should follow these conditions:
when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3
How can I fix the code above so that it applies these conditions? I have already checked this question: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

5 Answers

5

One way is to use np.select. You can start by defining the set of conditions:

c1 = (df.real == 0) & (df.rel == 0) 
c2 = (df.real == 1) & (df.rel == 1) 
c3 = (df.real == 1) & (df.rel == 0) 
c4 = (df.real == 0) & (df.rel == 1) 

Then pass the conditions to np.select, which gives each row the value from range(4) that corresponds to the first condition it matches:

import numpy as np
df['out'] = np.select([c1,c2,c3,c4], range(4))

     real  rel  out
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2
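
As a self-contained sketch of the same np.select approach, the snippet below rebuilds the question's data inline instead of reading test.csv. The default=-1 argument is an assumption added here for illustration: it marks rows that match none of the conditions, where np.select would otherwise fill in 0, which is also a valid output value in this problem.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# One boolean mask per case, in the same order as the choices below.
conditions = [
    (df["real"] == 0) & (df["rel"] == 0),
    (df["real"] == 1) & (df["rel"] == 1),
    (df["real"] == 1) & (df["rel"] == 0),
    (df["real"] == 0) & (df["rel"] == 1),
]
choices = [0, 1, 2, 3]

# default=-1 is an assumed sentinel for rows that match no condition.
df["out"] = np.select(conditions, choices, default=-1)
print(df)
```

With only 0 and 1 in both columns, every row matches exactly one condition, so the sentinel never appears here.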

4
when real is 0 and rel is 0, out should be 0
when real is 1 and rel is 1, out should be 1
when real is 1 and rel is 0, out should be 2
when real is 0 and rel is 1, out should be 3

These cases can be combined into one statement:

df['out'] = df['rel'] + 2*(df['real'] != df['rel'])
print(df)

Output:

    real  rel  out
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2
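
To see why the one-line formula works, it may help to check it against all four cases from the question. The sketch below does that on a small hand-built frame (the cases and expected names are made up for this check); it relies on the boolean result of real != rel being treated as 0 or 1 in arithmetic.

```python
import pandas as pd

# All four (real, rel) combinations with the outputs requested in the question.
cases = pd.DataFrame({
    "real":     [0, 1, 1, 0],
    "rel":      [0, 1, 0, 1],
    "expected": [0, 1, 2, 3],
})

# (real != rel) is a boolean Series; multiplying by 2 turns it into 0 or 2.
cases["out"] = cases["rel"] + 2 * (cases["real"] != cases["rel"])

# The formula should reproduce the requested value in every case.
assert (cases["out"] == cases["expected"]).all()
print(cases)
```

Note that the formula assumes both columns contain only 0 and 1.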

2 Comments

Nice out of box solution.
Nice solution indeed
3

Below is the answer to your query. Combine the conditions with & instead of and, and wrap each comparison in parentheses:

df.loc[(df["real"]==0) & (df["rel"]==0),"out"] = 0
df.loc[(df["real"]==1) & (df["rel"]==1),"out"] = 1
df.loc[(df["real"]==1) & (df["rel"]==0),"out"] = 2
df.loc[(df["real"]==0) & (df["rel"]==1),"out"] = 3
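
The difference from the code in the question is the operator: Python's and needs a single True/False from each operand, which a Series cannot provide, while & compares element by element. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"real": [1, 0, 1, 0], "rel": [0, 1, 1, 0]})

# `and` asks for the truth value of a whole Series, which pandas refuses to guess.
try:
    mask = (df["real"] == 0) and (df["rel"] == 0)
except ValueError as exc:
    print("ValueError:", exc)

# `&` combines the two comparisons element by element.
# The parentheses are needed because `&` binds more tightly than `==`.
mask = (df["real"] == 0) & (df["rel"] == 0)
df.loc[mask, "out"] = 0
print(df)
```

Rows that no assignment has reached yet hold NaN, so the out column is float until every case has been filled in.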

1

One possible solution is to create a helper DataFrame that lists each combination with its output value, and then merge:

df1 = pd.DataFrame({'real': [0, 0, 1, 1], 'rel': [0, 1, 0, 1], 'new': [0, 3, 2, 1]})
print(df1)
   real  rel  new
0     0    0    0
1     0    1    3
2     1    0    2
3     1    1    1

df = df.merge(df1, how='left')
print(df)
    real  rel  new
0      1    0    2
1      0    1    3
2      1    1    1
3      0    1    3
4      0    0    0
5      0    0    0
6      1    1    1
7      1    1    1
8      0    0    0
9      0    1    3
10     1    0    2
11     1    1    1
12     0    1    3
13     1    0    2
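
A slightly more defensive sketch of the same helper-table idea, using the mapping requested in the question. The lookup and result names are made up for this example, and the on= and validate= arguments are optional additions: the first names the join keys explicitly, the second makes pandas raise an error if the helper table ever lists the same (real, rel) pair twice.

```python
import pandas as pd

# First seven rows of the question's data, built inline.
df = pd.DataFrame({
    "real": [1, 0, 1, 0, 0, 0, 1],
    "rel":  [0, 1, 1, 1, 0, 0, 1],
})

# Lookup table: one row per (real, rel) pair with the value from the question.
lookup = pd.DataFrame({
    "real": [0, 1, 1, 0],
    "rel":  [0, 1, 0, 1],
    "out":  [0, 1, 2, 3],
})

# A left merge keeps every row of df in its original order.
# validate="many_to_one" raises MergeError if lookup has duplicate keys.
result = df.merge(lookup, on=["real", "rel"], how="left", validate="many_to_one")
print(result)
```

Rows whose (real, rel) pair is missing from the lookup table would get NaN in out, which makes gaps in the mapping easy to spot.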

1

You can use numpy.where to conditionally fill columns:

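import numpy as np

df["new_column"] = np.nan  # start empty; rows matching no condition stay NaN
df["new_column"] = np.where((df["real"]==0) & (df["rel"]==0), 0, df["new_column"])
df["new_column"] = np.where((df["real"]==1) & (df["rel"]==1), 1, df["new_column"])
df["new_column"] = np.where((df["real"]==1) & (df["rel"]==0), 2, df["new_column"])
df["new_column"] = np.where((df["real"]==0) & (df["rel"]==1), 3, df["new_column"])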
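
The same idea can also be written as a single nested np.where expression, sketched below on a small made-up frame. This variant assumes both columns contain only 0 and 1, because the innermost fallback value 3 is applied to every row that the first three conditions did not catch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"real": [1, 0, 1, 0], "rel": [0, 1, 1, 0]})

real, rel = df["real"], df["rel"]

# Each np.where handles one case and passes the remaining rows to the next.
# Assumption: values are only 0 or 1, so the final fallback 3 means real=0, rel=1.
df["new_column"] = np.where(
    (real == 0) & (rel == 0), 0,
    np.where((real == 1) & (rel == 1), 1,
             np.where((real == 1) & (rel == 0), 2, 3)),
)
print(df)
```

Because no NaN placeholder is involved, the resulting column keeps an integer dtype.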
