0

I have the following dataframe:

  data = [
(27450, 27450, 29420,"10/10/2016"),
(29420 , 36142, 29420, "10/10/2016"),
(11 , 11, 27450, "10/10/2016")] 

#Create DataFrame base
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))

The first column contains the user_id, and each line represents one action that he makes. Each user_id shows up either in "Actor1" or "Actor2" column.

First, I would like to create a new column where it will assign the value 1 if the user_id is found in "Actor1" column and 0 otherwise.

Second, I would like to create a new column where for each user_id it will store the "Actor"_i value that he interacted with.

For the above example, the output will look like:

   Col1  Col2    
    1    29420    
    0    36142    
    1    27450    

What is the most efficient pythonic way to do this?

Thanks a lot in advance!

1
  • is it necessary that if user_id is found in actor1 column, it'll be in the same row? Commented Jun 23, 2016 at 18:13

2 Answers 2

2
import numpy as np
import pandas as pd

data = [(27450, 27450, 29420,"10/10/2016"),
        (29420 , 36142, 29420, "10/10/2016"),
        (11 , 11, 27450, "10/10/2016")] 
df = pd.DataFrame(data, columns=("User_id","Actor1","Actor2", "Time"))
mask = (df['User_id'] == df['Actor1'])
df['first actor'] = mask.astype(int)
df['other actor'] = np.where(mask, df['Actor2'], df['Actor1'])
print(df)

yields

   User_id  Actor1  Actor2        Time  first actor  other actor
0    27450   27450   29420  10/10/2016            1        29420
1    29420   36142   29420  10/10/2016            0        36142
2       11      11   27450  10/10/2016            1        27450

First create a boolean mask which is True when User_id equals Actor1:

In [51]: mask = (df['User_id'] == df['Actor1']); mask
Out[51]: 
0     True
1    False
2     True
dtype: bool

Converting mask to ints creates the first column:

In [52]: mask.astype(int)
Out[52]: 
0    1
1    0
2    1
dtype: int64

Then use np.where to select between two values. np.where(mask, A, B) returns an array whose ith value is A[i] if mask[i] is True, and B[i] otherwise. Thus, np.where(mask, df['Actor2'], df['Actor1']) takes the value from Actor2 where mask is True, and the value from Actor1 otherwise:

In [53]: np.where(mask, df['Actor2'], df['Actor1'])
Out[53]: array([29420, 36142, 27450])
Sign up to request clarification or add additional context in comments.

1 Comment

pandas.Series.where would remove the need to import numpy: df['Actor2'].where(mask, df['Actor1']).
0

Heres my solution - I have assumed that if userid appears in actor1 column its not necessary it'll be in the same row...

df["Col1"] = [1 if i in df["Actor1"].values else 0 for i in df["User_id"].values]
df["Col2"] = [df.iloc[i]["Actor2"] if j == 1 else df.iloc[i]["Actor1"] for i, j in enumerate(df["Col1"].values)]

Output -

User_id  Actor1  Actor2        Time  Col1   Col2
0    27450   27450   29420  10/10/2016     1  29420
1    29420   36142   29420  10/10/2016     0  36142
2       11      11   27450  10/10/2016     1  27450

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.