
I have two dataframes:

df:

 index   some_variable identifier1  identifier2 
  1        x             AB2          AB3
  2        x             BB2          BB3
  3        x             CB2          CB3
  4        y             DB2          DB3
  5        y             EB2          EB3

dfa:

 index   some_variable identifier1  identifier2 identifier3
  1        x             AB5          AB3          AB3
  2        x             BB5          BB2          AB2
  3        x             CB5          CB2          AB5
  4        y             DB5          DB3          AB3
  5        y             EB5          EB3          AB3

If an element of df['identifier1'] appears in dfa['identifier2'] and some_variable equals 'x', I want to replace df['identifier2'] at that index with dfa['identifier3']. So the condition is:

df['identifier1'].isin(dfa['identifier2']) & (df['some_variable'] == 'x')

and I want:

 index   some_variable identifier1  identifier2 
  1        x             AB2          AB3
  2        x             BB2          AB2
  3        x             CB2          AB5
  4        y             DB2          DB3
  5        y             EB2          EB3

I can set up the condition, but can't figure out how to get the output.

5 Comments

  • Your description doesn't make a lot of sense. Please clarify. Commented Feb 15, 2018 at 16:05
  • If I follow your logic correctly, df will have no changes, since none of the values in df['identifier1'] are in dfa. Commented Feb 15, 2018 at 16:10
  • I think the logic is wrong and the user has dfa and df swapped around in the isin. Just a guess though. Commented Feb 15, 2018 at 16:12
  • Sorry, it seems I switched around the names of the two dataframes. Commented Feb 15, 2018 at 16:13
  • The solution I've suggested is basically there (except you've swapped the df names now), but you need to clarify which df 'some_variable' needs to be x in. Both? Commented Feb 15, 2018 at 16:17

2 Answers

I think this is what you are trying to do:

df1

#    index some_variable identifier1 identifier2
# 0      1             x         AB2         AB3
# 1      2             x         BB2         BB3
# 2      3             x         CB2         CB3
# 3      4             y         DB2         DB3
# 4      5             y         EB2         EB3

df2

#    index some_variable identifier1 identifier2 identifier3
# 0      1             x         AB5         AB3         AB3
# 1      2             x         BB5         BB2         AB2
# 2      3             x         CB5         CB2         AB5
# 3      4             y         DB5         DB3         AB3
# 4      5             y         EB5         EB3         AB3

idx = df1['identifier1'].isin(df2['identifier2']) & (df1['some_variable'] == 'x')
df1.loc[idx, 'identifier2'] = df2['identifier3']

df1

#    index some_variable identifier1 identifier2
# 0      1             x         AB2         AB3
# 1      2             x         BB2         AB2
# 2      3             x         CB2         AB5
# 3      4             y         DB2         DB3
# 4      5             y         EB2         EB3

3 Comments

Returns a list of NaN values. If I replace = df2['identifier3'] by a fixed value it works. (so: = 2 works). It seems to not match directly to the correct row of df2.
@Peter, not sure what you mean. My output is identical to the output you require.
You're right, it works on the subsample as provided. Must be something wrong when I try to scale it up. I'll figure it out.
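
On the NaN result mentioned in the comments above: one possible cause (an assumption, since the full data is not shown) is that the two frames do not share the same index labels or row order. An assignment such as df1.loc[idx, 'identifier2'] = df2['identifier3'] aligns on index labels, so rows of df1 with no matching label in df2 can end up as NaN. A minimal sketch of a value-based lookup that does not depend on the index follows. The shuffled df2, its index labels 10 to 14, and the name lookup are hypothetical and only serve the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB2', 'BB2', 'CB2', 'DB2', 'EB2'],
    'identifier2': ['AB3', 'BB3', 'CB3', 'DB3', 'EB3'],
})

# Same rows as df2 in the answer, but shuffled and relabeled (hypothetical),
# so neither row positions nor index labels line up with df1.
df2 = pd.DataFrame({
    'some_variable': ['y', 'x', 'y', 'x', 'x'],
    'identifier1': ['EB5', 'CB5', 'DB5', 'AB5', 'BB5'],
    'identifier2': ['EB3', 'CB2', 'DB3', 'AB3', 'BB2'],
    'identifier3': ['AB3', 'AB5', 'AB3', 'AB3', 'AB2'],
}, index=[10, 11, 12, 13, 14])

# Lookup table from df2.identifier2 to df2.identifier3.
# drop_duplicates keeps the first match if a key occurs more than once.
lookup = df2.drop_duplicates('identifier2').set_index('identifier2')['identifier3']

# Same condition as in the answer, expressed against the lookup keys.
idx = df1['identifier1'].isin(lookup.index) & (df1['some_variable'] == 'x')

# Translate each matching identifier1 value through the lookup table.
df1.loc[idx, 'identifier2'] = df1.loc[idx, 'identifier1'].map(lookup)

print(df1)
```

Here the replacement value is found by matching df1['identifier1'] against df2['identifier2'] by value, so the result does not change if df2 is sorted or indexed differently. Whether that is the intended logic for the full data is for the asker to confirm.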

Something like the following (although there might be a simpler way to do it):

import pandas as pd

d1 = {'some_variable':['x','x','x','y','y'], 'identifier1':['AB2','BB2','CB2','DB2','EB2'], 'identifier2':['AB3','BB3','CB3','DB3','EB3']}
df = pd.DataFrame(d1)

d2 = {'some_variable':['x','x','x','y','y'], 'identifier1':['AB5','BB5','CB5','DB5','EB5'], 'identifier2':['AB3','BB2','CB2','DB3','EB3'], 'identifier3':['AB3','AB2','AB5','AB3','AB3']}
dfa = pd.DataFrame(d2)

# Build the boolean mask once, then assign through .loc to avoid chained indexing.
# The right-hand side is aligned to df by index label.
mask = df['identifier1'].isin(dfa['identifier2']) & (df['some_variable'] == 'x')
df.loc[mask, 'identifier2'] = dfa.loc[mask, 'identifier3']

3 Comments

Returns: ValueError: Can only compare identically-labeled Series objects
Not sure what to suggest; it works for me. I've added the dataframe I'm using, maybe you can spot the issue?
Fair enough, must be something else going on when I scale this up to full data. I'll look into it.
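
On the ValueError mentioned in the comments above: pandas raises "Can only compare identically-labeled Series objects" when two Series with different index labels are compared with an operator such as ==. Whether that is what happens on the full data is an assumption, since only the sample is shown. A minimal sketch that reproduces the error, using two hypothetical Series a and b:

```python
import pandas as pd

a = pd.Series(['x', 'y', 'x'], index=[0, 1, 2])
b = pd.Series(['x', 'y', 'x'], index=[5, 6, 7])  # same values, different labels

try:
    a == b  # element-wise comparison requires identical index labels
    raised = False
except ValueError:
    raised = True

# Comparing the underlying arrays works by position and ignores the labels.
same_by_position = a.to_numpy() == b.to_numpy()

print(raised, same_by_position)
```

If this is the cause, checking df.index.equals(dfa.index) on the full data is a quick way to see whether the two frames are labeled identically.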
