I have two DataFrames and need to conditionally update specific columns in the first one.

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df1)

   Key identifier  A  B  C   D   E   F
0    1        Foo  1  1  1 NaN NaN NaN
1    2        Foo  2  2  2 NaN NaN NaN
2    3        Bar  3  3  3 NaN NaN NaN

df2 = pd.DataFrame([[1,np.nan,10,10,10,5,6,7],[2,np.nan,12,12,12,8,9,10],[3,np.nan,13,13,13,11,12,13]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df2)

   Key  identifier   A   B   C   D   E   F
0    1         NaN  10  10  10   5   6   7
1    2         NaN  12  12  12   8   9  10
2    3         NaN  13  13  13  11  12  13

Where the identifier column in df1 == 'Foo', I need to update df1 columns D, E, F with the corresponding values from df2. How can I conditionally update those three columns?

df3 = ...  # code here

Desired output:

print(df3)

   Key identifier  A  B  C    D    E     F
0    1        Foo  1  1  1  5.0  6.0   7.0
1    2        Foo  2  2  2  8.0  9.0  10.0
2    3        Bar  3  3  3  NaN  NaN   NaN

Follow-Up

Say instead that df1 were the following:

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[4,'Bar',4,4,4,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

Now df1 and df2 have different lengths, and the positions of the records to be updated don't match. Why does this still work? I get the following output:

df2[df1['identifier'] == 'Foo'].combine_first(df1)

   Key identifier     A     B     C     D     E     F
0  1.0        Foo  10.0  10.0  10.0   5.0   6.0   7.0
1  4.0        Bar   4.0   4.0   4.0   NaN   NaN   NaN
2  3.0        Foo  13.0  13.0  13.0  11.0  12.0  13.0
3  3.0        Bar   3.0   3.0   3.0   NaN   NaN   NaN
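A likely explanation, shown as a small sketch (assuming a recent pandas version): in the follow-up, neither frame has Key as its index, so both keep the default row labels 0, 1, 2, and so on. The 4-row mask is reindexed to df2's labels 0-2 (pandas emits a UserWarning about this), which selects df2 rows 0 and 2, i.e. Keys 1 and 3. combine_first then lines those rows up with df1 by row label, so df1's row 2 (Key 2) receives Key 3's values. That is why Key 3.0 appears twice in the output above: nothing raises, but the result is misaligned.

```python
import warnings

import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

# Mask has df1's default labels 0..3 -> [True, False, True, False]
mask = df1['identifier'] == 'Foo'

with warnings.catch_warnings():
    # Silence "Boolean Series key will be reindexed to match DataFrame index."
    warnings.simplefilter('ignore', UserWarning)
    # The mask is cut down to df2's labels 0..2 -> [True, False, True]
    picked = df2[mask]

print(picked.index.tolist())   # [0, 2]
print(picked['Key'].tolist())  # [1, 3]: Key 3 is picked, not Key 2

# combine_first aligns on row labels, so df1's row 2 (Key 2, 'Foo')
# is filled with df2's Key 3 values
positional = picked.combine_first(df1)
print(positional.loc[2, ['Key', 'identifier', 'D']].tolist())
```

If df2 ever had a row label that the mask lacks, pandas would raise an "Unalignable boolean Series" IndexingError instead of silently pairing rows, which is another reason to match on Key rather than on the default labels.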

Answer

Set Key as the index of both DataFrames with set_index, then use combine_first.

df1

    identifier  A  B  C   D   E   F
Key                                
1          Foo  1  1  1 NaN NaN NaN
2          Foo  2  2  2 NaN NaN NaN
3          Bar  3  3  3 NaN NaN NaN

df2

     identifier   A   B   C   D   E   F
Key                                    
1           NaN  10  10  10   5   6   7
2           NaN  12  12  12   8   9  10
3           NaN  13  13  13  11  12  13

df2[df1.eval('identifier == "Foo"')].combine_first(df1)

    identifier     A     B     C    D    E     F
Key                                             
1          Foo  10.0  10.0  10.0  5.0  6.0   7.0
2          Foo  12.0  12.0  12.0  8.0  9.0  10.0
3          Bar   3.0   3.0   3.0  NaN  NaN   NaN
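One thing to be aware of: as written, combine_first also takes A, B and C from df2 for the Foo rows (A becomes 10.0 and 12.0 above), while the desired output in the question keeps A, B, C from df1 and only fills D, E, F. Below is a minimal sketch that restricts the update to those three columns and matches rows by Key, using .loc assignment instead of combine_first, applied to the follow-up's 4-row df1. It assumes Key is unique in both frames; the names update_cols, left, right, foo_keys and new_values are illustrative only.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

update_cols = ['D', 'E', 'F']

# Match rows by Key instead of by position / default row label
left = df1.set_index('Key')
right = df2.set_index('Key')

# Keys of the rows to update (here Keys 1 and 2)
foo_keys = left.index[left['identifier'] == 'Foo']

# D, E, F for exactly those keys, in the same order (NaN if a key is missing from df2)
new_values = right.reindex(index=foo_keys, columns=update_cols)

# Overwrite only D, E, F on the Foo rows; A, B, C keep df1's values
left.loc[foo_keys, update_cols] = new_values.to_numpy()

df3 = left.reset_index()
print(df3)
```

Because new_values is reindexed to exactly foo_keys, the row order of df1 and df2 no longer matters, and keys present in df2 but not flagged as Foo in df1 are simply ignored.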

Comments

Thanks, this is equivalent to the following, correct? df2[df1['identifier'] == 'Foo'].combine_first(df1)
@flyingmeatball That it is. I just wanted to be cute.
Thanks. I added a follow-up: can you explain why this still works if df1 has 4 rows, df2 has 3 rows, and they aren't in the same order?
@flyingmeatball That's because it combines based on the index, which in this case is the Key column once it has been set as the index.
You are cute:-) haha