I have two dataframes and I need to conditionally update specific columns in the first dataframe.
import numpy as np
import pandas as pd
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan], [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan], [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F'])
print(df1)
   Key identifier  A  B  C    D    E    F
0    1        Foo  1  1  1  NaN  NaN  NaN
1    2        Foo  2  2  2  NaN  NaN  NaN
2    3        Bar  3  3  3  NaN  NaN  NaN
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7], [2, np.nan, 12, 12, 12, 8, 9, 10], [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F'])
print(df2)
   Key  identifier   A   B   C   D   E   F
0    1         NaN  10  10  10   5   6   7
1    2         NaN  12  12  12   8   9  10
2    3         NaN  13  13  13  11  12  13
Where the identifier column in df1 is 'Foo', I need to update df1's columns D, E and F with the values from the corresponding columns of df2. How can I conditionally update those three columns?
df3 = #code here
Desired output:
print(df3)
   Key identifier  A  B  C    D    E     F
0    1        Foo  1  1  1  5.0  6.0   7.0
1    2        Foo  2  2  2  8.0  9.0  10.0
2    3        Bar  3  3  3  NaN  NaN   NaN
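To make the goal concrete, here is a minimal sketch of the kind of update I am after. It assumes df1 and df2 share the same index, so rows are paired by index label; the names `mask` and `update_cols` are only illustrative.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

update_cols = ['D', 'E', 'F']           # columns to overwrite (illustrative name)
mask = df1['identifier'] == 'Foo'       # rows of df1 that qualify for the update

# Work on a copy so df1 itself stays untouched.
df3 = df1.copy()

# Assumption: df1 and df2 have identical indexes, so the same mask can
# select rows in both and the assignment aligns on index and column labels.
df3.loc[mask, update_cols] = df2.loc[mask, update_cols]

print(df3)
```

Columns A, B and C are left alone because only the columns listed in `update_cols` are assigned.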
Follow-Up
Say instead, df1 was the following:
df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[4,'Bar',4,4,4,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])
Now the lengths of df1 and df2 aren't the same and the positioning of the records to be updated doesn't match. How is this still working? I get the following output:
df2[df1['identifier'] == 'Foo'].combine_first(df1)
   Key identifier     A     B     C     D     E     F
0  1.0        Foo  10.0  10.0  10.0   5.0   6.0   7.0
1  4.0        Bar   4.0   4.0   4.0   NaN   NaN   NaN
2  3.0        Foo  13.0  13.0  13.0  11.0  12.0  13.0
3  3.0        Bar   3.0   3.0   3.0   NaN   NaN   NaN
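The output above looks consistent with rows being paired by index label rather than by Key: the row with index 2 in df1 (Key 2) picked up the values from index 2 in df2 (Key 3). What I actually want is a lookup by Key. A rough sketch of that, assuming Key is unique in df2 (`lookup` and `matched` are illustrative names):

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

update_cols = ['D', 'E', 'F']
mask = df1['identifier'] == 'Foo'

# Index df2 by Key so values are found by Key, not by row position.
# Assumption: Key values are unique in df2 (reindex needs a unique index).
lookup = df2.set_index('Key')[update_cols]

df3 = df1.copy()

# Fetch the df2 rows for the Keys of the 'Foo' rows, in df1's row order.
# A Key missing from df2 would yield NaN for that row.
matched = lookup.reindex(df3.loc[mask, 'Key'])

# Assign positionally (as a NumPy array) because `matched` is indexed by Key,
# while df3 is still indexed 0..3.
df3.loc[mask, update_cols] = matched.to_numpy()

print(df3)
```

With this, the Key 2 row should receive 8, 9, 10 from df2's Key 2 row, and both 'Bar' rows keep NaN in D, E and F.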