pandas conditionally update from another dataframe

Question

I have two dataframes and I need to conditionally update specific columns in the first dataframe.

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df1)

   Key identifier  A  B  C   D   E   F
0    1        Foo  1  1  1 NaN NaN NaN
1    2        Foo  2  2  2 NaN NaN NaN
2    3        Bar  3  3  3 NaN NaN NaN

df2 = pd.DataFrame([[1,np.nan,10,10,10,5,6,7],[2,np.nan,12,12,12,8,9,10],[3,np.nan,13,13,13,11,12,13]], columns = ['Key','identifier','A','B','C','D','E','F'])

print(df2)

   Key  identifier   A   B   C   D   E   F
0    1         NaN  10  10  10   5   6   7
1    2         NaN  12  12  12   8   9  10
2    3         NaN  13  13  13  11  12  13

Where the identifier column in df1 == 'Foo', I need to update df1 columns D, E, F with the corresponding columns from df2. How can I conditionally update those three columns?

df3 = #code here

desired output:

print(df3)

   Key identifier  A  B  C    D    E     F
0    1        Foo  1  1  1  5.0  6.0   7.0
1    2        Foo  2  2  2  8.0  9.0  10.0
2    3        Bar  3  3  3  NaN  NaN   NaN

Follow-Up

Say instead, df1 was the following:

df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[4,'Bar',4,4,4,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])

Now the lengths of df1 and df2 aren't the same and the positioning of the records to be updated doesn't match. How is this still working? I get the following output:

df2[df1['identifier'] == 'Foo'].combine_first(df1)

   Key identifier     A     B     C     D     E     F
0  1.0        Foo  10.0  10.0  10.0   5.0   6.0   7.0
1  4.0        Bar   4.0   4.0   4.0   NaN   NaN   NaN
2  3.0        Foo  13.0  13.0  13.0  11.0  12.0  13.0
3  3.0        Bar   3.0   3.0   3.0   NaN   NaN   NaN

cs95 · Accepted Answer · 2017-11-02 14:10:28Z


Use combine_first, after setting Key to the index with set_index.

df1 = df1.set_index('Key')
df1

    identifier  A  B  C   D   E   F
Key                                
1          Foo  1  1  1 NaN NaN NaN
2          Foo  2  2  2 NaN NaN NaN
3          Bar  3  3  3 NaN NaN NaN

df2 = df2.set_index('Key')
df2

     identifier   A   B   C   D   E   F
Key                                    
1           NaN  10  10  10   5   6   7
2           NaN  12  12  12   8   9  10
3           NaN  13  13  13  11  12  13

df2[df1.eval('identifier == "Foo"')].combine_first(df1)

    identifier     A     B     C    D    E     F
Key                                             
1          Foo  10.0  10.0  10.0  5.0  6.0   7.0
2          Foo  12.0  12.0  12.0  8.0  9.0  10.0
3          Bar   3.0   3.0   3.0  NaN  NaN   NaN
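
Note that the result above also takes A, B and C from df2, whereas the question asks for only D, E and F to change. The following is a minimal sketch of one way to restrict the update to those three columns, assuming Key uniquely identifies rows in both frames. It swaps in label-aligned .loc assignment in place of combine_first, and the names cols, mask, target and df3 are illustrative rather than taken from the answer.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']

# Same data as in the question, with Key moved into the index
df1 = pd.DataFrame(
    [[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
     [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
     [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]],
    columns=cols,
).set_index('Key')
df2 = pd.DataFrame(
    [[1, np.nan, 10, 10, 10, 5, 6, 7],
     [2, np.nan, 12, 12, 12, 8, 9, 10],
     [3, np.nan, 13, 13, 13, 11, 12, 13]],
    columns=cols,
).set_index('Key')

mask = df1['identifier'] == 'Foo'   # rows of df1 to update
target = ['D', 'E', 'F']            # columns to copy from df2 (assumption: only these)

# Work on a copy so df1 stays untouched; the assignment aligns on Key and column name
df3 = df1.copy()
df3.loc[mask, target] = df2.loc[mask, target]
df3 = df3.reset_index()             # bring Key back as a regular column
print(df3)
```

With this approach A, B and C keep their df1 values, and rows where identifier is not 'Foo' keep NaN in D, E and F.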

edited Nov 2, 2017 at 14:10

answered Nov 2, 2017 at 13:52


9 Comments

flyingmeatball Over a year ago

Thanks, this is equivalent to the following, correct? df2[df1['identifier'] == 'Foo'].combine_first(df1)

cs95 Over a year ago

@flyingmeatball That it is. I just wanted to be cute.

flyingmeatball Over a year ago

Thanks - I added a follow up, can you explain why this still works if df1 has 4 items, df2 has 3 items, and they aren't in the right order?

cs95 Over a year ago

@flyingmeatball That's because it combines based on the index - the column index in this case.

BENY Over a year ago

You are cute:-) haha
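
To illustrate the alignment point from the comments: boolean indexing and combine_first match rows by index label, not by position and not by the Key column. In the follow-up, both frames still carry the default 0, 1, 2, ... index, so df2's row 2 (Key 3) is matched to df1's row 2 (Key 2, Foo), which is why that row comes out with Key 3.0 and the values 13 and 11 to 13. The following is a hedged sketch of one way to get key-based matching instead, assuming Key is unique in both frames. It uses .loc assignment rather than combine_first so that only D, E and F change, and foo_keys, target and df3 are illustrative names.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']

# Follow-up data: df1 has an extra row (Key 4) and a different row order
df1 = pd.DataFrame(
    [[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
     [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
     [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
     [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]],
    columns=cols,
).set_index('Key')
df2 = pd.DataFrame(
    [[1, np.nan, 10, 10, 10, 5, 6, 7],
     [2, np.nan, 12, 12, 12, 8, 9, 10],
     [3, np.nan, 13, 13, 13, 11, 12, 13]],
    columns=cols,
).set_index('Key')

target = ['D', 'E', 'F']
mask = df1['identifier'] == 'Foo'

# Keys that are 'Foo' in df1 and also present in df2
foo_keys = df1.index[mask.to_numpy()].intersection(df2.index)

# Rows are matched on Key because Key is now the index of both frames
df3 = df1.copy()
df3.loc[foo_keys, target] = df2.loc[foo_keys, target]
df3 = df3.reset_index()
print(df3)
```

Here the row order of df1 is preserved, and Key 4, which has no counterpart in df2, is simply left unchanged.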
