I have 2 Pandas dfs, A and B. Both have 10 columns and the index 'ID'. Where the IDs of A and B match, I want to replace the rows of B with the rows of A. I have tried to use pd.update, but no success yet. Any help appreciated.
3 Answers
below code should do the trick
s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])
df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)], columns = ["A", "B", "C"])
s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])
df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)], columns = ["A", "B", "C"])
df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print df1
output
A B C
0 5 6 p
1 6 7 q
2 7 8 r
3 8 9 s
4 9 10 t
Edit from comments:
To replace the whole row instead of only some columns:
cols = list(df1.columns)
df1.loc[df1.A.isin(df2.A), cols] = df2[cols]
3 Comments
Rafael Almeida
Old question but what if I want to replace the whole row instead of some columns?
Shijo
@ Rafael Almeida I assume that you have same set columns in both dataframes; Try this; cols = list(df1.columns) df1.loc[df1.A.isin(df2.A), cols] = df2[cols]
mahendri
the last line
df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']] produce NaN for me. I need to ad .values in the assigned values. It becomes df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']].valuesYou can empty your target cells in A (by setting them to NaN) and use the combine_first() method to fill those with B's values. Although it may sound counter-intuitive, this approach gives you the flexibility to both target rows and specific columns in 2 lines of code. Hope that helps.
An example replacing the full row's that have an index match:
# set-up
cols = ['c1','c2','c3']
A = pd.DataFrame(np.arange(9).reshape((3,3)), columns=cols)
B = pd.DataFrame(np.arange(10,16).reshape((2,3)), columns=cols)
#solution
A.loc[B.index] = np.nan
A = A.combine_first(B)
An example of only replacing certain target columns for row's that have an index match:
A.loc[B.index, ['c2','c3']] = np.nan
A = A.combine_first(B)
1 Comment
fpersyn
Regarding the
SettingWithCopyWarning resulting from using .loc(). It is my understanding that you can safely ignore it when you intend to overwrite the original DataFrame A. Not entirely sure if that is the case for this question. See this thread for more info.
A.combine_first(B)should work.See how to make a reproducible example.