11

I have 2 Pandas dfs, A and B. Both have 10 columns and the index 'ID'. Where the IDs of A and B match, I want to replace the rows of B with the rows of A. I have tried to use pd.update, but no success yet. Any help appreciated.

3
  • 3
    try A.combine_first(B) should work.See how to make a reproducible example. Commented Sep 1, 2016 at 9:13
  • If you update rows of B with rows of A the same result will come? not clear can you provide an example of expected output Commented Sep 1, 2016 at 9:21
  • Marlon: No, A and B contain some matching IDs, but the contents of the other columns is different. Commented Sep 1, 2016 at 9:44

3 Answers 3

14

below code should do the trick

s1 = pd.Series([5, 1, 'a'])
s2 = pd.Series([6, 2, 'b'])
s3 = pd.Series([7, 3, 'd'])
s4 = pd.Series([8, 4, 'e'])
s5 = pd.Series([9, 5, 'f'])



df1 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

s1 = pd.Series([5, 6, 'p'])
s2 = pd.Series([6, 7, 'q'])
s3 = pd.Series([7, 8, 'r'])
s4 = pd.Series([8, 9, 's'])
s5 = pd.Series([9, 10, 't'])

df2 = pd.DataFrame([list(s1), list(s2),list(s3),list(s4),list(s5)],  columns =  ["A", "B", "C"])

df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']]
print df1

output

   A   B  C
0  5   6  p
1  6   7  q
2  7   8  r
3  8   9  s
4  9  10  t

Edit from comments:

To replace the whole row instead of only some columns:

cols = list(df1.columns) 
df1.loc[df1.A.isin(df2.A), cols] = df2[cols]
Sign up to request clarification or add additional context in comments.

3 Comments

Old question but what if I want to replace the whole row instead of some columns?
@ Rafael Almeida I assume that you have same set columns in both dataframes; Try this; cols = list(df1.columns) df1.loc[df1.A.isin(df2.A), cols] = df2[cols]
the last line df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']] produce NaN for me. I need to ad .values in the assigned values. It becomes df1.loc[df1.A.isin(df2.A), ['B', 'C']] = df2[['B', 'C']].values
2

You can empty your target cells in A (by setting them to NaN) and use the combine_first() method to fill those with B's values. Although it may sound counter-intuitive, this approach gives you the flexibility to both target rows and specific columns in 2 lines of code. Hope that helps.

An example replacing the full row's that have an index match:

# set-up
cols = ['c1','c2','c3']
A = pd.DataFrame(np.arange(9).reshape((3,3)), columns=cols)
B = pd.DataFrame(np.arange(10,16).reshape((2,3)), columns=cols)

#solution
A.loc[B.index] = np.nan
A = A.combine_first(B)

An example of only replacing certain target columns for row's that have an index match:

A.loc[B.index, ['c2','c3']] = np.nan
A = A.combine_first(B)

1 Comment

Regarding the SettingWithCopyWarning resulting from using .loc(). It is my understanding that you can safely ignore it when you intend to overwrite the original DataFrame A. Not entirely sure if that is the case for this question. See this thread for more info.
0

enter image description here Above code works based on the index value. If we have different row counts for both data frame. It will not work. For that, we need to set a specific column to index

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.