
I have two DataFrames with identical columns. The 'key' column has unique values.

DataFrame 1:

A B key C    
0 1 k1  2    
1 2 k2  3    
2 3 k3  5

DataFrame 2:

A B key C    
4 5 k1  2    
1 2 k2  3
2 3 k4  5

I would like to update the rows in DataFrame 1 with the values from DataFrame 2 wherever a key in DataFrame 2 matches a key in DataFrame 1. If a key is new, the entire row from DataFrame 2 should be added to DataFrame 1.

The final output DataFrame should look like this, with the same columns:

A B key C
4 5 k1  2   --> update
1 2 k2  3   --> no changes
2 3 k3  5   --> no changes
2 3 k4  5   --> new row

I have tried the code below, but I need only the 4 columns 'A', 'B', 'key', 'C', without any suffixes after the merge.

df3 = df1.merge(df2,on='key',how='outer')
>>> df3
   A_x  B_x key  C_x  A_y  B_y  C_y
0  0.0  1.0  k1  2.0  4.0  5.0  2.0
1  1.0  2.0  k2  3.0  1.0  2.0  3.0
2  2.0  3.0  k3  5.0  NaN  NaN  NaN
3  NaN  NaN  k4  NaN  2.0  3.0  5.0
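For reference, the suffixed outer merge above can be collapsed back to four columns. This is a minimal sketch, assuming df2's value should win whenever a key exists in both frames; the '_old' suffix is an arbitrary illustrative choice, and passing None as the right-hand suffix leaves df2's column names unchanged.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3], 'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3], 'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

# Outer merge: df1's overlapping columns get the (illustrative) '_old' suffix,
# df2's keep their plain names because the right suffix is None.
merged = df1.merge(df2, on='key', how='outer', suffixes=('_old', None))

for col in ['A', 'B', 'C']:
    # Prefer df2's value; fall back to df1's where df2 has no row for that key.
    merged[col] = merged[col].fillna(merged[col + '_old'])

# Keep only the original four columns, in df1's order, sorted by key.
result = merged[list(df1.columns)].sort_values('key').reset_index(drop=True)
print(result)
```

Note that fillna also replaces NaN values that df2 itself contains for a matching key, so this treats a missing value in df2 as "no update".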

6 Answers

It seems like you're looking for combine_first.

a = df2.set_index('key')
b = df1.set_index('key')

(a.combine_first(b)
  .reset_index()
  .reindex(columns=df1.columns))

     A    B key    C
0  4.0  5.0  k1  2.0
1  1.0  2.0  k2  3.0
2  2.0  3.0  k3  5.0
3  2.0  3.0  k4  5.0
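A self-contained version of the same combine_first approach, building the two frames from the question's data. This is a sketch; depending on the pandas version the result columns may come back as float or int, so only the values are relied on here.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3], 'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3], 'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

# combine_first keeps the caller's (df2's) values and fills the gaps from df1,
# so keys present in both take df2's row and keys only in df1 are kept as-is.
result = (df2.set_index('key')
             .combine_first(df1.set_index('key'))
             .reset_index()                     # turn 'key' back into a column
             .reindex(columns=df1.columns))     # restore the A, B, key, C order
print(result)
```

One design consequence: if df2 has a NaN in a row whose key also exists in df1, that cell is filled from df1 rather than overwritten with NaN.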

Try this:

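import pandas as pd

df1 = {'key': ['k1', 'k2', 'k3'], 'A': [0, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 5]}
df1 = pd.DataFrame(data=df1)
print(df1)
df2 = {'key': ['k1', 'k2', 'k4'], 'A': [4, 1, 2], 'B': [5, 2, 3], 'C': [2, 3, 5]}
df2 = pd.DataFrame(data=df2)
print(df2)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames instead
df3 = pd.concat([df1, df2], ignore_index=True)
# Keep only the last row per key, i.e. df2's row when the key is in both frames
df3.drop_duplicates(subset=['key'], keep='last', inplace=True)
df3 = df3.sort_values(by=['key'], ascending=True)
print(df3)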

3 Comments

You should comment out your line "df3 = df1.merge(df2,on='key',how='outer')".
It's not updating the values of the first row; it keeps the values from the first DataFrame. The merge condition is: for each key in dataframe2, if the key is present in dataframe1, update the row values; otherwise, add the row.
I've written out all of the code now. What version of Python are you using?

First, set 'key' as the index column of both DataFrames:

df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)

Then combine the DataFrames so that all index keys are present (this does not overwrite the existing df1 values! See the combine_first documentation):

df1 = df1.combine_first(df2)

The last step is to update the values in df1 with those from df2 and reset the index:

df1.update(df2)
df1.reset_index(inplace=True)
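To see why both steps are needed, here is a small sketch that records the k1 value of 'A' before and after update, using the question's data (the variable names combined, before_update and after_update are illustrative only).

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3], 'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]}).set_index('key')
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3], 'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]}).set_index('key')

# Step 1: combine_first adds the missing k4 row, but k1 still has df1's values.
combined = df1.combine_first(df2)
before_update = combined.loc['k1', 'A']

# Step 2: update overwrites, in place, every cell where df2 has a non-NA value.
combined.update(df2)
after_update = combined.loc['k1', 'A']

result = combined.reset_index()
print(before_update, after_update)
print(result)
```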


Try appending and removing duplicates:

df3 = pd.concat([df1, df2]).drop_duplicates(subset='key', keep='last')
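The subset and keep arguments matter here. A minimal sketch on the question's data: without a subset, drop_duplicates compares whole rows, so the two differing k1 rows both survive; restricting the comparison to 'key' and keeping the last occurrence gives the upsert.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3], 'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3], 'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

# df2's rows come after df1's, so "last" means "from df2".
stacked = pd.concat([df1, df2], ignore_index=True)

# Whole-row comparison: the identical k2 rows collapse, the k1 rows do not.
full_dedup = stacked.drop_duplicates()

# Key-only comparison, keeping the later (df2) row for each key.
upserted = (stacked.drop_duplicates(subset='key', keep='last')
                   .sort_values('key')
                   .reset_index(drop=True))
print(full_dedup)
print(upserted)
```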

1 Comment

It's not removing suffixes.
df3 = df3.drop_duplicates(df1.append(df2))
>>> df3
   A_x  B_x key  C_x  A_y  B_y  C_y
0  0.0  1.0  k1  2.0  NaN  NaN  NaN
1  1.0  2.0  k2  3.0  NaN  NaN  NaN
2  2.0  3.0  k3  5.0  NaN  NaN  NaN
3  NaN  NaN  k4  NaN  2.0  3.0  5.0

This assumes both DataFrames already use the same column (here 'key') as their index:

df3 = df1.combine_first(df2)
df3.update(df2)


After setting the same column as the index on each DataFrame:

def df_upsert(df1, df2):
    df = df1.combine_first(df2)
    df.update(df2)
    return df
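Because df_upsert expects both frames to be indexed already, a wrapper that takes the key column name can handle the set_index/reset_index round trip. This is a sketch; upsert_on is a hypothetical helper name, and it assumes the key column is unique in each frame (as the question states).

```python
import pandas as pd

def upsert_on(df1: pd.DataFrame, df2: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: upsert df2's rows into df1, matching on the `key` column."""
    left = df1.set_index(key)
    right = df2.set_index(key)
    # Same two steps as df_upsert: add missing keys, then overwrite from df2.
    merged = left.combine_first(right)
    merged.update(right)
    # Turn the key back into a column and restore df1's original column order.
    return merged.reset_index().reindex(columns=df1.columns)

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3], 'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3], 'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

result = upsert_on(df1, df2, 'key')
print(result)
```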
