I'm trying to update one dataframe with data from another, for one specific column called 'Data'. Both dataframes have a unique ID column called 'ID', and both have a 'Data' column. I want the values of 'Data' in df2 to overwrite the entries in df1 'Data', but only for the rows that are in df1. Where there is no corresponding 'ID' in df2, the df1 entry should remain.
import pandas as pd
data1 = '''\
ID Data Data1
1 AA BB
2 AB BF
3 AC BK
4 AD BL'''
data2 = '''\
ID Data
1 AAB
3 AAL
4 MNL
5 AAP
6 MNX
8 DLP
9 POW'''
# pd.compat.StringIO is gone from recent pandas versions; io.StringIO works everywhere
from io import StringIO
df1 = pd.read_csv(StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(StringIO(data2), sep=r'\s+')
Expected output (a new dataframe, df3):
ID Data Data1
1 AAB BB
2 AB BF
3 AAL BK
4 MNL BL
df2 is a master list of values which never changes and has thousands of entries, whereas df1 only ever has a few hundred entries.
I have looked at pd.merge and combine_first, but can't seem to get the right combination.
df3 = pd.merge(df1, df2, on='ID', how='left')
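One possible approach is sketched below. It is not necessarily the best way, and it assumes 'ID' is unique in df2: turn df2 into an ID-to-Data lookup, map it onto df1's 'ID' column, and fall back to the existing df1 value wherever there is no match. The name `lookup` is only illustrative, and the frames are rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as above, built directly so the snippet is self-contained
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Data": ["AA", "AB", "AC", "AD"],
    "Data1": ["BB", "BF", "BK", "BL"],
})
df2 = pd.DataFrame({
    "ID": [1, 3, 4, 5, 6, 8, 9],
    "Data": ["AAB", "AAL", "MNL", "AAP", "MNX", "DLP", "POW"],
})

# Series indexed by ID; mapping through it requires the IDs in df2 to be unique (assumption)
lookup = df2.set_index("ID")["Data"]

df3 = df1.copy()
# IDs missing from df2 map to NaN, which fillna replaces with the original df1 value
df3["Data"] = df3["ID"].map(lookup).fillna(df3["Data"])

print(df3)
```

Because only df1's rows are mapped, the result keeps df1's row count and column order, and the extra IDs in df2 are ignored.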
Any help much appreciated.