I'm trying to update one dataframe with data from another, for one specific column called 'Data'. Both dataframes have a unique ID column called 'ID', and both have a 'Data' column. I want 'Data' in df2 to overwrite the entries in df1 'Data', but only for the rows that exist in df1. Where there is no corresponding 'ID' in df2, the df1 entry should remain.

import io

import pandas as pd

data1 = '''\
ID Data Data1
1  AA   BB
2  AB   BF
3  AC   BK
4  AD   BL'''

data2 = '''\
ID Data
1  AAB
3  AAL
4  MNL
5  AAP
6  MNX
8  DLP
9  POW'''

# pd.compat.StringIO is gone from newer pandas releases; io.StringIO is the standard-library equivalent
df1 = pd.read_csv(io.StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=r'\s+')

Expected output (a new dataframe, df3):

ID Data Data1
1  AAB  BB
2  AB   BF
3  AAL  BK
4  MNL  BL

df2 is a master list of values that never changes and has thousands of entries, whereas df1 only ever has a few hundred entries.

I have looked at pd.merge and combine_first, but can't seem to get the right combination.

df3 = pd.merge(df1, df2, on='ID', how='left')

Any help much appreciated.

1 Answer

Create new dataframe

Here is one way making use of update:

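df3 = df1.set_index('ID')  # set_index returns a new dataframe, so df1 is unchanged
# update the frame itself; updating df3['Data'] in place may not propagate under pandas copy-on-write
df3.update(df2.set_index('ID')[['Data']])
df3 = df3.reset_index()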
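
To make the alignment behaviour of update concrete, here is a minimal self-contained sketch. It rebuilds the question's sample data as literal dataframes (an assumption made only so the snippet runs on its own) and uses DataFrame.update on frames indexed by 'ID'. It is one possible way to write this approach, not the only one.

```python
import pandas as pd

# Sample data from the question, written out as literals
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Data": ["AA", "AB", "AC", "AD"],
    "Data1": ["BB", "BF", "BK", "BL"],
})
df2 = pd.DataFrame({
    "ID": [1, 3, 4, 5, 6, 8, 9],
    "Data": ["AAB", "AAL", "MNL", "AAP", "MNX", "DLP", "POW"],
})

# set_index returns a new dataframe, so df1 itself is not modified
df3 = df1.set_index("ID")

# update aligns on the index: IDs missing from df2 (here 2) keep their value,
# and IDs that exist only in df2 (5, 6, 8, 9) are ignored
df3.update(df2.set_index("ID")[["Data"]])
df3 = df3.reset_index()

print(df3)
```

Because only the 'Data' column of df2 is passed in, any other columns df2 might gain later would not leak into df3.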

Or we could build a dict from both columns and reassign (dict unpacking with ** needs Python >= 3.5):

m = {**df1.set_index('ID')['Data'], **df2.set_index('ID')['Data']}
df3 = df1[:].assign(Data=df1['ID'].map(m))

Python < 3.5:

m = df1.set_index('ID')['Data']
m.update(df2.set_index('ID')['Data'])

df3 = df1[:].assign(Data=df1['ID'].map(m))

Update df1

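If you are open to updating df1 in place, and 'ID' is already the index of both dataframes:

df1.update(df2)

Note that update aligns on the index, so with the default integer index this would match rows by position rather than by 'ID'. If 'ID' is a regular column rather than the index: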

m = df2.set_index('ID')['Data']
df1.loc[df1['ID'].isin(df2['ID']), 'Data'] = df1['ID'].map(m)
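
The isin mask is there because map returns NaN for any 'ID' that is absent from df2. A hedged sketch of an equivalent variant fills those gaps from the existing column instead of masking. The literal dataframes and the intermediate name mapped are assumptions added so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, written out as literals
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Data": ["AA", "AB", "AC", "AD"],
    "Data1": ["BB", "BF", "BK", "BL"],
})
df2 = pd.DataFrame({
    "ID": [1, 3, 4, 5, 6, 8, 9],
    "Data": ["AAB", "AAL", "MNL", "AAP", "MNX", "DLP", "POW"],
})

# Lookup Series: index is ID, values are the replacement Data
m = df2.set_index("ID")["Data"]

# map leaves a missing value wherever the ID is not found in df2 (ID 2 here)
mapped = df1["ID"].map(m)

# Fall back to the current df1 value for those unmatched rows
df1["Data"] = mapped.fillna(df1["Data"])

print(df1)
```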

Or:

df1.set_index('ID',inplace=True)
df1.update(df2.set_index('ID'))
df1.reset_index(inplace=True)

Note: There might be something that makes more sense :)
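
For instance, the left merge attempted in the question can be finished off. The sketch below assumes 'ID' is unique in df2 (so the merge keeps df1's row count) and uses a hypothetical '_new' suffix for the incoming column. Treat it as one possible completion rather than the recommended route.

```python
import pandas as pd

# Sample data from the question, written out as literals
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Data": ["AA", "AB", "AC", "AD"],
    "Data1": ["BB", "BF", "BK", "BL"],
})
df2 = pd.DataFrame({
    "ID": [1, 3, 4, 5, 6, 8, 9],
    "Data": ["AAB", "AAL", "MNL", "AAP", "MNX", "DLP", "POW"],
})

# A left merge keeps every df1 row; df2's Data arrives as 'Data_new'
# ('_new' is an arbitrary suffix chosen for this sketch)
merged = df1.merge(df2, on="ID", how="left", suffixes=("", "_new"))

# Prefer the df2 value, fall back to the df1 value where there was no match
merged["Data"] = merged["Data_new"].fillna(merged["Data"])

df3 = merged.drop(columns="Data_new")

print(df3)
```

combine_first is a less natural fit here because its result covers the union of both indexes, so the IDs that exist only in df2 would have to be filtered out afterwards.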


Full example:

import io

import pandas as pd

data1 = '''\
ID Data Data1
1  AA   BB
2  AB   BF
3  AC   BK
4  AD   BL'''

data2 = '''\
ID Data
1  AAB
3  AAL
4  MNL
5  AAP
6  MNX
8  DLP
9  POW'''

# pd.compat.StringIO is gone from newer pandas releases; io.StringIO is the standard-library equivalent
df1 = pd.read_csv(io.StringIO(data1), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=r'\s+')

m = {**df1.set_index('ID')['Data'], **df2.set_index('ID')['Data']}
df3 = df1[:].assign(Data=df1['ID'].map(m))

print(df3)

Returns:

   ID Data Data1
0   1  AAB    BB
1   2   AB    BF
2   3  AAL    BK
3   4  MNL    BL