I wish to merge two dataframes, Df1 and Df2:
import pandas as pd
Df1 = pd.DataFrame({
'name' : ['jack', None, None],
'Surname' : ['Peterson', 'Macleans', None],
'city' : ['Sydney', 'Delhi', 'New york']})
Df2 = pd.DataFrame({
'name' : ['jack', 'Riti', 'Aadi','Jeff'],
'Surname' : ['Peterson', 'Macleans', 'McDonald','Cooper'],
'city' : ['Sydney', 'Delhi', 'New york','Tokyo'],
'Rating' : ['AAA', 'AA', 'A','BBB']})
I want Pandas to first match on the first column; if that match fails, match on the second column; and if that also fails, match on the third column.
I tried:
new_df = pd.merge(Df1, Df2, how='left', on=['name', 'Surname', 'city'])
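but this does not produce my desired dataframe. A left merge on all three columns only matches rows where name, Surname and city all agree, so only the jack row gets a rating. The result I want is: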
Final_Df = pd.DataFrame({
'name' : ['jack', None, None],
'Surname' : ['Peterson', 'Macleans', None],
'city' : ['Sydney', 'Delhi', 'New york'],
'Rating' : ['AAA', 'AA', 'A']})
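Since the question asks for a merge, here is a swapped-in merge-based sketch of the same fallback (not the map-based answer from Edit 1). It first reproduces the all-three-columns merge, which fills only the jack row, then does one left merge per key column and keeps a rating only where no earlier column matched. The names keys, rating and matched are illustrative, and it assumes duplicate keys in Df2 may be dropped (first occurrence kept).

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
})
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB'],
})

keys = ['name', 'Surname', 'city']  # priority order: the first matching key wins

# The original attempt: a row only matches when all three keys agree.
naive = pd.merge(Df1, Df2, how='left', on=keys)
print(naive['Rating'].tolist())

# Fallback: one left merge per key, filling only rows that are still empty.
rating = pd.Series(index=Df1.index, dtype=object)  # all NaN to start
for key in keys:
    # Assumption: keep only the first Df2 row per key value.
    right = Df2[[key, 'Rating']].drop_duplicates(key)
    matched = Df1[[key]].merge(right, on=key, how='left')
    # With unique right keys, a left merge keeps Df1's row count and order,
    # so the merged ratings can be re-attached to Df1's index positionally.
    matched.index = Df1.index
    rating = rating.fillna(matched['Rating'])

Final_Df = Df1.assign(Rating=rating)
print(Final_Df)
```

Dropping duplicates in the right-hand frame matters here: without it, a duplicated key would make the left merge return extra rows and break the positional re-alignment.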
Edit 1: Thanks to Quang Hoang for providing the answer:
Let's try a for loop:
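import numpy as np
# Start with an empty Rating column, then fill it one key column at a time
Df1['Rating'] = np.nan
for col in Df1.columns[:-1]:  # every column except the new Rating column
    # Look this column's values up in Df2; fillna keeps earlier matches
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))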
Output:
name Surname city Rating
0 jack Peterson Sydney AAA
1 None Macleans Delhi AA
2 None None New york A
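To see why the loop works, the sketch below runs the same three passes and records the Rating column after each one. Series.map looks each value of the key column up in Df2.set_index(col)['Rating'] and returns NaN when there is no match (None never matches here); fillna then fills only rows that are still empty, so a match on an earlier column is never overwritten. One assumption: Rating is initialised as an object column instead of np.nan (float), so filling it with strings needs no dtype upcast; history is a hypothetical name used only for the trace.

```python
import numpy as np
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
})
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB'],
})

# Object dtype, so filling with strings does not require a float -> object upcast
Df1['Rating'] = pd.Series(np.nan, index=Df1.index, dtype=object)
history = {}  # hypothetical: Rating column after each pass
for col in ['name', 'Surname', 'city']:
    # Series indexed by the key column: key value -> Rating
    lookup = Df2.set_index(col)['Rating']
    # map() gives NaN for values not found in lookup's index
    matches = Df1[col].map(lookup)
    # fillna only touches rows still missing a rating, so earlier columns win
    Df1['Rating'] = Df1['Rating'].fillna(matches)
    history[col] = Df1['Rating'].tolist()
    print(col, history[col])
```

The trace shows the name pass filling row 0, the Surname pass filling row 1, and the city pass filling row 2.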
Edit 2: If Df1 has an extra column that is not in Df2, Df1.columns[:-1] would include it, so list the key columns explicitly instead:
import numpy as np
Df1['Rating'] = np.nan
for col in ['name', 'Surname', 'city']:  # key columns in priority order
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
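Why the explicit list matters: once Rating is appended, Df1.columns[:-1] contains every other Df1 column, including the extra one, and Df2.set_index on a column that Df2 lacks raises KeyError. The sketch below assumes a hypothetical extra column id in Df1 to show this, and again initialises Rating as an object column.

```python
import numpy as np
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
    'id': [101, 102, 103],  # hypothetical extra column, not present in Df2
})
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB'],
})

Df1['Rating'] = pd.Series(np.nan, index=Df1.index, dtype=object)

# Df1.columns[:-1] now includes 'id' as well as the three key columns
stale_keys = list(Df1.columns[:-1])
print(stale_keys)  # ['name', 'Surname', 'city', 'id']

# Df2 has no 'id' column, so the lookup step of Edit 1 would fail on it
try:
    Df2.set_index('id')
    id_lookup_failed = False
except KeyError as exc:
    id_lookup_failed = True
    print('KeyError:', exc)

# Listing the keys explicitly, in priority order, avoids the problem
for col in ['name', 'Surname', 'city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
print(Df1)
```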
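Edit 3: If a key column of Df2 contains duplicate values, map() fails because the lookup index is not unique. Dropping the duplicates first (keeping the first occurrence) works: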
import numpy as np
Df1['Rating'] = np.nan
for col in ['name', 'Surname', 'city']:
    # drop_duplicates(col) keeps the first Df2 row for each key value
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.drop_duplicates(col).set_index(col)['Rating']))
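A sketch of the duplicate case, assuming a hypothetical Df2 with a second jack row. Without drop_duplicates, the lookup Series has a non-unique index and Series.map raises (pandas.errors.InvalidIndexError in recent pandas versions); with it, drop_duplicates keeps the first occurrence by default, so the first matching row in Df2 wins. The names map_failed and rating are illustrative only.

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
})
# Hypothetical Df2 with a second 'jack' row, so the 'name' key is duplicated
Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff', 'jack'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper', 'Smith'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo', 'Perth'],
    'Rating': ['AAA', 'AA', 'A', 'BBB', 'CCC'],
})

# Without de-duplication the lookup index for 'name' is not unique,
# and map() refuses to use it.
try:
    Df1['name'].map(Df2.set_index('name')['Rating'])
    map_failed = False
except pd.errors.InvalidIndexError:
    map_failed = True
print('map failed without drop_duplicates:', map_failed)

# With drop_duplicates, the first 'jack' row (AAA) is used for the name lookup
rating = pd.Series(index=Df1.index, dtype=object)  # all NaN to start
for col in ['name', 'Surname', 'city']:
    lookup = Df2.drop_duplicates(col).set_index(col)['Rating']
    rating = rating.fillna(Df1[col].map(lookup))
Df1['Rating'] = rating
print(Df1)
```

Whether keeping the first duplicate is right depends on the data; drop_duplicates(col, keep='last') would prefer the last one instead.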