
I wish to merge two dataframes, Df1 and Df2:

import pandas as pd

Df1 = pd.DataFrame({
    'name' : ['jack', None, None],
    'Surname' : ['Peterson', 'Macleans', None],
    'city' : ['Sydney', 'Delhi', 'New york']})

Df2 = pd.DataFrame({
    'name' : ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname' : ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city' : ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating' : ['AAA', 'AA', 'A', 'BBB']})

I want pandas to first match rows on the first column; if that match fails, it should match on the second column, and if that also fails, on the third column.

I used

new_df = pd.DataFrame([])
new_df = pd.merge(Df1, Df2,  how='left', left_on=['name','Surname','city'], right_on = ['name','Surname','city'])

yet this does not produce my desired dataframe:

Final_Df = pd.DataFrame({
        'name' : ['jack', None, None],
        'Surname' : ['Peterson', 'Macleans', None],
        'city' : ['Sydney', 'Delhi', 'New york'],
        'Rating' : ['AAA', 'AA', 'A']})
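
For context, a merge on several keys only matches a row when all of the keys agree at once, so rows with a missing name or Surname find no partner. A minimal sketch reproducing this with the frames from the question (the variable name `merged` is only illustrative):

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

# A multi-key merge needs name AND Surname AND city to match together.
merged = pd.merge(Df1, Df2, how='left', on=['name', 'Surname', 'city'])

# Only the first row matches on all three keys; the other ratings stay missing.
print(merged['Rating'].isna().tolist())
```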

Edit 1: Thank you "Quang Hoang" for providing the answer!

Let's try a for loop:

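import numpy as np

Df1['Rating'] = np.nan

# Try each key column in turn; fillna only touches rows that are still missing.
for col in Df1.columns[:-1]:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))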

Output:

   name   Surname      city Rating
0  jack  Peterson    Sydney    AAA
1  None  Macleans     Delhi     AA
2  None      None  New york      A

Edit 2: If Df1 has an extra column that is not in Df2, list the key columns explicitly instead of using Df1.columns[:-1]:

import numpy as np

Df1['Rating']=np.nan

for col in ['name', 'Surname','city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))
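
As a self-contained sketch of the same idea, here is the loop run against a Df1 that carries the extra Occupation column from the comment below. The names `key_cols`, `rating` and `lookup` are hypothetical, and the rating column is started as an all-missing object Series (rather than np.nan) on the assumption that this avoids mixing float and string dtypes:

```python
import pandas as pd

Df1 = pd.DataFrame({
    'name': ['jack', None, None],
    'Surname': ['Peterson', 'Macleans', None],
    'city': ['Sydney', 'Delhi', 'New york'],
    'Occupation': ['Teacher', 'Student', 'Professor']})

Df2 = pd.DataFrame({
    'name': ['jack', 'Riti', 'Aadi', 'Jeff'],
    'Surname': ['Peterson', 'Macleans', 'McDonald', 'Cooper'],
    'city': ['Sydney', 'Delhi', 'New york', 'Tokyo'],
    'Rating': ['AAA', 'AA', 'A', 'BBB']})

# Hypothetical name: the columns to try, in priority order.
key_cols = ['name', 'Surname', 'city']

# All-missing object column, so string ratings can be filled in.
rating = pd.Series(index=Df1.index, dtype=object)

for col in key_cols:
    # Lookup table: value in this key column -> Rating.
    lookup = Df2.set_index(col)['Rating']
    # Only rows that are still missing get filled by this column's match.
    rating = rating.fillna(Df1[col].map(lookup))

Df1['Rating'] = rating
print(Df1)
```

Occupation is never used as a key here, so it passes through unchanged.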

Edit 3: In case of duplicate values in a key column (the code below deduplicates the lookup side, Df2, since map needs a unique index), the following code worked:

import numpy as np

Df1['Rating']=np.nan

for col in ['name', 'Surname','city']:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.drop_duplicates(col).set_index(col)['Rating']))
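
To illustrate what drop_duplicates does here, a small sketch with made-up data in which two Df2 rows share the city 'Sydney'. As far as I know, map raises an error when the lookup index is not unique; drop_duplicates(col) keeps the first row for each key, so the first matching rating wins. The frames and names below are hypothetical, not the ones from the question:

```python
import pandas as pd

# Hypothetical data: 'Sydney' appears twice in Df2['city'].
Df1 = pd.DataFrame({'name': [None, 'Riti'],
                    'city': ['Sydney', 'Delhi']})
Df2 = pd.DataFrame({'name': ['jack', 'Jill', 'Riti'],
                    'city': ['Sydney', 'Sydney', 'Delhi'],
                    'Rating': ['AAA', 'B', 'AA']})

rating = pd.Series(index=Df1.index, dtype=object)

for col in ['name', 'city']:
    # Keep the first row per key so the lookup index is unique.
    lookup = Df2.drop_duplicates(col).set_index(col)['Rating']
    rating = rating.fillna(Df1[col].map(lookup))

Df1['Rating'] = rating
print(Df1)
```

Whether "first row wins" is the right rule depends on the data; sorting Df2 before dropping duplicates is one way to control which rating is kept.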

1 Answer


Let's try a for loop:

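import numpy as np

Df1['Rating'] = np.nan

# Try each key column in turn; fillna only touches rows that are still missing.
for col in Df1.columns[:-1]:
    Df1['Rating'] = Df1['Rating'].fillna(Df1[col].map(Df2.set_index(col)['Rating']))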

Output:

   name   Surname      city Rating
0  jack  Peterson    Sydney    AAA
1  None  Macleans     Delhi     AA
2  None      None  New york      A

2 Comments

Thank you very much! You saved my day! Just a quick question, if say Df1 has a column that is not available in Df2, e.g. Occupation, then how can you make this code work? Df1 = pd.DataFrame({ 'name' : ['jack', None, None], 'Surname' : ['Peterson', 'Macleans', None], 'city' : ['Sydney', 'Delhi', 'New york'], 'Occupation':['Teacher','Student', 'Professor']})
for col in list_col_to_map:...?
