I have 2 dataframes df1 containing id_number and df2 containing identity_No.
Looking to map highest matching row values from Dataframe2 to Dataframe1 using conditions. Want to compare each row(df1['id_number']) from df1 with the whole column(df2['identity_No']) of df2. I have tried using partial match as well but not getting the output.
df1
score id_number company_name company_code match_acc action_reqd
20 IN2231D AXN pvt Ltd IN225 Yes
45 UK654IN Aviva Intl Ltd IN115 No
65 SL1432H Ship Incorporations CZ555 Yes
35 LK0678G Oppo Mobiles pvt ltd PQ795 Yes
59 NG5678J Nokia Inc RS885 No
20 IN2231D AXN pvt Ltd IN215 Yes
df2
OR_score identity_No comp_name comp_code
51 UK654IN Aviva Int.L Ltd IN515
25 SL6752J Ship Inc Traders CZ555
79 NG5678K Nokia Inc RS005
20 IN22312 AXN pvt Ltd IN255
38 LK0665G Oppo Mobiles ltd PQ895
For Eg: The df1.id_number need to be compared with df2.identity_No, Looking to match based on row1 of df1['id_number'] will match across all rows of df2['identity_No'], and has highest match percentage wrt. row4 of df2['identity_No'] , and its more than 80%, it will copy the respective values from row4 of df2 to row1 of df1. same to be applied for each row of df1.
Expected Output:
score id_number company_name company_code match_acc action_reqd
20 IN22312 AXN pvt Ltd IN225 90 Yes
51 UK654IN Aviva Int.L Ltd IN115 100 No
25 SL1432H Ship Incorporations CZ555 30 Yes
38 LK0665G Oppo Mobiles ltd PQ795 80 Yes
79 NG5678K Nokia Inc RS885 85 No
20 IN22312 AXN pvt Ltd IN225 90 Yes
I have tried this now:
for index, row in df1.iterrows():
for index2, config2 in df2.iterrows():
if process.extractOne(row["id_number"], df["identity_No"])[1] >=80:
df1['id_number'][index] = config2['identity_No']
df1['company_name'][index] = config2['comp_name']
df1['company_code'][index] = config2['comp_code']
df1['score'][index] = config2['OR_Score']
Attempt2
for index, row in df1.iterrows():
for index2, config2 in df2.iterrows():
if fuzz.partial_ratio(row["id_number"], config2["identity_No"]) >=80:
Please Suggest