0

I have been working on an Excel file and looking out to find out accuracy percentage by using Fuzzy match / any other method from another sheet using an dataframe. Matching other column values using one column having unique values.

Input values:

Sheet 1

identity_no  address            Pincode   company_name

 IN2231      Delhi, Indi        110030    AXN pvt Ltd
 UK654       London, Uk         897653    Aviva Intl Ltd
 SL1432      Colombo, Srilanka  07658     Ship Incorporations
 LK0678      Libya, Sns         5674332   Oppo Mobiles pvt ltd

Master Data Sheet 2

identity_no  address            Pincode   company_name

 IN2231      Delhi, India       110030    AXN pvt Ltd
 UK654       London, Uk         897653    Aviva Intl Ltd
 SL1432      Colombo, Srilanka  07658     Ship Incorporations

Looking out to map the column values from Master Data Sheet 2 and update the values into Sheet 1 only when matching accuracy percentage is more than 75% . based on identity_no. Also if any identity_no is not present in Sheet 2, We also need to update into sheet 2.

I have tried to use Fuzzy match bot couldn't worked

Script i have been trying to use so far is

df = pd.read_excel(open(r'input.xlsx', 'rb'), sheet_name='sheet1')
df2 = pd.read_excel(open(r'master_data.xlsx', 'rb'), sheet_name='sheet2')

from difflib import SequenceMatcher
lst = df2[['identity_no','address', 'Pincode', 'company_name']]

def closest(s):
    highest = 0
    result = ''
    for i in lst:
        temp = SequenceMatcher(None, s, i).ratio() #Similarity Ratio
        if temp > highest:
            highest = temp
            result = i
    return result

How to check for the matching accuracy percentage and only execute the further process when it is more than 75%

df[['identity_no','address', 'Pincode', 'company_name']] = df[['identity_no','address', 'Pincode', 'company_name']] .apply(lambda x: closest(x))
4
  • Why don't you just merge on 'identity_no', 'Pincode', 'company_name' and then drop whichever address column is less clean? Commented Mar 31, 2021 at 18:54
  • @HS-nebula - tried it but didnt worked for me in this case as i have to map from sheet 2 to sheet 1 only when matching percentage is more than 75%. Commented Mar 31, 2021 at 19:06
  • You can check out this question and this question then. Commented Mar 31, 2021 at 19:25
  • I'd also point out that you could do the merge on the three columns, then create a 5th column that contains the ratio between the address columns, and drop rows that are greater than 0.75 - might be easier than fuzzy merging Commented Mar 31, 2021 at 19:29

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.