0

I have 2 data frames. One pulls values using an API of current values of certain fields on system. The other has the actual current values of those fields. Example: Names on the system and names on paper. I've merged the two on the common column but now am trying to compare the names on Python to see if they are an approximate match and/or if they need an update. Is there a way I can do this? I believe this can be done on excel using isnumber(search(...)).

Case Insensitive and possibly consider abbreviations (I can make a dictionary?) for comparing text strings

Example of how my dataframe looks and the result I want:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-0lax{text-align:left;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-0lax"></th>
    <th class="tg-0lax">Name on System</th>
    <th class="tg-0lax">Current Name</th>
    <th class="tg-0lax">Match</th>
  </tr>
  <tr>
    <td class="tg-0lax">1</td>
    <td class="tg-0lax">APPLE INFORMATION TECHNOLOGY</td>
    <td class="tg-0lax">Apple International Information Technology </td>
    <td class="tg-0lax">No</td>
  </tr>
  <tr>
    <td class="tg-0lax">2</td>
    <td class="tg-0lax">IBM Intl group</td>
    <td class="tg-0lax">IBM International Group</td>
    <td class="tg-0lax">YES</td>
  </tr>
</table>

PS. Apologies in advance if I have broken any rules or etiquette of the Stack community, I'm new to this and open to learning and constructive criticism.

5
  • 1
    What are your criteria of comparison? They are not exact strings still one is Yes other is No Commented Oct 4, 2018 at 14:13
  • What axis are you comparing? Row or column? Commented Oct 4, 2018 at 14:30
  • @Chris I'm comparing columns, so IBM Intl group vs IBM International Group Commented Oct 4, 2018 at 15:01
  • @mad_ approximate matches, so any matches above a certain predetermined threshold of a ratio would quantify a match? Commented Oct 4, 2018 at 15:03
  • Possible duplicate of Find the similarity metric between two strings Commented Oct 4, 2018 at 15:07

2 Answers 2

1

Maybe a good way would be calculating similarities, and return the highest match probability?

First of all you need to do some data cleaning like removing special characters, convert all strings to lowercase then search using similarity

st1 = 'apple information technology'
st2 = 'apple international information technology'

from difflib import SequenceMatcher
SequenceMatcher(None, st1, st2).ratio() 
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks! will explore this package
1

Well you can learn more about the string similarity difference here Find the similarity metric between two strings

I am just trying to provide you the application in case you want to try different method using pandas and your metrics.

import pandas as pd
from difflib import SequenceMatcher
df=pd.DataFrame({'Name on System':['APPLE INFORMATION TECHNOLOGY','IBM Intl group'],'Current Name':['Apple International Information Technology','IBM International Group']})

Define your metric in function

def similiarity_ratio(row):
    return SequenceMatcher(None, row['Name on System'].lower(), row['Current Name'].lower()).ratio() 

df['Match']=df.apply(lambda x:similiarity_ratio(x),axis=1)
print(df)

Output

Current Name                                    Name on System                 Match
0   Apple International Information Technology  APPLE INFORMATION TECHNOLOGY  0.800000
1   IBM International Group                     IBM Intl group                0.756757

1 Comment

Thank you! difflib seems to be the package to explore. let's see if it does what I need it do. Thanks again for your help and guidance.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.