I have two dataframes, df1 and df2, each of which contains a column of names. I compare every name in df1 with every name in df2. The match has to be approximate, so I am using fuzzywuzzy's token_sort_ratio to get a comparison score.
However, this method is very slow and df2 keeps growing; it already takes more than half an hour (4k x 2k rows). Is there a way to speed up the process?
My current implementation:
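from fuzzywuzzy import fuzz

def match(df2, name):
    # Score every name in df2 against the given name.
    df2['score'] = df2['name'].map(lambda x: fuzz.token_sort_ratio(x, name))
    # Return the df2 row with the highest score.
    return df2.loc[df2['score'].idxmax()]

df1['result'] = df1['name'].map(lambda x: match(df2, x))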
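
To make the comparison concrete without needing my data, here is a minimal sketch of the same best-match lookup on plain lists. It uses difflib from the standard library as a rough stand-in for fuzz.token_sort_ratio, so the scores may not be identical to fuzzywuzzy's. The helper names (token_sort_score, best_match) and the sample names are made up for illustration.

```python
from difflib import SequenceMatcher


def token_sort_score(a, b):
    """Rough stand-in for fuzz.token_sort_ratio (an assumption, not the
    library's code): lowercase, sort the tokens, then compare the strings."""
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, a_sorted, b_sorted).ratio())


def best_match(name, candidates):
    """Return (candidate, score) for the highest-scoring candidate."""
    scored = ((c, token_sort_score(name, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical sample data standing in for df1['name'] and df2['name'].
names1 = ["John Smith", "Acme Corp"]
names2 = ["Smith John", "ACME Corporation", "Jane Doe"]

# One score is computed per (name1, name2) pair, so the total work
# grows as len(names1) * len(names2).
results = {n: best_match(n, names2) for n in names1}
```

With 4k x 2k rows this means roughly 8 million score computations, which is where the time goes.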