Given two lists, I'm calculating a distance between words in a nested for loop:
from fuzzywuzzy import fuzz
l = ['mango','apple']
l2 = ['ola','john']
for i in l:
    for j in l2:
        print(i, j, fuzz.ratio(i, j))
mango ola 25
mango john 22
apple ola 25
apple john 0
I would like to find the maximum value for every element of the outer loop. The expected result would be:
mango ola 25
apple ola 25
The other pairs are dropped because they have a lower value.
One strategy I could think of is to use pandas, but I would prefer a pure Python implementation. The pandas way, for reference:
from fuzzywuzzy import fuzz
import pandas as pd
l = ['mango','apple']
l2 = ['ola','johnkoo']
result = []
for i in l:
    for j in l2:
        result.append((i, j, fuzz.ratio(i, j)))
df = pd.DataFrame(result,columns = ['word1','word2','distance'])
idx = df.groupby(['word1'])['distance'].transform('max') == df['distance']
print(df[idx])
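For comparison, here is a minimal sketch of what a pure Python version might look like, using the built-in max with a key function. The names similarity and best_matches are hypothetical, and similarity is only a standard-library stand-in (built on difflib) so the snippet runs without third-party packages; passing scorer=fuzz.ratio should slot in the same way, assuming fuzzywuzzy is installed.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in scorer (assumption): a 0-100 similarity from the standard library.
    # Replace with fuzz.ratio if fuzzywuzzy is available.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_matches(words, candidates, scorer=similarity):
    # For each word of the outer list, keep only the highest-scoring candidate.
    results = []
    for word in words:
        best = max(candidates, key=lambda c: scorer(word, c))
        results.append((word, best, scorer(word, best)))
    return results


l = ['mango', 'apple']
l2 = ['ola', 'john']
for word, match, score in best_matches(l, l2):
    print(word, match, score)
```

One difference from the pandas version: max returns only the first candidate when several share the top score, whereas the groupby/transform filter keeps every tied row.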