I'm trying to create several dataframes using the code below. I have a list of names (lista_names) and one dataframe (df1), and I would like to create one dataframe for each name in my list. In each of these new dataframes, one column would hold the Levenshtein distance between that name and every name in df1. In the end I would have n new dataframes, where n is the number of names in my list. Here is my code:
    import pandas as pd
    import Levenshtein

    # Placeholder first row; it is dropped again after the loops.
    lev = pd.DataFrame({'Levenshtein': 0, 'n_ordem': 0, 'nome_ea': 'a', 'nome_censo': 'a'}, index=[1])
    for i in range(0, len(lista_names)):
        for k in range(0, len(df1)):
            # Skip missing (non-string) names in df1.
            if isinstance(df1['nome_comp'][k], str):
                if Levenshtein.distance(lista_names[i], df1['nome_comp'][k]) <= 21:
                    lev = lev.append({'Levenshtein': Levenshtein.distance(lista_names[i], df1['nome_comp'][k]),
                                      'n_ordem': df1['n_ordem'][k], 'nome_ea': lista_names[i], 'nome_censo': df1['nome_comp'][k]},
                                     ignore_index=True)
    lev.drop(0, axis=0, inplace=True)
    lev.to_csv('levenshtein.csv')
Although this solution works, it is far too slow: the CSV file is still not written after 2 days of running on my PC. Is there a way to make it faster?
Edit 1: n = 291
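For reference, here is a minimal sketch of one direction that might help. It rests on an assumption that has not been profiled: that much of the time goes into calling lev.append once per matching pair (each call returns a new dataframe and copies all rows collected so far) and into computing every distance twice. The sketch gathers the rows in a plain list and computes each distance once. The names edit_distance and collect_matches are hypothetical helpers, the pure-Python edit_distance is only a stand-in so the snippet runs without the Levenshtein package, and the toy data stands in for lista_names and df1.

```python
def edit_distance(a, b):
    """Pure-Python Levenshtein distance (stand-in for Levenshtein.distance)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def collect_matches(names, census_rows, max_dist=21, distance=edit_distance):
    """Return one dict per (name, census row) pair whose distance is <= max_dist.

    census_rows is an iterable of (n_ordem, nome_comp) pairs.
    """
    # Drop non-string names once, instead of re-checking them for every name.
    valid = [(n_ordem, nome) for n_ordem, nome in census_rows
             if isinstance(nome, str)]
    rows = []
    for name in names:
        for n_ordem, nome in valid:
            d = distance(name, nome)  # computed once per pair
            if d <= max_dist:
                rows.append({'Levenshtein': d, 'n_ordem': n_ordem,
                             'nome_ea': name, 'nome_censo': nome})
    return rows


# Toy data standing in for lista_names and df1[['n_ordem', 'nome_comp']].
rows = collect_matches(
    ['maria silva', 'joao souza'],
    [(1, 'maria da silva'), (2, float('nan')), (3, 'joao sousa')],
    max_dist=3,
)
```

With the real data, census_rows could be zip(df1['n_ordem'], df1['nome_comp']) and distance could be Levenshtein.distance; pd.DataFrame(rows).to_csv('levenshtein.csv') would then build and write the table in one step. The number of distance computations stays at n times len(df1), so this only removes the per-row dataframe copying and the duplicate distance calls.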