
I'm trying to create several dataframes using the code below. My problem is the following: I have a list of names (lista_names) and one dataframe (df1), and I would like to create one dataframe for each name in my list. In each of these new dataframes, one of the columns would be the Levenshtein distance between one name in my list and every name in the dataframe df1. So in the end I would have n new dataframes, where n is the number of names in my list. Here is my code:
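For illustration only, the structure described above (one DataFrame per name, each with a distance column against every string name in df1) could be held in a dictionary keyed by name. This is a minimal sketch under stated assumptions: the toy lista_names and df1 contents are hypothetical, frames_per_name is a hypothetical helper, and the small pure-Python levenshtein function is a stand-in for Levenshtein.distance so the snippet runs without extra packages.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    # Stand-in for Levenshtein.distance: classic dynamic-programming
    # edit distance, keeping only two rows of the table at a time.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def frames_per_name(names, df, col="nome_comp"):
    """Hypothetical helper: return {name: DataFrame}, one frame per name."""
    # Keep only rows whose name is a real string (skips NaN/None).
    valid = df[df[col].apply(lambda v: isinstance(v, str))]
    out = {}
    for name in names:
        frame = valid.copy()
        frame["Levenshtein"] = frame[col].map(lambda s: levenshtein(name, s))
        out[name] = frame
    return out


# Hypothetical toy data shaped like the question's objects.
lista_names = ["ana silva", "joao souza"]
df1 = pd.DataFrame({"n_ordem": [1, 2, 3],
                    "nome_comp": ["ana silva", "joao sousa", None]})
frames = frames_per_name(lista_names, df1)
```

Each value in frames is an ordinary DataFrame, so the n = len(lista_names) results can be inspected or saved individually.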

import Levenshtein
import pandas as pd

lev = pd.DataFrame({'Levenshtein':0,'n_ordem':0,'nome_ea':'a','nome_censo':'a'}, index = [1])

for i in range(0,len(lista_names)):
    for k in range(0,len(df1)):
        if isinstance(df1['nome_comp'][k],str):
            if Levenshtein.distance(lista_names[i], df1['nome_comp'][k])<=21:
                lev = lev.append({'Levenshtein':Levenshtein.distance(lista_names[i], df1['nome_comp'][k]),
                'n_ordem': df1['n_ordem'][k], 'nome_ea': lista_names[i],'nome_censo': df1['nome_comp'][k]}, 
                                 ignore_index = True)

lev.drop(0, axis=0, inplace = True)

lev.to_csv('levenshtein.csv')

Although this solution works, it is too slow: it still fails to produce the CSV file after running for 2 days on my PC. Is there a way to make it faster?

Edit 1: n = 291


The problem is with the line

lev = lev.append({'Levenshtein':Levenshtein.distance(lista_names[i], df1['nome_comp'][k])

within the loop.

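Pandas DataFrames are not designed for sequential insertion, and are very inefficient at it: each append builds a brand-new DataFrame that copies every row accumulated so far, so the total work grows quadratically with the number of rows. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.)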
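A small sketch of the cost described above, assuming pandas 2.x: since DataFrame.append no longer exists there, pd.concat on a growing frame is used as a stand-in for lev = lev.append(...). The function names and toy rows are hypothetical. The counter tracks how many existing rows get copied again on each step.

```python
import pandas as pd

# Hypothetical toy rows shaped like the question's records.
rows = [{"Levenshtein": d, "nome_ea": "x"} for d in range(5)]


def grow_one_row_at_a_time(rows):
    # Mimics lev = lev.append(...): every iteration builds a new DataFrame
    # that copies all rows accumulated so far.
    df, rows_copied = None, 0
    for r in rows:
        new = pd.DataFrame([r])
        if df is None:
            df = new
        else:
            rows_copied += len(df)  # existing rows are copied again
            df = pd.concat([df, new], ignore_index=True)
    return df, rows_copied


def collect_then_build(rows):
    # Appending to a Python list is cheap (amortized O(1)); the DataFrame
    # is built only once, at the end.
    collected = []
    for r in rows:
        collected.append(r)
    return pd.DataFrame(collected)


slow_df, copied = grow_one_row_at_a_time(rows)
fast_df = collect_then_build(rows)
```

Both produce the same table, but the first copies 1 + 2 + 3 + 4 = 10 rows for 5 inserts, i.e. n(n - 1)/2 in general, which is what makes the original loop blow up as matches accumulate.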

Instead, create an empty list levs = [] before the loop, and inside the loop append a one-row DataFrame to it in place of the lev.append(...) call:

levs.append(pd.DataFrame({'Levenshtein': [Levenshtein.distance(lista_names[i], df1['nome_comp'][k])],
                          'n_ordem': [df1['n_ordem'][k]],
                          'nome_ea': [lista_names[i]],
                          'nome_censo': [df1['nome_comp'][k]]}))

When the loop is done, combine the pieces with a single call: lev = pd.concat(levs, ignore_index=True). YMMV, but from similar cases I've had, it should be 10 to 200 times faster than your current code.
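A possible further step, sketched here as a variant rather than the code above: collect plain dicts instead of one-row DataFrames (avoiding many tiny frames), compute each distance once instead of twice, and iterate over the two columns with zip instead of df1['nome_comp'][k] lookups, which are label-based and break if df1's index is not 0..len(df1)-1. Assumptions: build_lev_table is a hypothetical name, the toy data is made up, and the pure-Python levenshtein is a stand-in for the Levenshtein.distance used in the question.

```python
import pandas as pd


def levenshtein(a: str, b: str) -> int:
    # Stand-in for Levenshtein.distance (two-row dynamic programming).
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_lev_table(lista_names, df1, max_dist=21):
    """Hypothetical helper: one long table of (name, census name) matches."""
    records = []  # plain dicts; a single DataFrame is built at the end
    # zip walks the columns positionally, independent of df1's index labels.
    pairs = list(zip(df1["n_ordem"], df1["nome_comp"]))
    for name in lista_names:
        for n_ordem, nome in pairs:
            if not isinstance(nome, str):
                continue  # skip NaN/None, as the isinstance check in the question does
            d = levenshtein(name, nome)  # computed once, reused below
            if d <= max_dist:
                records.append({"Levenshtein": d, "n_ordem": n_ordem,
                                "nome_ea": name, "nome_censo": nome})
    return pd.DataFrame(records,
                        columns=["Levenshtein", "n_ordem", "nome_ea", "nome_censo"])


# Hypothetical toy data; a small max_dist makes the filtering visible.
df1 = pd.DataFrame({"n_ordem": [1, 2, 3],
                    "nome_comp": ["ana silva", "joao sousa", None]})
lev = build_lev_table(["ana silva", "joao souza"], df1, max_dist=1)
# lev.to_csv('levenshtein.csv') would then write the file as in the question.
```

With the pandas overhead gone, the remaining cost is mostly the distance calls themselves, so using the compiled Levenshtein.distance instead of the pure-Python stand-in is likely to matter more than any further pandas tuning.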
