I have been trying to replace parts of the text in a pandas DataFrame column with keys from a dictionary, where each key maps to multiple values. I have achieved the desired result, but the loop is very slow on a large dataset. I would appreciate advice on a more 'Pythonic' or more efficient way of achieving the result. Please see the example below:
import pandas as pd

df = pd.DataFrame({'Dish': ['A', 'B', 'C'],
                   'Price': [15, 8, 20],
                   'Ingredient': ['apple banana apricot lamb ', 'wheat pork venison', 'orange lamb guinea']
                   })
| Dish | Price | Ingredient |
|---|---|---|
| A | 15 | apple banana apricot lamb |
| B | 8 | wheat pork venison |
| C | 20 | orange lamb guinea |
The dictionary is below:
CountryList = {'FRUIT': [['apple'], ['orange'], ['banana']],
               'CEREAL': [['oat'], ['wheat'], ['corn']],
               'MEAT': [['chicken'], ['lamb'], ['pork'], ['turkey'], ['duck']]}
I am trying to replace text in the 'Ingredient' column with the dictionary key whose values contain that text. For example, 'apple' in the first row would be replaced by the dictionary key 'FRUIT'. The desired table is shown below:
| Dish | Price | Ingredient |
|---|---|---|
| A | 15 | FRUIT FRUIT apricot MEAT |
| B | 8 | CEREAL MEAT venison |
| C | 20 | FRUIT MEAT guinea |
I have seen some related queries here where each key has one value; but in this case, there are multiple values for any given key in the dictionary. So far, I have been able to achieve the desired result but it is painfully slow when working with a large dataset. The code I have used so far to achieve the result is shown below:
# Replace every dictionary value found in 'Ingredient' with its key.
for country, lenders in CountryList.items():
    for lender in lenders:
        # Each value is a one-element list such as ['apple'].
        word = lender[0]
        # One full pass over the column per value, which is why this is slow.
        df['Ingredient'] = df['Ingredient'].str.replace(word, country, regex=False)
Perhaps this could be done with multiprocessing? Needless to say, my knowledge of Python leaves a lot to be desired.
Any suggestions to speed up the process would be highly appreciated.
Thanks in advance.
Edit: just to add, the dictionary has about 200 keys, and some keys have more than 60,000 values, which makes the code very slow.
Can CountryList be changed? Do you really need lists of one element?
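If the one-element lists can be flattened, one possible approach is to invert the dictionary once into a word-to-key lookup and then replace each word with a single dictionary access, instead of scanning the whole column once per value. The sketch below is only an illustration under some assumptions: ingredients are separated by whitespace, every dictionary value is a single word, and only whole words should be replaced (unlike `str.replace`, which also matches substrings). The names `word_to_key` and `replace_words` are made up for this example. If a word appears under more than one key, the last key wins.

```python
import pandas as pd

df = pd.DataFrame({'Dish': ['A', 'B', 'C'],
                   'Price': [15, 8, 20],
                   'Ingredient': ['apple banana apricot lamb ', 'wheat pork venison', 'orange lamb guinea']
                   })

CountryList = {'FRUIT': [['apple'], ['orange'], ['banana']],
               'CEREAL': [['oat'], ['wheat'], ['corn']],
               'MEAT': [['chicken'], ['lamb'], ['pork'], ['turkey'], ['duck']]}

# Invert the dictionary once: {'apple': 'FRUIT', 'oat': 'CEREAL', ...}.
# (Hypothetical helper name; assumes each value is a single word.)
word_to_key = {
    word: key
    for key, groups in CountryList.items()
    for group in groups
    for word in group
}

def replace_words(text):
    """Replace each whitespace-separated word with its key, if it has one."""
    # split() with no arguments also drops leading/trailing whitespace.
    return ' '.join(word_to_key.get(word, word) for word in text.split())

# One pass over the column; each word costs one dictionary lookup.
df['Ingredient'] = df['Ingredient'].map(replace_words)
```

With this layout the work grows with the number of words in the column rather than with the number of dictionary values, so the size of the dictionary mostly affects the one-off cost of building `word_to_key`.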