String matching with dictionary keys in Python

Question

I have a list of strings and a dictionary. For example:

list = ["apple fell on Newton", "lemon is yellow","grass is greener"]
dict = {"apple" : "fruits", "lemon" : "vegetable"}

The task is to match each string in the list against the keys of the dictionary. If a key matches, return the value for that key.

Currently I am using the approach below, which is very slow. Can someone please suggest a more efficient technique?

lmb_extract_type = (lambda post: list(filter(None, set(dict.get(w)[0] if w in post.lower().split() else None for w in dict))))

df['type']  = df[list].apply(lmb_extract_type)

I am not sure what df is here, but for the two inputs you have provided, do check my answer. On a side note, avoid using list and dict as variable names, especially when you are also calling list() or dict() for type conversion :)
– Akshay Sehgal, Commented Feb 5, 2021 at 7:48
The number of elements in the list is around 40-50 million, so it is taking a lot of time.
– SK Singh, Commented Feb 5, 2021 at 7:49
It is a single column with a string (e.g. "apple fell on Newton") in each row of the data frame. For each row, I have to match it against the keys of the dictionary and return the value of the matching key.
– SK Singh, Commented Feb 5, 2021 at 7:53

Akshay Sehgal · Accepted Answer · 2021-02-05 08:17:56Z

It is a single column with a string (e.g. "apple fell on Newton") in each row of the data frame. For each row, I have to match it against the keys of the dictionary and return the value of the matching key.

The number of elements in the list is around 40-50 million, so it is taking a lot of time.

IIUC, based on your comments, you can solve this with str.extract and Series.replace. Both operate on the whole column at once, so you do not need a row-wise apply with a lambda.

To use str.extract, build a regex pattern from the keys of the dictionary. This extracts only the first matching keyword (apple or lemon) from each row; rows with no match become NaN.
You can then use the dictionary d to replace each extracted keyword with its corresponding value.

import pandas as pd

l = ["apple fell on Newton", "lemon is yellow","grass is greener"]
d = {"apple" : "fruits", "lemon" : "vegetable"}

df = pd.DataFrame(l, columns=['sentences']) #Single column dataframe to demonstrate.

pattern = '('+'|'.join(d.keys())+')'   #Regular expression pattern: (apple|lemon)
df['type'] = df.sentences.str.extract(pattern).replace(d)   #Extract the keyword, then map it to its value
print(df)

              sentences       type
0  apple fell on Newton     fruits
1       lemon is yellow  vegetable
2      grass is greener        NaN
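
One caveat with the pattern above: it matches a key anywhere in the string, so "pineapple" would count as a match for "apple", and keys containing regex metacharacters are not escaped. The question's original lambda compared lower-cased whole words, so a closer equivalent might look like the sketch below. It assumes whole-word, case-insensitive matching is what is wanted; the sample sentences are made up for illustration.

```python
import re
import pandas as pd

sentences = ["apple fell on Newton", "Lemon is yellow", "pineapple is sweet"]
d = {"apple": "fruits", "lemon": "vegetable"}

df = pd.DataFrame(sentences, columns=["sentences"])

# Escape each key and wrap the alternation in word boundaries so that
# "pineapple" does not count as a match for "apple".
pattern = r"\b(" + "|".join(re.escape(k) for k in d) + r")\b"

# expand=False returns a Series for a single capture group;
# re.IGNORECASE mirrors the question's use of post.lower().
matched = df["sentences"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Lower-case the captured keyword before the dictionary lookup;
# rows without a match stay NaN.
df["type"] = matched.str.lower().map(d)
print(df)
```

Like the original, this keeps only the first keyword found in each row.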

Michael S. · Accepted Answer · 2022-08-31 03:16:30Z

Apply a lambda function to each row and store the matching values as a comma-separated string in a new column of the dataframe.

df['New_Col'] = df['sentences'].apply(lambda s: ', '.join(value for key, value in d.items() if key in s))
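
The lambda above checks every dictionary key against every row, so its cost grows with the size of the dictionary, and it matches substrings rather than whole words. For 40-50 million rows, an alternative is to split each sentence once and do one dictionary lookup per word. The sketch below illustrates that idea; the helper name lookup_types and the sample rows are hypothetical, it assumes keys are single lower-case words, and it has not been benchmarked against the other approaches.

```python
import pandas as pd

d = {"apple": "fruits", "lemon": "vegetable"}
df = pd.DataFrame(
    {"sentences": ["apple fell on Newton", "lemon and apple pie", "grass is greener"]}
)

def lookup_types(sentence, mapping=d):
    """Return the values for every whitespace-separated word that is a key."""
    seen = []
    for word in sentence.lower().split():
        value = mapping.get(word)  # one hash lookup per word
        if value is not None and value not in seen:
            seen.append(value)  # keep first-seen order, drop duplicates
    return ", ".join(seen)

# A list comprehension avoids some of the per-row overhead of Series.apply.
df["type"] = [lookup_types(s) for s in df["sentences"]]
print(df)
```

Rows with no matching word get an empty string here rather than NaN, which matches the behaviour of the join in the lambda above.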

answered Aug 25, 2022 at 6:58 by Tejas Sutar
edited Aug 31, 2022 at 3:16 by Michael S.
