Get most similar value from dataframe column to specific string python

Question

I want to find the value in a dataframe column that is most similar to a specified string, e.g. a = 'book'. Let's say the dataframe df looks like:

col1
wijk 00 book
Wijk a 
test

Now I want to return wijk 00 book since this is the most similar to a. I am trying to do this with the fuzzywuzzy package.

I have a dataframe A holding the values I want to find a similar match for, and I use:

A['similar_value'] = A.col1.apply(lambda x: [process.extract(x, df.col1, limit=1)][0][0][0])

But when comparing a lot of strings, this takes too much time. Does anyone know how to do this more quickly?

@ZalakBhalani the strings in the dataframe column should contain the string a
– baqm, Commented Apr 26, 2021 at 16:09
What's your current code with fuzzywuzzy? We can try to optimize that.
– tdy, Commented Apr 26, 2021 at 16:10

Zalak Bhalani · Accepted Answer · 2021-04-26 16:20:23Z

1

You can use the str.contains method to select the rows whose string contains the exact substring, then take the first match:

df[df["column_name"].str.contains("book")].values[0][0]
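As a minimal sketch of the same substring approach on the question's sample data: the column name col1 comes from the question, while the case-insensitive match, the literal (non-regex) match, and the fallback to None when nothing matches are assumptions added here for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": ["wijk 00 book", "Wijk a", "test"]})

# Boolean mask: True where the value contains "book".
# case=False ignores case, regex=False treats the pattern as a literal string,
# na=False treats missing values as non-matches (all three are assumptions).
mask = df["col1"].str.contains("book", case=False, regex=False, na=False)

# Keep only matching rows and take the first one, or None if there is no match
matches = df.loc[mask, "col1"]
first_match = matches.iloc[0] if not matches.empty else None

print(first_match)
```

Checking for an empty result avoids the IndexError that .values[0][0] raises when no row contains the substring.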


lolliesaurus · Accepted Answer · 2021-04-26 16:28:42Z

1

I would use rapidfuzz:

import pandas as pd
from rapidfuzz import process, fuzz

df = pd.DataFrame(['wijk 00 book', 'Wijk a', 'test'], columns=['col1'])

search_str = 'book'
most_similar = process.extractOne(search_str, df['col1'], scorer=fuzz.WRatio)

Output:

most_similar
('wijk 00 book', 90.0, 0)

This gives you the most similar string in the column as well as a score for how similar it is to your search string.


1 Comment

tdy Over a year ago

nice +1, much faster than my version using rapidfuzz with apply()
