0
joined_Gravity1.head()
Comments
____________________________________________________
0   Why the old Pike/Lyrik?
1   This is good
2   So clean
3   Looks like a Decoy
Input: type(joined_Gravity1)
Output: pandas.core.frame.DataFrame

The following code lets me select the rows whose comment contains the keyword "ender":

joined_Gravity1[joined_Gravity1["Comments"].str.contains("ender", na=False)]

Output:

Comments
___________________________
194 We need a new Sender 😂
7   What about the sender
179 what about the sender?😏

How can I revise the code to also include words similar to 'Sender', such as 'snder' and 'bnder'?

3 Answers

1

I don't see a reason why a regex alternation inside the contains function won't work here (regex=True is already the default, so it is spelled out only for clarity).

joined_Gravity1[joined_Gravity1["Comments"].str.contains(pat="ender|snder|bnder", na=False, regex=True)]

I have used "ender|snder|bnder" only. You can build a list of all such words, say list_words, and pass pat='|'.join(list_words) to the contains function above.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
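
The '|'.join(list_words) idea can be sketched with the standard library's re module alone. This is only an illustration: the keyword list and sample comments are assumptions, and re.escape is added so that any special characters in a keyword are matched literally.

```python
import re

# Hypothetical keyword list; extend it with whatever variants you expect.
list_words = ["ender", "snder", "bnder"]

# Escape each keyword, then join them into one alternation pattern.
pattern = "|".join(re.escape(w) for w in list_words)

# Sample comments (made up for illustration).
comments = ["We need a new Sender", "can i be the snder?", "So clean"]

# Keep the comments in which any of the keywords occurs.
matches = [c for c in comments if re.search(pattern, c)]
print(matches)
```

The same pattern string can be passed as pat to str.contains; adding case=False there would also catch differently capitalised variants.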


1

There can be a massive number of possible letter combinations in such words. What you are trying to do is a fuzzy match between two strings. I recommend the following:

#!pip install fuzzywuzzy
from fuzzywuzzy import fuzz, process

word = 'sender'
others = ['bnder', 'snder', 'sender', 'hello']

process.extractBests(word, others)
[('sender', 100), ('snder', 91), ('bnder', 73), ('hello', 18)]

Based on these scores, you can choose a threshold and mark everything that scores above it as a match (using the code you used above).

Here is a way to do this for your exact problem, using a function:

import pandas as pd

df = pd.DataFrame(['hi there i am a sender', 
                   'I dont wanna be a bnder', 
                   'can i be the snder?', 
                   'i think i am a nerd'], columns=['text'])

#s = sentence, w = match word, t = match threshold
def get_match(s,w,t):
    ss = process.extractBests(w,s.split())
    return any([i[1]>t for i in ss])

#What it's doing: match each word in each row of df.text against
#the word 'sender' and check whether any of the words scores above
#the threshold of 70.
df['match'] = df['text'].apply(get_match, w='sender', t=70)
print(df)

                      text  match
0   hi there i am a sender   True
1  I dont wanna be a bnder   True
2      can i be the snder?   True
3      i think i am a nerd  False

Tweak the t value: raise it from 70 to 80 if you want a stricter match, or lower it for a more relaxed match.

Finally, you can filter for the matching rows:

df[df['match']==True][['text']]
                      text
0   hi there i am a sender
1  I dont wanna be a bnder
2      can i be the snder?

3 Comments

df['match'] = df['text'].apply(get_match, w='sender', t=70) Is it possible to pass several words instead of just one in the w position? I tried the following: 1. df['match'] = df['text'].apply(get_match, w=('sender','slx'), t=70) 2. df['match'] = df['text'].apply(get_match, w=['sender','slx'], t=70) 3. w = ['sender','slx','clx'] df['match'] = df['text'].apply(get_match, w, t=70) None of the three works. 'Sender' here is the product category, which can be further broken down into product types.
I am not sure what you need here. Do you want to separate the sentences by Sender or Slx, or do you want sentences that contain BOTH Sender and Slx?
Also, it won't match, of course, because the fuzzywuzzy documentation says that it matches a single target word against a list of choices. It doesn't match a list of words against a list of choices. You can easily modify the function to operate over a list of words instead of one.
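
A minimal sketch of a get_match variant that accepts several target words. To keep it runnable without extra installs it uses difflib.SequenceMatcher from the standard library in place of fuzzywuzzy, so the scores will not be identical to fuzzywuzzy's; the function name get_match_any, the 0-100 scaling, and the sample sentences are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def get_match_any(sentence, words, threshold):
    """Return True if any token of `sentence` is close to any word in `words`."""
    tokens = sentence.lower().split()
    for w in words:
        for tok in tokens:
            # ratio() is in [0, 1]; scale to 0-100 to mirror the threshold used above.
            score = 100 * SequenceMatcher(None, w.lower(), tok).ratio()
            if score > threshold:
                return True
    return False

# Sample sentences (made up for illustration).
texts = ["hi there i am a sender",
         "I dont wanna be a bnder",
         "i think i am a nerd",
         "the slx is nice"]

flags = [get_match_any(t, ["sender", "slx"], 70) for t in texts]
print(flags)
```

With a DataFrame, the same function could be applied row by row, for example df['text'].apply(get_match_any, words=['sender', 'slx'], threshold=70).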
-1
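from difflib import get_close_matches

def closeMatches(patterns, word):
    # Print the entries of `patterns` that are most similar to `word`.
    print(get_close_matches(word, patterns))

# Comments containing "ender", as a plain list of strings.
list_patterns = joined_Gravity1[joined_Gravity1["Comments"].str.contains("ender", na=False)]["Comments"].tolist()

word = 'Sender'
patterns = list_patterns
closeMatches(patterns, word)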

1 Comment

This does not achieve what OP is asking.
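
A sketch of how get_close_matches could instead be applied to each word of a comment rather than to the whole comment, which is closer to what the question asks. The helper name has_close_word, the 0.7 cutoff, and the sample comments are assumptions, not part of the answer above.

```python
from difflib import get_close_matches

def has_close_word(comment, word, cutoff=0.7):
    """Return True if any word in `comment` is similar enough to `word`."""
    # Lowercase and strip common trailing punctuation before comparing.
    tokens = [t.strip("?!.,").lower() for t in comment.split()]
    return bool(get_close_matches(word.lower(), tokens, n=1, cutoff=cutoff))

# Sample comments (made up for illustration).
comments = ["We need a new Sender", "can i be the snder?", "So clean"]

selected = [c for c in comments if has_close_word(c, "sender")]
print(selected)
```

Raising the cutoff makes the match stricter; lowering it admits more distant spellings.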
