I have a DataFrame that looks like this. It has one column, labeled 'utterances', and each row of df.utterances holds a string of n words.
utterances
0 okay go ahead.
1 Um, let me think.
2 nan that's not very encouraging. If they had a...
3 they wouldn't make you want to do it. nan nan ...
4 Yeah. The problem is though, it just, if we pu...
I also have a list of specific words called specific_words. It looks like this:
specific_words = ['happy', 'good', 'encouraging', 'joyful']
I want to check whether any of the words in specific_words appear in each utterance. Essentially, I want to loop through every row in df.utterances and, for each row, loop through specific_words looking for a match. If there is a match, I want a boolean column next to df.utterances that shows this.
def query_text_by_keyword(df, word_list):
    for word in word_list:
        for utt in df.utterances:
            if word in utt:
                match = True
            else:
                match = False
    return match

df['query_match'] = df.apply(query_text_by_keyword,
                             axis=1,
                             args=(specific_words,))
It doesn't break, but it just returns False for every row, when it shouldn't. For example, the first few rows should look like this:
utterances query_match
0 okay go ahead. False
1 Um, let me think. False
2 nan that's not very encouraging. If they had a... True
3 they wouldn't make you want to do it. nan nan ... False
4 Yeah. The problem is though, it just, if we pu... False
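A minimal sketch of a corrected row-wise version, assuming the column is named `utterances` and rebuilding the sample frame from only the visible (truncated) text of the rows shown above. With `axis=1` the function receives a single row, so iterating over `row.utterances` walks the characters of one string, and `match` is overwritten on every pass; testing the whole string against each word with `any()` avoids both problems.

```python
import pandas as pd

specific_words = ['happy', 'good', 'encouraging', 'joyful']

# Sample data: only the visible part of each truncated row is used here.
df = pd.DataFrame({'utterances': [
    "okay go ahead.",
    "Um, let me think.",
    "nan that's not very encouraging. If they had a",
    "they wouldn't make you want to do it. nan nan",
    "Yeah. The problem is though, it just, if we pu",
]})

def query_text_by_keyword(row, word_list):
    # With axis=1, `row` is one row (a Series), so row['utterances'] is a single string.
    utt = row['utterances']
    # True as soon as any keyword occurs as a substring of this utterance.
    return any(word in utt for word in word_list)

df['query_match'] = df.apply(query_text_by_keyword, axis=1, args=(specific_words,))
print(df)
```

Note that `word in utt` is a plain substring test, so 'good' would also match inside "goodbye"; a regex with word boundaries is one way to tighten that.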
Edit
@furas made a great suggestion that solves the initial question. However, I would also like to add another column containing the specific word(s) from the query that produced the match. Example:
utterances query_match word
0 okay go ahead False NaN
1 Um, let me think False NaN
2 nan that's not very encouraging. If they had a.. True 'encouraging'
3 they wouldn't make you want to do it. nan nan .. False NaN
4 Yeah. The problem is though, it just, if we pu.. False NaN
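One way to produce both columns with the same loop-based idea is sketched below, assuming the `utterances` column and the visible sample text. The helper `matched_words` is hypothetical. It collects every keyword found in an utterance, joins multiple hits with ", ", and returns NaN when there is no hit, so `query_match` can be derived from it.

```python
import numpy as np
import pandas as pd

specific_words = ['happy', 'good', 'encouraging', 'joyful']

# Sample data: only the visible part of each truncated row is used here.
df = pd.DataFrame({'utterances': [
    "okay go ahead.",
    "Um, let me think.",
    "nan that's not very encouraging. If they had a",
    "they wouldn't make you want to do it. nan nan",
    "Yeah. The problem is though, it just, if we pu",
]})

def matched_words(utt, word_list):
    # Hypothetical helper: every keyword that occurs in this utterance (substring match).
    found = [word for word in word_list if word in utt]
    # NaN when nothing matched, mirroring the desired output table.
    return ', '.join(found) if found else np.nan

words = df['utterances'].apply(lambda utt: matched_words(utt, specific_words))
df['query_match'] = words.notna()  # a row matches exactly when some word was found
df['word'] = words
print(df)
```

Returning the `found` list itself instead of a joined string is an alternative if the matched words need further processing.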
The suggested approach: df.utterances.str.contains("happy|good|encouraging|joyful"), using "|".join(specific_words) to create this regex.
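A vectorized sketch of that regex approach, assuming the `utterances` column and the visible sample text. It adds `re.escape` (not part of the suggestion) so that any regex metacharacters in the word list are matched literally, uses `na=False` so missing values count as non-matches, and uses `str.extract` with one capture group to fill the `word` column with the first match (NaN otherwise).

```python
import re
import pandas as pd

specific_words = ['happy', 'good', 'encouraging', 'joyful']

# Sample data: only the visible part of each truncated row is used here.
df = pd.DataFrame({'utterances': [
    "okay go ahead.",
    "Um, let me think.",
    "nan that's not very encouraging. If they had a",
    "they wouldn't make you want to do it. nan nan",
    "Yeah. The problem is though, it just, if we pu",
]})

# Builds "happy|good|encouraging|joyful", escaping each word first.
pattern = '|'.join(re.escape(word) for word in specific_words)

# Boolean column: does the utterance contain any of the words?
df['query_match'] = df['utterances'].str.contains(pattern, na=False)

# First matching word per row; rows without a match get NaN.
df['word'] = df['utterances'].str.extract(f'({pattern})', expand=False)
print(df)
```

If every matching word is needed rather than the first, `df['utterances'].str.findall(pattern)` returns a list per row. Wrapping the pattern as `r'\b(?:' + pattern + r')\b'` restricts matches to whole words.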