I have a dataframe with two columns

Current Dataframe

SE#     Response                                                                COVID Words Mentioned
123456  As a merchant I appreciated your efforts in pricing with Covid-19       
456789  you guys suck and didn't handle our relationship during this pandemic   
347896  I love your company                                                     

Desired Dataframe

SE#     Response                                                                COVID Words Mentioned
123456  As a merchant I appreciated your efforts in pricing with Covid-19       Y
456789  you guys suck and didn't handle our relationship during this pandemic   Y
347896  I love your company                                                     N

terms = ['virus', 'Covid-19','covid19','flu','covid','corona','Corona','COVID-19','co-vid19','Coronavirus','Corona Virus','COVID','purell','pandemic','epidemic','coronaviru','China','Chinese','chinese','crona','korona']

This is the list of strings to check for in each response. I want to be able to add or remove terms from the list.

The records above are just examples. I have a list of strings related to COVID-19 that need to be searched for in each response. If any of the strings appears in a response, the 'COVID Words Mentioned' column should be marked "Y"; otherwise it should be marked "N".

How do I code this in Python?

Much appreciated!

1 Answer

For each search term, build a boolean result vector:

d = {}
for term in terms:
    # True where the response contains this term as a substring
    d[term] = df['Response'].str.contains(term, na=False)

I pass na=False because otherwise pandas returns NA wherever the string column is itself NA, and we don't want that behaviour. The cost grows with the number of search terms, since each term requires a separate pass over the column. Also consider changing this call if you want to match whole words, because contains matches substrings (and treats the term as a regular expression by default).
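If whole-word matching matters, one possible sketch is to escape each term and wrap it in `\b` word boundaries. This is an assumption about what "whole word" should mean here, and `whole_word_mask` is a hypothetical helper name, not a pandas function:

```python
import re

import pandas as pd


def whole_word_mask(series, term):
    """Return a boolean Series: True where `term` appears as a whole word."""
    # re.escape guards against terms containing regex metacharacters;
    # \b anchors the match at word boundaries on both sides.
    pattern = r"\b" + re.escape(term) + r"\b"
    return series.str.contains(pattern, case=False, na=False, regex=True)


# Made-up sample data for illustration only.
responses = pd.Series(["I caught the flu", "too much influence", None])
mask = whole_word_mask(responses, "flu")
```

With this helper, "influence" no longer counts as a mention of "flu", and missing responses are treated as non-matches.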

Regardless, take the results and reduce them with bit-wise or, so that a row matches if any one of the terms is present. You need two imports:

from functools import reduce
from operator import or_

df[reduce(or_, d.values())]

The final line selects only the rows that contain any of the words. Alternatively, map the output of the reduction from {True, False} to {'Y', 'N'} using np.where.
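As a rough end-to-end sketch of the np.where route, here is a version that uses the sample rows from the question and combines the per-term results with a bit-wise or, so that any single term flags the row. The shortened `terms` list and the `case=False` and `regex=False` arguments are choices made for this sketch, not part of the approach above:

```python
from functools import reduce
from operator import or_

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SE#": [123456, 456789, 347896],
    "Response": [
        "As a merchant I appreciated your efforts in pricing with Covid-19",
        "you guys suck and didn't handle our relationship during this pandemic",
        "I love your company",
    ],
})

# Shortened list for illustration; case=False makes case variants redundant.
terms = ["virus", "covid", "corona", "pandemic", "epidemic"]

# One boolean Series per term (literal, case-insensitive substring match).
masks = [
    df["Response"].str.contains(t, case=False, regex=False, na=False)
    for t in terms
]

# OR the Series together row by row: True if any term is present.
any_term = reduce(or_, masks)

# Map True/False to 'Y'/'N' in the target column.
df["COVID Words Mentioned"] = np.where(any_term, "Y", "N")
```

Because the terms live in a plain list, adding or removing one only means editing `terms` and re-running the last three statements.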
