I want to select articles that satisfy the following Boolean condition:

(unemployment OR inflation) AND (covid19 OR uncertain) AND (tax OR spending OR bank)

I want to do this by exact string matching. My current code is below. The problem is that for the word "tax" it also matches words such as taxes, taxable, and taxpayers. Thanks in advance!

df = data[['date', 'title', 'body_text']].copy()

def wordestimaor(X):
    # Flag rows whose body_text satisfies all three OR-groups (substring matching)
    X['count'] = X.body_text.str.contains("covid19|uncertain")\
     & X.body_text.str.contains("unemployment|inflation")\
     & X.body_text.str.contains("tax|spending|bank", case=False, regex=True)
    return X.head(2)

wordestimaor(df)
    


1 Answer

Write each word with a space on its left and right so that only whole words are matched (for example, search for " covid19 " instead of "covid19").

This does not always work, though: in "covid19," the word is followed by a comma rather than a space. You need to check these variants too, and a function is useful for that.

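import pandas as pd

symbols = [' ', ',', ';', '!', '?', '.']

def find_word(X, word):
    # Boolean Series: True where body_text contains " word" followed by any symbol
    found = pd.Series(False, index=X.index)
    for smb in symbols:
        # regex=False so that '.' and '?' are treated as literal characters
        found |= X.body_text.str.contains(' ' + word + smb, regex=False)
    return found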

Edit: If the word is at the beginning of a sentence, it will start with a capital letter, so you should check for that as well.
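
As an alternative to listing the surrounding characters by hand, a regular-expression word boundary (\b) may cover the spaces, punctuation, and start-of-text cases in one pattern, and case=False handles capital letters. The following is only a minimal sketch: the sample DataFrame, the helper name any_word, and the groups list are made up for illustration and are not from the question.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the asker's `data` frame.
df = pd.DataFrame({
    "body_text": [
        "Unemployment rose as covid19 spread; the bank cut rates.",
        "Inflation is uncertain and taxes may rise.",
        "Inflation is uncertain, so a new tax is planned.",
    ]
})

def any_word(words):
    # Build a pattern such as r"\b(?:tax|spending|bank)\b".
    # \b is a word boundary, so "tax" does not match inside "taxes".
    return r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

# One inner list per OR-group; the groups are combined with AND.
groups = [
    ["unemployment", "inflation"],
    ["covid19", "uncertain"],
    ["tax", "spending", "bank"],
]

mask = pd.Series(True, index=df.index)
for words in groups:
    mask &= df["body_text"].str.contains(
        any_word(words), case=False, regex=True, na=False
    )

selected = df[mask]  # rows that satisfy all three groups
```

Here the second row is dropped because "taxes" is not the whole word "tax". Note that whole-word matching applies to every term, so "uncertainty" would not count as "uncertain" either; add such variants to the lists explicitly if they are wanted.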

1 Comment

Yep, I just tried it, but I found that it misses some of the original variables.
