Check whether words from a list are inside a string of another list in Python

Question

I tried getting all the headlines from the New York Times homepage and wanted to see how many of them mention a certain word. In this particular case, I wanted to see how many headlines mention either the coronavirus or Trump. This is my code, but it doesn't work: 'number' stays at the value I assign it before the while loop.

import requests
from bs4 import BeautifulSoup

url = 'https://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
a = soup.findAll("h2", class_="esl82me0")

for story_heading in a:
    print(story_heading.contents[0])

lijst = ["trump", "Trump", "Corona", "COVID", "virus", "Virus", "Coronavirus", "COVID-19"]
number = 0
run = 0

while run < len(a)+1:
    run += 1
     if any(lijst in s for s in a)
        number += 1

print("\nTrump or the Corona virus have been mentioned", number, "times.")

So I basically want the variable 'number' to increase by 1 if a headline (which is an entry in the list a) contains the word Trump or Coronavirus or both.

Does anyone know how to do this?

This doesn't count as an answer, since I'm not giving you a complete solution, but typically you would do the following:
1. Fetch the contents.
2. Convert all text to lowercase so that matching does not depend on case.
3. Tokenize the text into individual tokens. Good options are spaCy and NLTK.
4. Count and sort the tokens. collections.Counter would do the trick for you.
– Alexander Ejbekov, Commented Apr 12, 2020 at 18:36
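
The steps in the comment above could be sketched roughly as follows. This is only an illustration: the sample headlines are made up, and a simple regular expression stands in for a real tokenizer such as spaCy or NLTK, so the `tokenize` helper is an assumption and not part of either library.

```python
import re
from collections import Counter

# Hypothetical sample headlines standing in for the scraped text.
headlines = [
    "Trump Weighs Reopening as Virus Spreads",
    "How the Coronavirus Changed New York",
    "Markets Rally on Stimulus Hopes",
]


def tokenize(text):
    # Lowercase the text, then extract runs of letters, digits, and hyphens.
    # This is a stand-in for a proper tokenizer.
    return re.findall(r"[a-z0-9-]+", text.lower())


# Count every token across all headlines.
counts = Counter(token for headline in headlines for token in tokenize(headline))

# Keywords are lowercase because the tokens were lowercased.
keywords = ["trump", "coronavirus", "virus", "covid-19"]
for keyword in keywords:
    # A Counter returns 0 for tokens it has never seen.
    print(keyword, counts[keyword])
```

Because everything is lowercased first, the keyword list no longer needs both "trump" and "Trump".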

Kent Shikama · Accepted Answer · 2020-04-12 19:09:42Z

In general, I recommend putting more thought into naming variables. I like how you tried to print the story headings. The line if any(lijst in s for s in a) does not do what you think it does: you need to instead iterate over each word in a single h2. The any function is just shorthand for the following:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False

In other words, you're trying to see if an entire list is in an h2 element, which will never be true. Here is an example fix.

import requests
from bs4 import BeautifulSoup

url = 'https://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
# Headline elements; this class name comes from the page's markup and may change.
h2s = soup.find_all("h2", class_="esl82me0")

for story_heading in h2s:
    print(story_heading.contents[0])

keywords = ["trump", "Trump", "Corona", "COVID", "virus", "Virus", "Coronavirus", "COVID-19"]
number = 0

# Compare every word of every headline against the keyword list.
# Note: this counts matching words, so a headline containing two
# keywords adds 2 to the total.
for h2 in h2s:
    headline = h2.text
    words_in_headline = headline.split(" ")
    for word in words_in_headline:
        if word in keywords:
            number += 1

print("\nTrump or the Corona virus have been mentioned", number, "times.")

Output

Trump or the Corona virus have been mentioned 7 times.
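
If the goal is to count headlines, so that a headline mentioning both Trump and the virus is counted once, a variant along these lines might work. It is a minimal sketch under some assumptions: the headlines are hypothetical plain strings (standing in for something like `[h2.text for h2 in h2s]`), `mentions_keyword` is a helper name invented for this example, and matching is done on lowercase substrings, so "virus" would also match a word like "antivirus".

```python
# Hypothetical headlines standing in for the text of the scraped h2 elements.
headlines = [
    "Trump Says Virus Response Is on Track",
    "Coronavirus Cases Rise in New York",
    "A Quiet Easter Sunday",
]

# Lowercase keywords; each headline is lowercased before comparison.
keywords = ["trump", "corona", "covid", "virus"]


def mentions_keyword(headline, keywords):
    # True if at least one keyword occurs anywhere in the headline.
    lowered = headline.lower()
    return any(keyword in lowered for keyword in keywords)


# Each headline contributes at most 1, however many keywords it contains.
number = sum(mentions_keyword(headline, keywords) for headline in headlines)
print("Trump or the coronavirus are mentioned in", number, "headlines.")
```

Here any iterates over the keywords for a single headline, which is the reverse of the original expression that tested the whole list against each element.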

1 Comment

Jem Over a year ago

It's so satisfying to see the logic behind code and now I finally understand what I did wrong. Thank you so much! :-)
