I'm trying to scrape all of the Snapple facts from https://www.snapple.com/real-facts. Since I didn't find anything useful online, I decided to write my own script:

from bs4 import BeautifulSoup as soup
import requests

data = requests.get('https://www.snapple.com/real-facts')

result_list = []
soup = soup(data.text, 'html.parser')
divs = soup.find("div", {'id':'facts'})
for div in divs:
    fact_li = div.find('li')
    for fact in fact_li:
        spans = fact.find('span', {'class':'description'})
        for span in spans:
            a = fact.find('a')
            result_list.append(a)

print(result_list)

When I run this, it raises:

 Traceback (most recent call last):
  File "snapplefactscrape.py", line 11, in <module>
    for fact in fact_li:
TypeError: 'int' object is not iterable

I get what that means, but I don't understand why fact_li is an int or how I can prevent it from being one.

Help would be appreciated :)

2 Answers

1

To get all matching elements, use find_all instead of find.

You don't need three nested loops to get all the links. Calling select with the CSS selector #facts .description a returns them directly:

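from bs4 import BeautifulSoup
import requests

base_url = 'https://www.snapple.com'
data = requests.get(f'{base_url}/real-facts')
soup = BeautifulSoup(data.text, 'html.parser')

# Select every link inside a .description element under #facts
links = soup.select('#facts .description a')
for link in links:
    # link.text is the fact; link['href'] is a relative URL such as /real-facts/9
    print(link.text, base_url + link['href'])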

But if you want to use loops:

result_list = []
divs = soup.find_all('div', {'id': 'facts'})
for div in divs:
    fact_li = div.find_all('li')
    for fact in fact_li:
        spans = fact.find_all('span', {'class': 'description'})
        for span in spans:
            # extend, so result_list holds the <a> tags themselves, not nested lists
            result_list.extend(span.find_all('a'))

2 Comments

The first solution worked perfectly! Now if I print the facts, each one looks like this, though: <a href="/real-facts/9">The average speed of a housefly is 4.5 mph.</a> How can I parse this so that I get the number from the href (9 in this case), a comma, and then the fact? Thanks a lot for helping.
Every link in links is a Tag object. The updated answer shows how to get its text and its href attribute.
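
For the follow-up above, one possible way to build the "number, fact" line is sketched below. The helper name format_fact is hypothetical, and the sketch assumes every href ends in the fact number, as in the /real-facts/9 example. With the selector approach you would pass it link['href'] and link.text.

```python
def format_fact(href: str, text: str) -> str:
    """Return 'number, fact' for an href such as '/real-facts/9'."""
    # Assumption: the fact number is the last path segment of the href.
    number = href.rstrip('/').rsplit('/', 1)[-1]
    return f"{number}, {text.strip()}"


line = format_fact('/real-facts/9', 'The average speed of a housefly is 4.5 mph.')
print(line)  # 9, The average speed of a housefly is 4.5 mph.
```

Splitting on the last slash keeps the helper independent of the rest of the path, so it should keep working if the site changes the prefix but not the trailing number.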
1

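find returns a single Tag, and iterating over a Tag yields its children, which include text nodes. So in for div in divs:, div becomes a string. Instead of the bs4 find method on tags, you're then calling the find method on strings, which returns -1 (an int) if the substring is not found. That int is what the next loop tries to iterate over.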

In the first iteration, for example, the value of div is "\n". This is a good case for using a debugger to inspect the variables, or simply for printing each value and its type.
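
The behaviour described here can be reproduced without the website. The sketch below uses a plain "\n" string as a stand-in for the first child that the loop yields (an assumption taken from this answer, not fetched from the page) and shows both the int result and the resulting error.

```python
# Stand-in for the first value of `div` in the question's loop (per this answer).
child = "\n"

# On a string, .find is str.find: it returns an index, or -1 when nothing matches.
result = child.find('li')
print(type(result).__name__, result)  # int -1

error_message = None
try:
    # Same shape as `for fact in fact_li:` in the question.
    for fact in result:
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'int' object is not iterable
```

Printing the type before looping, as done for result here, is the quick check suggested above.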
