TypeError: 'int' object is not iterable while web scraping with BeautifulSoup

Question

I'm trying to scrape all the Snapple facts on https://www.snapple.com/real-facts. Since I didn't find anything useful online, I decided to write my own script:

from bs4 import BeautifulSoup as soup
import requests

data = requests.get('https://www.snapple.com/real-facts')

result_list = []
soup = soup(data.text, 'html.parser')
divs = soup.find("div", {'id':'facts'})
for div in divs:
    fact_li = div.find('li')
    for fact in fact_li:
        spans = fact.find('span', {'class':'description'})
        for span in spans:
            a = fact.find('a')
            result_list.append(a)

print(result_list)

When I run this, it raises:

Traceback (most recent call last):
  File "snapplefactscrape.py", line 11, in <module>
    for fact in fact_li:
TypeError: 'int' object is not iterable

I understand what the error means, but I don't understand why fact_li is an int or how I can prevent that.

Help would be appreciated :)

Sers · Accepted Answer · 2020-01-19 11:04:12Z

1

To get all matching elements, use find_all instead of find, which returns only the first match.
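As a rough illustration of the difference, here is a minimal sketch run against a small hand-written HTML snippet. The markup and the variable names are assumptions made for the example, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div id="facts">
  <ul>
    <li><span class="description"><a href="/real-facts/1">Fact one.</a></span></li>
    <li><span class="description"><a href="/real-facts/2">Fact two.</a></span></li>
  </ul>
</div>
"""
page = BeautifulSoup(html, "html.parser")

first_li = page.find("li")      # a single Tag (or None if nothing matches)
all_li = page.find_all("li")    # a list-like collection of every matching Tag

first_type = type(first_li).__name__
all_count = len(all_li)
print(first_type, all_count)
```

Looping over the result of find_all visits one matching tag at a time, which is what the nested loops in the question expect.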

You don't need three nested loops to get all the links. Calling select with the CSS selector #facts .description a returns them directly:

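import requests
from bs4 import BeautifulSoup

base_url = 'https://www.snapple.com'
data = requests.get(f'{base_url}/real-facts')
soup = BeautifulSoup(data.text, 'html.parser')

# Every <a> inside an element with class "description" under the element with id "facts"
links = soup.select('#facts .description a')
for link in links:
    print(link.text, base_url + link['href'])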

But if you want to use loops:

result_list = []
divs = soup.find_all('div', {'id': 'facts'})
for div in divs:
    fact_li = div.find_all('li')
    for fact in fact_li:
        spans = fact.find_all('span', {'class': 'description'})
        for span in spans:
            # Search inside the span and add each link individually
            result_list.extend(span.find_all('a'))

edited Jan 19, 2020 at 11:04

answered Jan 19, 2020 at 10:47

Sers


2 Comments

Darki Over a year ago

The first solution worked perfectly! When I print the facts, though, each one looks like this: <a href="/real-facts/9">The average speed of a housefly is 4.5 mph.</a> How can I parse it to get the number from the href (9 in this case), then a comma, then the fact? Thanks a lot for helping.

Sers Over a year ago

Every link in links is a Tag object. The updated answer shows how to get its text and its href attribute. See also: What to do if someone answers my question?
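One possible way to build the "number, fact" line asked about in the comment is sketched below. It assumes every href ends with the fact number, as in the /real-facts/9 example; the variable names are illustrative only:

```python
from bs4 import BeautifulSoup

# The link quoted in the comment above, parsed on its own for the example.
html = '<a href="/real-facts/9">The average speed of a housefly is 4.5 mph.</a>'
link = BeautifulSoup(html, "html.parser").find("a")

# Take the last path segment of the href as the fact number (assumed layout).
number = link["href"].rstrip("/").rsplit("/", 1)[-1]

# Join the number and the link text with a comma.
line = f"{number}, {link.get_text(strip=True)}"
print(line)
```

The same two expressions can be applied to each link inside the for link in links loop.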

Felix Kleine Bösing · Accepted Answer · 2020-01-19 10:47:24Z

1

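Since divs is a single tag, iterating with for div in divs: yields that tag's children, and some of those children are strings (NavigableString objects, which subclass str) rather than tags. On such a child, div.find('li') calls the string find method instead of the bs4 one, and that method returns -1 if the substring is not found. fact_li is therefore the int -1, and looping over it raises the TypeError.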

In the first iteration, for example, the value of div is "\n". This is a good case for using a debugger to inspect the variables, or simply for using print to check values and types.
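The behaviour can be reproduced offline with a short sketch. The HTML string below is a made-up stand-in for the page, chosen so that a newline directly follows the opening div:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: note the newline right after the opening <div>.
html = '<div id="facts">\n<ul><li>Fact one.</li></ul>\n</div>'
page = BeautifulSoup(html, "html.parser")

div = page.find("div", {"id": "facts"})
children = list(div)              # iterating a Tag yields its direct children
first_child = children[0]         # the "\n" text node, a str subclass
found = first_child.find("li")    # str.find: substring index, or -1 if absent

print(repr(first_child), type(first_child).__name__, found)
```

Printing the type of each loop variable like this makes it clear when a loop is visiting text nodes instead of tags.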

answered Jan 19, 2020 at 10:47

Felix Kleine Bösing
