Beautifulsoup Looping through Variable Url

Question

I'm trying to store data scraped from a website. There are more than 100 URLs, and they are all very similar to each other, so I tried to build them with a %s placeholder in my code.

Example URLs:

https://www.yahoo.com/lifestyle/tagged/food,
https://www.yahoo.com/lifestyle/tagged/sports,
https://www.yahoo.com/lifestyle/tagged/usa,
https://www.yahoo.com/lifestyle/tagged/health, and so on.

My Django+Bs4 Loop:

from django.core.management.base import BaseCommand
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from scraping.models import Job
import requests as req


header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

class Command(BaseCommand):
    def handle(self,  *args, **options):
        TAGS = ['economy', 'food', 'sports', 'usa', 'health']
        resp = req.get('https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS),headers=header)
        soup = BeautifulSoup(resp.text, 'lxml')

        for i in range(len(soup)):
            titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
            
        print (titles)

The error message is:

TypeError: not all arguments converted during string formatting

I have been playing around with loops, but I am very new to this and can't work out how to loop over the URLs. What am I missing here? Can someone more knowledgeable point me in the right direction? Many thanks.

What do you expect 'https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS) to do given that TAGS is a list of strings? You obviously want to insert each of the values in TAGS individually and perform a request for each of them. But this line is not in a loop, so how do you expect it to make multiple requests?
– CryptoFool, commented Feb 3, 2021 at 17:01
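
To make the comment above concrete, here is a small sketch of how the % operator treats different right-hand sides. It only builds strings and makes no requests. One caveat: with TAGS as a list, exactly as posted, Python inserts the repr of the whole list rather than raising; the reported TypeError is what you get when the right-hand side is a tuple with more items than placeholders, so the original code may have differed slightly from what is shown. The names TEMPLATE, url_from_list, tuple_error and urls are illustrative only.

```python
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
TEMPLATE = 'https://www.yahoo.com/lifestyle/tagged/%s'

# A list on the right-hand side counts as ONE argument:
# the repr of the whole list is inserted into the URL.
url_from_list = TEMPLATE % TAGS

# A tuple is unpacked into separate arguments: five values but only
# one %s placeholder, so Python raises a TypeError.
try:
    TEMPLATE % tuple(TAGS)
    tuple_error = None
except TypeError as exc:
    tuple_error = str(exc)

# Formatting one tag at a time yields one valid URL per tag.
urls = [TEMPLATE % tag for tag in TAGS]

print(url_from_list)
print(tuple_error)
print(urls)
```

Either way the fix is the same: format the URL once per tag inside a loop.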

Mitchell Olislagers · Accepted Answer · 2021-02-03 17:06:42Z

1

You can loop through your tags to send a request for each tag.

import requests

header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    # Build the URL for this tag and send one request per iteration
    resp = requests.get(f"https://www.yahoo.com/lifestyle/tagged/{tag}", headers=header)
    print(len(resp.text))

#341723
#442712
#447413
#368508
#445326


CryptoFool · Accepted Answer · 2021-02-03 17:06:04Z

1

It appears that you want to insert each of the values in TAGS individually and perform a request for each of them. So you need to loop over TAGS and submit a request for each one. I expect that you want something like this:

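# req, BeautifulSoup and header are the names imported/defined in the question
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    resp = req.get(f'https://www.yahoo.com/lifestyle/tagged/{tag}', headers=header)
    soup = BeautifulSoup(resp.text, 'lxml')
    # Process the page here, e.g. with the selector from the question
    titles = soup.find_all("div", {"class": "StretchedBox Z(1)"})
    print(tag, len(titles))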
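
For anyone who wants the results kept per tag rather than printed, a minimal sketch along the same lines follows. The helper names build_tag_urls and scrape_titles are hypothetical, the timeout value is an arbitrary choice, and the class selector is copied from the question, so it may no longer match the site's markup. It assumes requests, beautifulsoup4 and lxml are installed.

```python
BASE_URL = "https://www.yahoo.com/lifestyle/tagged/{tag}"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) "
                  "Gecko/20100101 Firefox/61.0"
}


def build_tag_urls(tags):
    """Map each tag to its page URL (hypothetical helper)."""
    return {tag: BASE_URL.format(tag=tag) for tag in tags}


def scrape_titles(tags, headers=HEADERS):
    """Fetch every tag page and collect the matching <div> elements per tag."""
    # Third-party imports live here so build_tag_urls works without them.
    import requests
    from bs4 import BeautifulSoup

    results = {}
    for tag, url in build_tag_urls(tags).items():
        resp = requests.get(url, headers=headers, timeout=10)
        resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        soup = BeautifulSoup(resp.text, "lxml")
        # One find_all call per page is enough; the range(len(soup)) loop
        # from the question just repeats the same search.
        results[tag] = soup.find_all("div", {"class": "StretchedBox Z(1)"})
    return results


# Usage (performs real HTTP requests, so it is left commented out):
# for tag, divs in scrape_titles(["economy", "food", "sports"]).items():
#     print(tag, len(divs))
```

Returning a dict keyed by tag makes it straightforward to save each result to a Django model afterwards, which is what the question was ultimately after.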


Olivia Lundy Over a year ago

Note for the future: if anyone needs something like this, Steve's code also works very well.
