0

I'm trying to store data scraped from a website. There are more than 100 URLs and they are all similar to each other, so I tried to build them with a %s placeholder in my code.

Example URLs:

https://www.yahoo.com/lifestyle/tagged/food,
https://www.yahoo.com/lifestyle/tagged/sports,
https://www.yahoo.com/lifestyle/tagged/usa,
https://www.yahoo.com/lifestyle/tagged/health and so on.

My Django+Bs4 Loop:

from django.core.management.base import BaseCommand
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from scraping.models import Job
import requests as req


header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

class Command(BaseCommand):
    def handle(self,  *args, **options):
        TAGS = ['economy', 'food', 'sports', 'usa', 'health']
        resp = req.get('https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS),headers=header)
        soup = BeautifulSoup(resp.text, 'lxml')

        for i in range(len(soup)):
            titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
            
        print (titles)

Error message is:

TypeError: not all arguments converted during string formatting

I have been playing around with loops but am very new to this and am unable to work out how to loop it. What am I missing here? Can someone more knowledgeable point me in the right direction? Many thanks

1
  • What do you expect 'https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS) to do given that TAGS is a list of strings? You obviously want to insert each of the values in TAGS individually and perform a request for each of them. But this line is not in a loop, so how do you expect it to make multiple requests? Commented Feb 3, 2021 at 17:01
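To make the comment above concrete, here is a small sketch of what the `%` operator does with a whole collection versus a single string. The names `BASE`, `tags_list`, `url_from_list` and `tuple_error` are made up for illustration. As far as I can tell, a list exactly as posted would not raise: its repr gets inserted and you end up with one malformed URL. A tuple with more than one item is what raises a TypeError for a single `%s`, so the code that produced the error may have differed slightly from what was posted.

```python
BASE = 'https://www.yahoo.com/lifestyle/tagged/%s'
tags_list = ['food', 'sports']

# A list is treated as ONE argument: its repr is inserted into the URL.
url_from_list = BASE % tags_list
print(url_from_list)
# https://www.yahoo.com/lifestyle/tagged/['food', 'sports']

# A tuple is unpacked into separate arguments: two values for one %s fails.
try:
    BASE % ('food', 'sports')
    tuple_error = None
except TypeError as exc:
    tuple_error = exc
print(type(tuple_error).__name__)
# TypeError

# Formatting one tag at a time gives one valid URL per tag.
urls = [BASE % tag for tag in tags_list]
print(urls)
```

Either way, the fix is the same: format the URL once per tag inside a loop.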

2 Answers

1

You can loop through your tags to send a request for each tag.

import requests

header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}

TAGS = ['economy', 'food', 'sports', 'usa', 'health']
# Send one request per tag instead of formatting the whole list into one URL
for tag in TAGS:
    resp = requests.get(f"https://www.yahoo.com/lifestyle/tagged/{tag}", headers=header)
    print(len(resp.text))

#341723
#442712
#447413
#368508
#445326
1

It appears that you want to insert each of the values in TAGS individually and perform a request for each of them. So you need to loop over TAGS and submit a request for each one. I expect that you want something like this:

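# req, header and BeautifulSoup are imported/defined as in the question
TAGS = ['economy', 'food', 'sports', 'usa', 'health']
for tag in TAGS:
    resp = req.get(f'https://www.yahoo.com/lifestyle/tagged/{tag}', headers=header)
    soup = BeautifulSoup(resp.text, 'lxml')
    # process the page for this tag here, e.g. with soup.find_all(...)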
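One thing to watch when filling in the processing step: the question assigns to a single `titles` variable, so inside a loop each pass would overwrite the previous tag's result. A minimal sketch of keeping one result per tag is below. It is only an illustration under assumptions: `BASE_URL`, `HEADER`, `fetch_page` and `scrape_tags` are hypothetical names, the User-Agent is shortened, and it uses the standard library's `urllib.request` instead of `requests` so it has no third-party dependencies.

```python
from urllib.request import Request, urlopen

BASE_URL = 'https://www.yahoo.com/lifestyle/tagged/{tag}'
HEADER = {'User-Agent': 'Mozilla/5.0'}  # shortened, illustrative value


def fetch_page(url):
    """Download one page and return its body as text (real network call)."""
    with urlopen(Request(url, headers=HEADER)) as resp:
        return resp.read().decode('utf-8', errors='replace')


def scrape_tags(tags, fetch=fetch_page):
    """Return a dict mapping each tag to the text of its page."""
    pages = {}
    for tag in tags:
        # One URL and one request per tag; the result is stored under its tag
        pages[tag] = fetch(BASE_URL.format(tag=tag))
    return pages
```

Calling `scrape_tags(TAGS)` would perform the real requests. The `fetch` parameter is there so that a stand-in function can be passed when trying the loop out without hitting the site, and each stored page can then be handed to BeautifulSoup as in the answer above.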

1 Comment

Note for the future: if anyone needs something like this, Steve's code also works very well.
