I'm trying to store some data that's scraped from a website. There are more than 100 URLs and they are similar to each other. Because of that, I tried to use the %s placeholder in my code.
Example URLs:
https://www.yahoo.com/lifestyle/tagged/food,
https://www.yahoo.com/lifestyle/tagged/sports,
https://www.yahoo.com/lifestyle/tagged/usa,
https://www.yahoo.com/lifestyle/tagged/health and so on.
My Django+Bs4 Loop:
from django.core.management.base import BaseCommand
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
from scraping.models import Job
import requests as req
header = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0'}
class Command(BaseCommand):
    def handle(self, *args, **options):
        TAGS = ['economy', 'food', 'sports', 'usa', 'health']
        resp = req.get('https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS),headers=header)
        soup = BeautifulSoup(resp.text, 'lxml')
        for i in range(len(soup)):
            titles = soup.findAll("div", {"class": "StretchedBox Z(1)"})
            print (titles)
Error message is:
TypeError: not all arguments converted during string formatting
I have been playing around with loops but am very new to this and am unable to work out how to loop it. What am I missing here? Can someone more knowledgeable point me in the right direction? Many thanks
What do you expect 'https://www.yahoo.com/lifestyle/tagged/%s' % (TAGS) to do, given that TAGS is a list of strings? You obviously want to insert each of the values in TAGS individually and perform a request for each of them. But this line is not in a loop, so how do you expect it to make multiple requests?
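One way to do what the comment describes is to format the URL once per tag and make the request inside the loop. This is only a minimal sketch: the helper names build_tag_urls and scrape_titles are hypothetical, the timeout value is an assumption, and the CSS class is copied from the question, so the live page markup may not match it. As a side note, formatting a whole list with a single %s does not loop; it inserts the list's text into one URL, while a tuple of several items in that position is what raises the TypeError quoted in the question.

```python
BASE_URL = 'https://www.yahoo.com/lifestyle/tagged/%s'
TAGS = ['economy', 'food', 'sports', 'usa', 'health']


def build_tag_urls(tags, base_url=BASE_URL):
    """Return one URL per tag by formatting each tag individually."""
    return [base_url % tag for tag in tags]


def scrape_titles(tags, headers=None):
    """Request each tag page in turn and collect the matching elements.

    Hypothetical helper: returns a dict mapping tag -> list of elements.
    """
    # Imported here so build_tag_urls works without third-party packages.
    import requests
    from bs4 import BeautifulSoup

    results = {}
    for tag, url in zip(tags, build_tag_urls(tags)):
        # One request per tag; timeout=10 is an assumed value.
        resp = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(resp.text, 'lxml')
        # Class string taken from the question; adjust if the markup differs.
        results[tag] = soup.find_all('div', {'class': 'StretchedBox Z(1)'})
    return results
```

Inside the management command, handle() could then call scrape_titles(TAGS, headers=header) and iterate over the returned dict to print or save the results.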