Scraping Website does not return correct source code

Question

I'm trying to scrape a Quizlet match set with Python. I want to scrape all the <span> tags with the class TermText.

Here's the URL: 'https://quizlet.com/291523268'

import requests
URL = 'https://quizlet.com/291523268'  # the set linked above
raw = requests.get(URL).text

raw ends up containing none of the tags or cards. When I view the page source in the browser, it shows all the TermText spans I need, which suggests they are not loaded by JavaScript. So I don't understand why the HTML I get back is missing everything I need.

Andrej Kesely · Accepted Answer · 2020-07-30 22:40:29Z

To get the correct response from the server, set a browser-like User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup


url = 'https://quizlet.com/291523268/python-flash-cards/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')

for span in soup.select('span.TermText'):
    print(span.get_text(strip=True))

Prints:

algorithm
A set of specific steps for solving a category of problems
token
basic elements of a language(letters, numbers, symbols)
high-level language
A programming language like Python that is designed to be easy for humans to read and write.
low-level langauge

...and so on.
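To see what the selector does without hitting the network, here is a minimal offline sketch. The helper name `extract_terms` and both HTML snippets are made up for illustration; they only imitate the shape of a page with TermText spans and of a challenge page that has none, and are not copied from Quizlet.

```python
from bs4 import BeautifulSoup


def extract_terms(html):
    """Return the stripped text of every <span> whose class list includes TermText."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text(strip=True) for span in soup.select("span.TermText")]


# Hypothetical markup: spans may carry extra classes next to TermText.
sample_html = """
<div>
  <span class="TermText notranslate lang-en">algorithm</span>
  <span class="TermText notranslate lang-en"> A set of specific steps </span>
  <span class="Other">ignored</span>
</div>
"""

# Hypothetical stand-in for a block or captcha page: no TermText spans at all.
blocked_html = "<html><body><h1>Please verify you are human</h1></body></html>"

print(extract_terms(sample_html))
print(extract_terms(blocked_html))
```

An empty list from a page that shows the spans in the browser is a hint that the server sent a different document to the script, which is the situation described in the question.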

2 Comments

AaravM4 Over a year ago

@Andrej Kesely Why did you need to send the User-Agent?

Andrej Kesely Over a year ago

@AaravM4 Without the User-Agent header set, you get a Cloudflare captcha page. Setting the User-Agent is the first thing I try when a server returns this kind of page.
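For context, requests identifies itself as python-requests unless told otherwise, which some servers treat differently from a browser. The sketch below compares that default with a browser-like value set once on a Session. The names `BROWSER_UA` and `make_session` are hypothetical, and no request is sent.

```python
import requests

# Browser-like value taken from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) "
    "Gecko/20100101 Firefox/79.0"
)


def make_session(user_agent=BROWSER_UA):
    """Build a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


# What requests sends when no User-Agent is set explicitly.
default_ua = requests.utils.default_user_agent()
session = make_session()

print("default:", default_ua)
print("session:", session.headers["User-Agent"])
```

Calling `session.get(url)` afterwards would send the browser-like header without passing `headers=` each time. Whether a given site accepts it is up to that site.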
