Python 3: using requests does not get the full content of a web page

Question

I am testing the requests module to get the content of a web page, but when I look at the result, it does not contain the full content of the page.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Also on the chrome web-browser if I look at the page source I do not see the full content.

Is there a way to get the full content of the example page that I have provided?

"Also on the chrome web-browser if I look at the page source I do not see the full content." Why do you blame requests then?
– Elis Byberi, Commented Dec 9, 2017 at 16:34
The page is probably generated dynamically by JavaScript running in the browser. This is very common, and there are many questions here on Stack Overflow that address this exact issue.
– larsks, Commented Dec 9, 2017 at 16:40
It's probably as @larsks said. Can you give us more details: which part of the page is missing when you view the source in the browser?
– Ahmad Nourallah, Commented Dec 9, 2017 at 16:46
@ElisByberi I do not blame requests, I am just saying I am using requests.
– TJ1, Commented Dec 9, 2017 at 17:06

Dan-Dev · Accepted Answer · 2017-12-09 16:54:28Z

22

The page is rendered with JavaScript, which makes further requests to fetch additional data, so the HTML that requests receives is incomplete. You can fetch the fully rendered page with Selenium.

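from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real Chrome instance so the page's JavaScript runs (needs ChromeDriver on your PATH).
driver = webdriver.Chrome()
url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
driver.get(url)
# page_source is the DOM as it stands at this point, after the scripts run so far.
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print(soup.prettify())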

For other solutions see my answer to Scraping Google Finance (BeautifulSoup)
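
Before reaching for a browser, it can help to confirm that the missing content really is added by JavaScript. The following is only a rough sketch using the standard library: the helper name text_in_static_html and the sample HTML strings are made up for illustration. It checks whether a piece of text you can see in the browser appears in the visible text of the raw HTML (for example the text returned by requests) and counts the script tags.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []
        self.script_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_in_static_html(html, expected_text):
    """Return (found, script_count) for the raw, unrendered HTML.

    found is True when expected_text appears in the visible text.
    Hypothetical helper for illustration, not part of requests or bs4.
    """
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks)
    return expected_text in visible, parser.script_count


# Made-up sample pages: one static, one that fills itself in with JavaScript.
static_page = "<html><body><h1>Red Dress</h1></body></html>"
dynamic_page = (
    "<html><body><div id='root'></div>"
    "<script>render('Red Dress')</script></body></html>"
)
print(text_in_static_html(static_page, "Red Dress"))
print(text_in_static_html(dynamic_page, "Red Dress"))
```

If the text is missing from the raw HTML but visible in the browser, the page is most likely building that part with JavaScript, and a browser-driven approach such as the Selenium code above is one way to get it.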



12 Comments

TJ1 Over a year ago

Thanks, when I try to run your code I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver'

Dan-Dev Over a year ago

You need to download ChromeDriver and put it on your PATH: sites.google.com/a/chromium.org/chromedriver

Dan-Dev Over a year ago

You can use a headless version of Chrome, "Chrome Canary", if you are on Windows.
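
As a sketch of the headless idea: regular Chrome builds have supported a --headless switch for some time, so Canary may not be needed. The example below assumes Selenium and ChromeDriver are installed. The function names and the wait_css parameter are hypothetical, and you would need to pick a CSS selector that matches an element the page loads late. It also waits for that element before reading page_source, since the JavaScript may still be fetching data right after driver.get() returns.

```python
def chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build the command-line switches passed to Chrome."""
    args = ["--window-size={},{}".format(*window_size)]
    if headless:
        args.append("--headless")
    return args


def fetch_rendered_html(url, wait_css, timeout=15):
    """Load url in Chrome and return the HTML once wait_css is present.

    wait_css is an assumed CSS selector for an element that only exists
    after the page's JavaScript has run.
    """
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element appears, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned string can be passed to BeautifulSoup in the same way as driver.page_source in the answer.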

TJ1 Over a year ago

I am on a Mac. I copied chromedriver into the same directory as my Python source code, but I still get the error.

Dan-Dev Over a year ago

It's a long time since I used a Mac. On Linux you put it in /usr/local/bin/. Is it the same on a Mac?
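
A quick way to see why the FileNotFoundError appears is to check whether the driver can be found on PATH at all. This is only a sketch: find_driver is a hypothetical helper, and /usr/local/bin is just the directory suggested in the comment above. On macOS and Linux the directory containing your script is normally not searched, which would explain why copying chromedriver next to the source file does not help.

```python
import shutil


def find_driver(name="chromedriver", extra_dirs=("/usr/local/bin",)):
    """Return the full path to the driver executable, or None if not found."""
    # First look in the directories listed in the PATH environment variable.
    found = shutil.which(name)
    if found:
        return found
    # Then try a few extra directories (assumed locations, adjust as needed).
    for directory in extra_dirs:
        candidate = shutil.which(name, path=directory)
        if candidate:
            return candidate
    return None


print(find_driver())
```

If this prints None, move chromedriver into a directory that is on PATH, or add its directory to PATH, and make sure the file is executable.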


Markus Kaleton · Answer · 2017-12-09 16:44:46Z

-4

Making a request is different from getting the page source or the visual elements of a web page. Also, viewing the source of a web page doesn't give you access to everything behind it, such as database requests and other back-end work. Either your question is not clear enough or you've misinterpreted how web browsing works.



1 Comment

Kennet Celeste Over a year ago

this is the BSest response that I've read on stackoverflow in 2020.
