16

I am testing the requests module to fetch the content of a webpage, but when I inspect the result, it does not contain the full content of the page.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
page = requests.get(url)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

Also, when I view the page source in the Chrome browser, I do not see the full content there either.

Is there a way to get the full content of the example page that I have provided?

4
  • "Also on the chrome web-browser if I look at the page source I do not see the full content." Why do you blame requests then? Commented Dec 9, 2017 at 16:34
  • The page is probably generated dynamically by JavaScript running in the browser. This is very common, and there are many questions here on Stack Overflow that address this exact issue. Commented Dec 9, 2017 at 16:40
  • It's probably as @larsks said. Can you give us more details? Which part is missing when you view the source in the browser? Commented Dec 9, 2017 at 16:46
  • @ElisByberi I do not blame requests, I am just saying I am using requests. Commented Dec 9, 2017 at 17:06
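
To illustrate the point made in the comments, here is a minimal sketch using only the standard library. The HTML string is a made-up stand-in for what a server might send for a JavaScript-driven page (the element id, script path, and tag names are assumptions, not taken from the Nordstrom page): the container is empty and the items would only appear after the script runs in a browser, so a plain HTTP fetch never sees them.

```python
from html.parser import HTMLParser

# Hypothetical static response: an empty container plus a script that
# would fill it in a real browser. Not the actual markup of any site.
STATIC_HTML = """
<html><body>
  <div id="product-list"></div>
  <script src="/static/app.js"></script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Counts how many times each start tag occurs in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    parser = TagCounter()
    parser.feed(html)
    return parser.counts


counts = count_tags(STATIC_HTML)
# The script tag is present, but no <article> items exist yet because
# nothing has executed the script.
print(counts.get("script", 0), counts.get("article", 0))
```

A parser such as BeautifulSoup behaves the same way here: it only sees the markup it is given and does not execute scripts.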

2 Answers

22

The page is rendered with JavaScript, which makes additional requests to fetch more data. You can fetch the rendered page with Selenium, which drives a real browser.

from bs4 import BeautifulSoup
from selenium import webdriver

# Start a real Chrome instance (requires ChromeDriver on your PATH)
driver = webdriver.Chrome()
url = "https://shop.nordstrom.com/c/womens-dresses-shop?origin=topnav&cm_sp=Top%20Navigation-_-Women-_-Dresses&offset=11&page=3&top=72"
driver.get(url)
# page_source is the DOM as the browser currently sees it, after scripts have run
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print(soup.prettify())

For other solutions, see my answer to Scraping Google Finance (BeautifulSoup).
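
One caveat with reading `page_source` right after `get()` is that scripts may still be loading data at that moment. The sketch below is one possible way to handle that, not part of the original answer: it polls `page_source` until a marker string appears or a timeout passes. The helper names `make_headless_chrome` and `fetch_rendered_html`, the marker value, and the timeout defaults are all assumptions chosen for illustration; the `--headless` flag assumes a Chrome build that supports it.

```python
import time


def make_headless_chrome():
    """Create a headless Chrome driver.

    Assumes selenium is installed and ChromeDriver is available.
    Imported lazily so fetch_rendered_html works with any driver-like object.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(driver, url, marker, timeout=10.0, poll=0.5):
    """Load url, then re-read page_source until marker shows up or time runs out.

    Returns the last HTML seen, which may lack the marker if the timeout expired.
    The driver is always closed, even if loading raises.
    """
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        html = driver.page_source
        while marker not in html and time.monotonic() < deadline:
            time.sleep(poll)
            html = driver.page_source
        return html
    finally:
        driver.quit()
```

Usage might look like `html = fetch_rendered_html(make_headless_chrome(), url, "<article")`, where `"<article"` is a placeholder for whatever text you expect in the fully loaded page. Selenium also ships its own explicit-wait helpers, which are usually preferable when you can express the condition as an element lookup.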


12 Comments

Thanks, when I try to run your code I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver'
You need to download ChromeDriver and put it on your PATH: sites.google.com/a/chromium.org/chromedriver
You can use a headless version of Chrome, "Chrome Canary", if you are on Windows.
I am on a Mac. I copied chromedriver into the same directory as my Python source code, but I still get the error.
It's been a long time since I used a Mac. On Linux you put it in /usr/local/bin/; is it the same on a Mac?
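
The `FileNotFoundError` in the comments above means the `chromedriver` executable could not be found on the search path. On macOS and Linux the script's own directory is normally not on `PATH`, which may be why copying the file next to the source did not help. A minimal sketch for checking where (or whether) the driver can be found, using only the standard library; `locate_driver` and its `extra_dirs` parameter are hypothetical names introduced for this example:

```python
import os
import shutil


def locate_driver(name="chromedriver", extra_dirs=()):
    """Return the full path of an executable called name, or None.

    Searches the directories on PATH first, then any extra_dirs given.
    """
    dirs = os.environ.get("PATH", "").split(os.pathsep)
    dirs.extend(os.fspath(d) for d in extra_dirs)
    # Drop empty entries so the current directory is not searched implicitly.
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = locate_driver(extra_dirs=["/usr/local/bin"])
    print(found if found else "chromedriver not found on the search path")
```

If this prints a path, Selenium should be able to start the driver from a shell with the same `PATH`; if not, moving the executable into a directory that is on `PATH` (such as `/usr/local/bin`, as suggested above) is one option.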
-4

A request is different from getting the page source or the visual elements of a web page. Also, viewing the source of a web page doesn't give you access to everything behind it, such as database requests and other back-end work. Either your question is not clear enough or you've misunderstood how web browsing works.

1 Comment

this is the BSest response that I've read on stackoverflow in 2020.
