I have just started learning web scraping in Python with the BeautifulSoup and requests libraries, using PyCharm.

import requests
from bs4 import BeautifulSoup
    
result1 = requests.get("https://www.grainger.com/")
print('result1 is '+ str(result1.status_code))

With this website the request keeps loading and never returns, but if I use google.com I get output.

Why don't I get any output for the website above?

2 Answers

To get status 200 from this site, specify a User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

result1 = requests.get("https://www.grainger.com/", headers=headers)

print('result1 is '+ str(result1.status_code))

Prints:

result1 is 200

This works because some sites ignore requests that don't appear to come from a web browser. By default, requests sends a User-Agent of the form python-requests/<version>, so the website can tell that the request is not coming from a browser. Your request most likely hangs and eventually times out because their server is ignoring it.
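If you want to see the difference for yourself, here is a minimal sketch that prepares the request without sending it, so you can inspect the User-Agent header that would go out. The names BROWSER_UA and build_request are hypothetical helpers made up for this illustration; they are not part of requests.

```python
import requests

# Hypothetical constant: the browser-like User-Agent string used in the answer.
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0"


def build_request(url, user_agent=None):
    """Prepare (but do not send) a GET request and return it for inspection."""
    session = requests.Session()
    # Only override the header when a custom User-Agent is supplied.
    headers = {"User-Agent": user_agent} if user_agent else {}
    # prepare_request merges the session's default headers with ours.
    return session.prepare_request(requests.Request("GET", url, headers=headers))


default_req = build_request("https://www.grainger.com/")
browser_req = build_request("https://www.grainger.com/", BROWSER_UA)

# The default value has the form "python-requests/<version>".
print(default_req.headers["User-Agent"])
print(browser_req.headers["User-Agent"])
```

Nothing is sent over the network here, so it is safe to run while experimenting with header values.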

Hmm... there are a few possibilities.

  1. The website might not exist.
  2. You might be using http instead of https.
  3. The site blocks scraping (send a User-Agent header).
  4. It might be a problem with requests. Try using a different library.
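One rough way to narrow these possibilities down is to pass a timeout and look at which exception, if any, comes back. The sketch below assumes a 10-second timeout is acceptable; diagnose is a hypothetical helper written for this answer, not something requests provides.

```python
import requests


def diagnose(url, headers=None, timeout=10):
    """Return a short description of what happened when fetching url."""
    try:
        # Without a timeout, requests can wait indefinitely for a reply.
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.Timeout:
        # Catch Timeout first: ConnectTimeout is also a ConnectionError.
        return "timed out: the server may be ignoring the request"
    except requests.exceptions.ConnectionError:
        return "connection error: the host may not exist or is unreachable"
    except requests.exceptions.RequestException as exc:
        # Anything else requests raises, such as a malformed URL.
        return f"request failed: {exc}"
    return f"status {response.status_code}"
```

You could call diagnose("https://www.grainger.com/") once without headers and once with a browser-like User-Agent dict and compare the two messages; what you actually get depends on how the site treats your request at the time.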
