13

I'm learning how to use Python requests (Python 3) and I am trying to make a simple requests.get() call to fetch the HTML from several websites. Although it works for most of them, there is one I am having trouble with.

When I call http://es.rs-online.com/, everything works fine:

In [1]: import requests
   ...: html = requests.get("http://es.rs-online.com/")

In [2]: html
Out[2]: <Response [200]>

However, when I try it with http://es.farnell.com/, the request never completes; Python keeps working on it forever. If I set a timeout, no matter how long, the requests.get() is always interrupted by the timeout and by nothing else. I have also tried adding headers, but that didn't solve the issue. I don't think the error has anything to do with the proxy I'm using either, as I am able to open this website in my browser. Currently, my code looks like this:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
html = requests.get("http://es.farnell.com/", headers=headers, timeout=5, allow_redirects=True)

After 5 seconds, I get the expected timeout error:

ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)
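
For completeness, this is roughly how it looks if I wrap the call so the timeout is caught instead of raised (just a minimal sketch; requests.exceptions.ReadTimeout is the exception shown above):

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
try:
    html = requests.get("http://es.farnell.com/", headers=headers, timeout=5)
except requests.exceptions.ReadTimeout as err:
    # this branch runs every time, no matter how large the timeout is
    print("Timed out:", err)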

Does anyone know what could be the issue?

2 Answers

33

The problem is in your headers. Do remember that some sites are more lenient than others when it comes to the content of the headers you send. To fix the issue, replace your current headers with:

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}

I would also recommend sending the GET request to https://es.farnell.com/ rather than http://es.farnell.com/, removing timeout=5, and dropping allow_redirects=True (it is True by default).


All in all, your code should look like this:

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
html = requests.get("https://es.farnell.com", headers=headers)
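
As a quick check (not part of the fix itself), you can confirm the request now succeeds:

print(html.status_code)   # expect 200
print(html.text[:200])    # first characters of the returned HTML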

Hope this helps.


2 Comments

I am facing a similar issue with the website "hamburgsud-line.com/liner/en/liner_services/index.html". I need to get cookie data so that I can use it while making an API call to get tracking data. Any help will be much appreciated. TIA!
@JuhiSharma Have you tried using requests.Session()? Something like the sketch below.
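
A rough sketch of that approach (the URL is the one from the comment and the User-Agent is just a placeholder): a requests.Session() keeps any cookies the server sets, so they can be reused for the follow-up API call.

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)'})  # placeholder UA

# URL taken from the comment above; adjust to the page that actually sets the cookies
resp = session.get("https://www.hamburgsud-line.com/liner/en/liner_services/index.html", timeout=10)

print(resp.status_code)
print(session.cookies.get_dict())  # cookies set by the site, reused automatically by later session.get/post calls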
-1
import requests
import time

def send_requests(url, total_requests=200, delay=0.5):
    success = 0
    failed = 0

    for i in range(total_requests):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)

            print(f"[{i+1}] Success: {response.status_code}")
            success += 1

        except requests.exceptions.RequestException as e:
            print(f"[{i+1}] Failed: {e}")
            failed += 1

        time.sleep(delay)  # polite delay to avoid spamming too fast

    print(f"\nFinished:\n  Successful: {success}\n  Failed: {failed}")

# Example usage
url = "https://example.com"  # Replace with the URL you're legally allowed to monitor
send_requests(url)
