I am using the following code:

import requests
url = 'http://www.transfermarkt.com/'
r = requests.get(url)
r.raise_for_status()

And I get the following output:

HTTPError: 404 Client Error: Not Found for url: http://www.transfermarkt.com/

But the link works normally from the browser. Why is this happening?

1 Answer

The site administrator has decided that the site should pretend not to exist for clients that do not send a browser-like User-Agent header. By default, requests identifies itself as python-requests/<version>, and that request gets a 404:

>>> import requests
>>> url = 'http://www.transfermarkt.com/'
>>> requests.get(url).raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 831, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found

The request fails, as you found. Set a browser-like User-Agent header:

>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0)'}
>>> requests.get(url, headers=headers).raise_for_status()
>>>

With that header set, raise_for_status() returns without raising, so the request succeeded.

It seems the site admin doesn't want scripted access, so consider asking for permission, or asking whether there is a preferred way to get the content. The technical reason for the 404, though, was that the request did not carry a browser-like User-Agent.
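
If you make more than one request, it may be easier to set the header once on a Session instead of passing headers= on every call. The following is only a minimal sketch: BROWSER_UA, make_session and fetch are hypothetical names chosen for illustration, and the User-Agent value is just an example string, not something the site is known to require.

```python
import requests

# Hypothetical browser-like User-Agent string (an assumption for illustration);
# substitute whatever value is appropriate for your client.
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:39.0) Gecko/20100101 Firefox/39.0"


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(url, session=None, timeout=10):
    """GET the URL and raise requests.exceptions.HTTPError on a 4xx/5xx status."""
    session = session or make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response


# Inspecting the headers needs no network access.
print(requests.utils.default_user_agent())   # what requests sends when nothing is set
print(make_session().headers["User-Agent"])  # what this session sends instead
```

Calling fetch('http://www.transfermarkt.com/') would then send the custom header. Whether the site accepts it depends on the site's current configuration, so treat the 404 check as something to verify rather than assume.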
