
I use Python requests to download images, but in some cases it doesn't work, and it seems to be happening more often. An example is

http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg

It loads fine in my browser, but with requests the response is an HTML page that says "403 Forbidden" and "nginx/1.7.11".

import requests
image_url = "<the_url>"
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36', 'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Encoding':'gzip,deflate,sdch'}
r = requests.get(image_url, headers=headers)
# r.content is html '403 forbidden', not an image
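Because the server answers with an HTML error page instead of image bytes, it can help to check the response before treating r.content as an image. The following is a minimal sketch: the helper name is_image_response is hypothetical, and the magic-byte list only covers JPEG, PNG, GIF and WebP, which is an assumption about which formats matter here.

```python
# Magic-byte prefixes for a few common image formats (not exhaustive).
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a", b"GIF89a",   # GIF
)

def is_image_response(status_code, content_type, body):
    """Return True only if the response looks like real image data.

    Hypothetical helper: pass r.status_code, r.headers.get('Content-Type', '')
    and r.content from a requests response.
    """
    if status_code != 200:
        return False  # e.g. the 403 page served by nginx
    if not (content_type or "").lower().startswith("image/"):
        return False  # an HTML error page would be text/html
    # WebP files start with "RIFF", a 4-byte size, then "WEBP"
    if body[:4] == b"RIFF" and body[8:12] == b"WEBP":
        return True
    return body.startswith(IMAGE_SIGNATURES)
```

With the request above, the call would be is_image_response(r.status_code, r.headers.get('Content-Type', ''), r.content), which should return False for the 403 page.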

I have also tried the following headers, which have been necessary in some other cases. Same result.

headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36', 'Accept':'image/webp,*/*;q=0.8','Accept-Encoding':'gzip,deflate,sdch'}

(I had a similar question a few weeks ago, but that one turned out to be caused by PIL not supporting the particular image file types. This is different.)

EDIT: Based on comments:

It seems the link only works if you have already visited the original page that contains the image, http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/. I suppose the browser then uses the cached version. Any workaround?

  • I get 403 Forbidden when I click your link... most likely there is some logic on the server end to prevent others from serving their images (i.e. it probably works fine when you navigate to it from their site, but not when you link to it directly). Commented Jul 22, 2015 at 21:24
  • I cannot access the image in Mozilla Firefox either. Maybe direct access from external sources is forbidden. Can you provide the blog article that includes this image? Commented Jul 22, 2015 at 21:26
  • It's here: aussietaste.recipes/vegetables/leek-vegetables/… Commented Jul 22, 2015 at 21:36
  • Ah, yes. It seems the link only works if I have already visited the site. I suppose the browser then uses the cached version. Any workaround? Commented Jul 22, 2015 at 21:40

1 Answer


The server is validating the Referer header. This prevents other sites from embedding the image in their web pages and using the image host's bandwidth (hotlinking). Set Referer to the recipe page you mentioned in your post, and the request will succeed.

More info: https://en.wikipedia.org/wiki/HTTP_referer
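To illustrate the idea, here is a rough model of a Referer-based hotlink check written in plain Python. It is an assumption about the general technique, not this site's actual nginx configuration; referer_allowed and the allowed host set are hypothetical. It rejects a missing Referer, which is consistent with the original 403, since requests does not send a Referer header unless you set one yourself.

```python
from urllib.parse import urlsplit

def referer_allowed(referer, allowed_hosts):
    """Rough model of Referer-based hotlink protection (hypothetical).

    A missing Referer, or one pointing at a host outside allowed_hosts
    (or their subdomains), is rejected.
    """
    if not referer:
        return False  # direct requests, like the bare requests.get() call
    host = (urlsplit(referer).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

Under this model, the recipe page URL passes the check while no Referer, or a link from another site, does not.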

import requests
image_url = "http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg"
headers = {
    'User-agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding' : 'gzip,deflate,sdch',
    'Referer' : 'http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/'
}
r = requests.get(image_url, headers=headers)
print(r)

For me, this prints

<Response [200]>
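Once the request returns 200, r.content holds the raw image bytes and can be written to disk. A minimal sketch, assuming you want the file named after the last segment of the URL path; filename_from_url and save_image are hypothetical helpers, not part of requests.

```python
import os
import posixpath
from urllib.parse import urlsplit, unquote

def filename_from_url(url, default="image.jpg"):
    # Last path segment, e.g. ".../2015/07/Leek-and-Sweet-Potato-Gratin.jpg";
    # the query string is ignored and %-escapes are decoded.
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or default

def save_image(body, url, directory="."):
    # r.content is bytes, so the file must be opened in binary mode ("wb").
    path = os.path.join(directory, filename_from_url(url))
    with open(path, "wb") as f:
        f.write(body)
    return path
```

For example, call r.raise_for_status() first so a 403 raises an exception instead of being saved, then save_image(r.content, image_url).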