Image url does not return an image. Using Python requests

Question

I use Python requests to download images, but in some cases it doesn't work, and it seems to be happening more often. An example is

http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg

It loads fine in my browser, but with requests the response is an HTML page that says "403 Forbidden" and "nginx/1.7.11".

import requests
image_url = "<the_url>"
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch',
}
r = requests.get(image_url, headers=headers)
# r.content is html '403 forbidden', not an image

I have also tried these headers, with a different Accept value that has been necessary in some cases. Same result.

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch',
}

(I asked a similar question a few weeks ago, but the cause there turned out to be image file types that PIL does not support. This is different.)

EDIT: Based on comments:

It seems the link only works if you have already visited the original site http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/ with the image. I suppose the browser then uses the cached version. Any workaround?

I get 403 Forbidden when I click your link. Most likely there is some logic on the server end that prevents others from serving their images (i.e. it probably works fine when you navigate to it from their site, but not when linking to it directly).
– Joran Beasley, Commented Jul 22, 2015 at 21:24
I cannot access the image in Mozilla Firefox. Maybe direct access from external sources is forbidden. Can you provide the blog article that includes this image?
– marmeladze, Commented Jul 22, 2015 at 21:26
It's here: aussietaste.recipes/vegetables/leek-vegetables/…
– Joran Beasley, Commented Jul 22, 2015 at 21:36
Ah, yes. It seems the link only works if I have already visited the site. I suppose the browser then uses the cached version. Any workaround?
– user984003, Commented Jul 22, 2015 at 21:40

mooiamaduck · Accepted Answer · 2015-07-22 21:50:02Z

The site is validating the Referer header. This prevents other sites from including the image in their web pages and using the image host's bandwidth. Set it to the site you mentioned in your post, and it will work.

More info: https://en.wikipedia.org/wiki/HTTP_referer
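To make the idea concrete, here is a hypothetical sketch of the kind of check a server could apply. The real rule used by this host is not known; the function name referer_allowed and the allow-list ALLOWED_REFERER_HOSTS are invented for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the actual server configuration is unknown.
ALLOWED_REFERER_HOSTS = {"aussietaste.recipes"}


def referer_allowed(referer, allowed_hosts=ALLOWED_REFERER_HOSTS):
    """Return True if the Referer header value points at an allowed host."""
    if not referer:
        # No Referer at all, e.g. a request that sets no such header.
        return False
    # hostname is None when the value is not a URL with a network location.
    host = urlparse(referer).hostname
    return host in allowed_hosts
```

Under this assumed rule, a request without a Referer, or with one pointing at another site, would be refused, which would be consistent with the 403 seen here.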

import requests
image_url = "http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg"
headers = {
    'User-agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding' : 'gzip,deflate,sdch',
    'Referer' : 'http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/'
}
r = requests.get(image_url, headers=headers)
print(r)

For me, this prints

<Response [200]>
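A status code alone does not confirm that the body is an image, since the original failure delivered an HTML error page in r.content. As a minimal sketch, one could check both the Content-Type header and the leading bytes before saving. The helper name looks_like_image and the IMAGE_SIGNATURES tuple are made up for illustration, and only JPEG, PNG, and GIF are covered.

```python
# Leading "magic" bytes for a few common image formats (not exhaustive).
IMAGE_SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"GIF87a",             # GIF (1987 spec)
    b"GIF89a",             # GIF (1989 spec)
)


def looks_like_image(content_type, body):
    """Return True if the header claims an image and the body starts
    with one of the known signatures."""
    claims_image = content_type.lower().startswith("image/")
    # bytes.startswith accepts a tuple of candidate prefixes.
    has_signature = body.startswith(IMAGE_SIGNATURES)
    return claims_image and has_signature
```

With the response from the code above, looks_like_image(r.headers.get('Content-Type', ''), r.content) should be true before r.content is written to a file opened in binary mode.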
