I'm trying to download an image from a URL using requests. With a browser or a REST client such as the Restlet Chrome extension, I can retrieve the normal content, a JSON body, and a binary image that I can save to disk.

With requests, I get almost the same response headers; only Content-Length differs (15 bytes instead of about 35 kilobytes), and I can't find the binary image.

To simulate the request made by the browser, I set the same request headers, like this:

headers = {"Host": "cpom.prefeitura.sp.gov.br",
           "Pragma": "no-cache",
           "Cache-Control": "no-cache",
           "DNT": "1",
           "Accept": "*/*",
           "Accept-Encoding": "gzip, deflate, br",
           "Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/65.0.3325.181 Safari/537.36"
           }

r = requests.get(url, stream=True, headers=headers)

There are no redirects. I also debugged and inspected the contents of the requests.models.Response object, but with no success.

What am I missing? I think it's a detail of the request, but I can't figure it out.

This is my test:

import requests

url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
r = requests.get(url, stream=True)

if r.status_code == 200:
    print(r.raw.headers)  # response headers, including Content-Length
    # Saved as .txt because the body is not the PNG image
    with open("/home/bruno/captcha/8762520.txt", "wb") as f:
        # Stream the body to disk in chunks
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

This is the URL to download the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913

And this is the site with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral

A simple GET returns only a JSON response body, but when you inspect the response you'll see the binary content, which is the image (about 36 KB).

EDIT: added screenshots from the Restlet client.

Request: [image: Request sample]

Response: [image: Partial response]

2 Comments
  • I can only get the 15-byte response with the small JSON, in both Chrome and Postman. Where/how are you getting the response with the image? Commented Apr 20, 2018 at 11:56
  • @jdehesa I included a sample request using Restlet. Thanks. Commented Apr 20, 2018 at 12:08

1 Answer

The difference is the Cookie header. Restlet uses Chrome's existing cookies by default (see docs), but if you set the Cookie header to an empty string you will see that you do not get the image. If you want to retrieve the image from a Python script, you first need to obtain a valid cookie by making a request to another valid URL in the web app (for example, the page with the form that you linked) and reading the Set-Cookie header of its response (see the MDN docs for more information).
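
As a minimal sketch of that approach, a requests.Session can carry the cookie from the first request over to the second, so you do not have to copy the Set-Cookie value by hand. This assumes the form page is the one that sets the required cookie, which is not verified here; the function name fetch_with_session_cookie and the output file name captcha.png are hypothetical.

```python
import requests

# URLs taken from the question; whether the form page sets the
# required cookie is an assumption.
FORM_URL = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral"
IMAGE_URL = ("https://cpom.prefeitura.sp.gov.br/prestador/"
             "SituacaoCadastral/ImagemCaptcha?u=8762520")


def fetch_with_session_cookie(form_url, image_url, session=None):
    """Request form_url first, then image_url, reusing one session.

    A requests.Session stores cookies received via Set-Cookie and
    sends them back on later requests made through the same session.
    """
    if session is None:
        session = requests.Session()
    # First request: any Set-Cookie in the response lands in session.cookies.
    session.get(form_url, timeout=10).raise_for_status()
    # Second request: the stored cookie is sent in the Cookie header.
    response = session.get(image_url, timeout=10)
    response.raise_for_status()
    return response


# Example use (performs real network requests):
# r = fetch_with_session_cookie(FORM_URL, IMAGE_URL)
# print(r.headers.get("Content-Type"), len(r.content))
# with open("captcha.png", "wb") as f:
#     f.write(r.content)
```

If the second response is still the small JSON body, comparing session.cookies with the Cookie header that Restlet sends should show which cookie is missing.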

2 Comments

Hmm, I see, Restlet sets a Cookie. I will test this approach.
That was it: I now make another request first, read the Set-Cookie header, and can then get the image. Thanks.
