Different response using Python requests

Question

I'm trying to download an image from a URL using requests. With a browser or a REST client, such as the Restlet Chrome extension, I can retrieve the normal content (a JSON document) and a binary image that I can save to disk.

With requests, I get almost the same response headers; only Content-Length differs (15 bytes instead of about 35 kilobytes), and I can't find the binary image.

To simulate the request made by the browser, I set the same request headers, like this:

headers = {"Host": "cpom.prefeitura.sp.gov.br",
           "Pragma": "no-cache",
           "Cache-Control": "no-cache",
           "DNT": "1",
           "Accept": "*/*",
           "Accept-Encoding": "gzip, deflate, br",
           "Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/65.0.3325.181 Safari/537.36"
           }

r = requests.get(url, stream=True, headers=headers)

There are no redirects. I also debugged and inspected the contents of the requests.models.Response object, but with no success.

What am I missing? I think it is a detail of the request, but I can't figure it out.

This is my test:

import requests

url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
r = requests.get(url, stream=True)

if r.status_code == 200:
    print(r.raw.headers)
    # Saved as text, since the body is not the PNG image.
    with open("/home/bruno/captcha/8762520.txt", "wb") as f:
        # Iterating over the response yields the body in small chunks.
        for chunk in r:
            f.write(chunk)

This is the URL to download the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913

And this is the site with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral

A simple GET returns only a JSON response body, but if you inspect the response you'll see the binary response, which is the image (about 36 KB).
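To tell quickly which of the two bodies a download produced, the first bytes can be checked. The sketch below assumes the captcha is a PNG (as the comment in the test code suggests); `describe_body` is a hypothetical helper name, not part of requests.

```python
import json

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_body(body: bytes) -> str:
    """Return "png", "json", or "unknown" for a downloaded response body."""
    if body.startswith(PNG_SIGNATURE):
        return "png"
    try:
        # Both a failed decode and invalid JSON raise a ValueError subclass.
        json.loads(body.decode("utf-8"))
        return "json"
    except ValueError:
        return "unknown"
```

Calling `describe_body(r.content)` on the response from the test above would show whether the server sent the small JSON document or the image.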

EDIT: included screenshots of the request and response from the Restlet client (images not included here).

I can only get the 15-byte response with the small JSON, with either Chrome or Postman. Where/how are you getting the response with the image?
– javidcf, commented Apr 20, 2018 at 11:56

javidcf · Accepted Answer · 2018-04-20 12:29:27Z

The difference is in the Cookie header. Restlet uses Chrome's existing cookies by default (see the docs), but if you set the Cookie header to an empty string you will see that you do not get the image. If you want to retrieve the image from a Python script, you first need to obtain a valid cookie by making a request to another valid URL in the web app (for example, the link to the form that you posted) and reading the Set-Cookie header of its response (see the MDN docs for more information).
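A minimal sketch of that flow, assuming the server sets the needed cookie when the form page is requested: a `requests.Session` stores cookies received via Set-Cookie and sends them on later requests, so the cookie does not have to be copied by hand. The names `fetch_image_with_cookie` and `main`, the timeout value, and the output filename are illustrative choices, not something from the question.

```python
def fetch_image_with_cookie(session, form_url, image_url, timeout=10):
    """Request form_url first to receive a cookie, then download image_url.

    `session` is expected to behave like requests.Session, which keeps
    cookies from Set-Cookie headers and attaches them to later requests.
    """
    # First request: made only so the server can set its cookie.
    first = session.get(form_url, timeout=timeout)
    first.raise_for_status()

    # Second request: the session sends the stored cookie automatically.
    response = session.get(image_url, timeout=timeout)
    response.raise_for_status()
    return response.content


def main():
    # Imported here so the helper above has no import-time dependency.
    import requests

    form_url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral"
    image_url = form_url + "/ImagemCaptcha?u=8762520"
    with requests.Session() as session:
        data = fetch_image_with_cookie(session, form_url, image_url)
    # Hypothetical output path; the extension assumes the image is a PNG.
    with open("captcha.png", "wb") as f:
        f.write(data)
```

Calling `main()` performs the two real requests. Whether the form page alone is enough to obtain a valid cookie depends on the site, so the saved file is worth checking.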

2 Comments

Bruno Ribeiro Over a year ago

Hmm, I see, Restlet sets a Cookie. I will test this approach.

Bruno Ribeiro Over a year ago

That was it. I'm making another request, getting the Set-Cookie header, and now I can get the image. Thanks.
