I'm trying to download an image from a URL using requests. Using a browser or a REST client, such as the Restlet Chrome extension, I can retrieve the normal content (a JSON) and a binary image that I can save to disk.
With requests, I get almost the same response headers; only Content-Length has a different value (15 bytes instead of 35 kilobytes), and I can't find the binary image.
To simulate the request made by the browser, I configured the same request headers, like this:
    headers = {
        "Host": "cpom.prefeitura.sp.gov.br",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "DNT": "1",
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/65.0.3325.181 Safari/537.36",
    }
    r = requests.get(url, stream=True, headers=headers)
There are no redirects. I also debugged and inspected the contents of the requests.models.Response object, but without success.
What am I missing? I think it is a detail about the request, but I can't figure it out.
This is my test:
    import requests

    url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        print(r.raw.headers)
        # Saving as text, since the body is not the PNG image
        with open("/home/bruno/captcha/8762520.txt", "wb") as f:
            for chunk in r:
                f.write(chunk)
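To make the comparison between the two responses less error-prone, a small helper along these lines could be used to summarise what was actually received. This is only a sketch: `describe_body` and the sample values are hypothetical names and data made up for illustration, not the real server response. It reports the declared Content-Length, the real body size, and whether the body starts with the standard 8-byte PNG signature.

```python
# Standard 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_body(headers, body):
    """Summarise a response: declared vs. actual size, and whether it looks like a PNG.

    `headers` is any mapping with a .get() method (a plain dict or r.headers),
    `body` is the raw bytes of the response body.
    """
    declared = headers.get("Content-Length")
    return {
        "declared_length": int(declared) if declared is not None else None,
        "actual_length": len(body),
        "content_type": headers.get("Content-Type"),
        "is_png": body.startswith(PNG_SIGNATURE),
    }


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise the helper.
    sample_headers = {"Content-Type": "application/json", "Content-Length": "15"}
    sample_body = b'{"sample": 123}'
    print(describe_body(sample_headers, sample_body))
```

With requests it could be called as `describe_body(r.headers, r.content)`. Note that when the server compresses the body, Content-Length refers to the compressed size, while `r.content` is already decoded, so the two lengths may legitimately differ.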
This is the URL to download the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913
And this is the site with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral
A simple GET returns only a JSON response body, but if you inspect the response in a browser or REST client you'll see the binary response, which is the image (~36 KB in size).
EDIT: included images from the Restlet client

