Error while uploading picture with the requests library

Question

I'm trying to integrate the Yandex OCR translator tool into my code. Using Burp Suite, I found that the following request is the one used to send the image:

I'm trying to emulate this request with the following code:

import requests
from requests_toolbelt import MultipartEncoder
files={
    'file':("blob",open("image_path", 'rb'),"image/jpeg")
    }

#(<filename>, <file object>, <content type>, <per-part headers>)
burp0_url = "https://translate.yandex.net:443/ocr/v1.1/recognize?srv=tr-image&sid=9b58493f.5c781bd4.7215c0a0&lang=en%2Cru"


m = MultipartEncoder(files, boundary='-----------------------------7652580604126525371226493196')

burp0_headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0", "Accept": "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer": "https://translate.yandex.com/", "Content-Type": "multipart/form-data; boundary=-----------------------------7652580604126525371226493196", "Origin": "https://translate.yandex.com", "DNT": "1", "Connection": "close"}

print(requests.post(burp0_url, headers=burp0_headers, files=m.to_string()).text)

Sadly, it yields the following output:

{"error":"BadArgument","description":"Bad argument: file"}

Does anyone know how this could be solved?

Many thanks in advance!

I would not even reproduce the boundary for the multipart upload here. I certainly would not reproduce every single header either.
– Martijn Pieters, Commented Feb 28, 2019 at 20:40
@MartijnPieters Thank you very much for your quick reply. How would you reproduce it then?
– Nazim Kerimbekov, Commented Feb 28, 2019 at 20:44

Martijn Pieters · Accepted Answer · 2019-02-28 20:54:45Z

You are passing the MultipartEncoder.to_string() result to the files parameter. That asks requests to take the already-encoded multipart body and encode it again as a multipart component. That's one time too many.

You don't need to replicate every byte here. Just post the file, and perhaps set the user agent, referer, and origin:

import requests

# (<filename>, <file object>, <content type>)
files = {
    'file': ("blob", open("image_path", 'rb'), "image/jpeg")
}

url = "https://translate.yandex.net:443/ocr/v1.1/recognize?srv=tr-image&sid=9b58493f.5c781bd4.7215c0a0&lang=en%2Cru"
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0",
    "Referer": "https://translate.yandex.com/",
    "Origin": "https://translate.yandex.com",
}

# requests builds the multipart body and the Content-Type header (with boundary) itself
response = requests.post(url, headers=headers, files=files)
print(response.status_code)
print(response.json())

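The Connection header is best left to requests; it can control when a connection should be kept alive just fine. The Accept* headers are there to tell the server what your client can handle, and requests sets those automatically too. The same goes for Content-Type: when you pass files, requests generates the multipart boundary and sets the header for you.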

I get a 200 OK response with that code:

200
{'data': {'blocks': []}, 'status': 'success'}

However, if you don't set additional headers (remove the headers=headers argument), the request also works, so Yandex doesn't appear to be filtering for robots here.
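If you want to check which headers requests fills in on its own, you can prepare the request without sending it. The following is a minimal sketch, not part of the original answer: the build_upload_request helper, the placeholder URL, and the fake image bytes are all assumptions made for illustration, and nothing is sent over the network.

```python
import requests


def build_upload_request(image_bytes, url="https://example.invalid/upload"):
    """Prepare (but do not send) a multipart upload; helper name and URL are hypothetical."""
    # Same tuple layout as in the answer: (<filename>, <file data>, <content type>)
    files = {"file": ("blob", image_bytes, "image/jpeg")}
    session = requests.Session()
    request = requests.Request("POST", url, files=files)
    # prepare_request merges the session's default headers into the request
    return session.prepare_request(request)


# Placeholder bytes stand in for a real image file
prepared = build_upload_request(b"not-a-real-jpeg")

# Headers supplied by requests, without any headers= argument
for name in ("Content-Type", "Accept", "Accept-Encoding", "Connection", "User-Agent"):
    print(name, "=>", prepared.headers.get(name))
```

The printed Content-Type should start with multipart/form-data and carry a boundary that requests generated, which is why copying the boundary from Burp Suite is unnecessary.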
