With the following code:

import requests

payload = '''
 工作报告 
 总体情况:良好 
'''
r = requests.post("http://httpbin.org/post", data=payload)

What is the default encoding when the data passed to requests.post is a string? UTF-8 or unicode-escape?

If I want to specify an encoding, do I have to encode the string myself and pass a bytes object to the 'data' parameter?

  • Do you get an error message when you run it? I got an error message about encoding in Latin-1, but I don't have a problem when I encode it manually: payload = "text".encode('utf-8') Commented Apr 28, 2019 at 7:24
  • The docs say data should be "Dictionary, list of tuples, bytes, or file-like object". Strings aren't a documented option. Commented Apr 28, 2019 at 7:41
  • That said, the code seems to accept Unicode strings, and I haven't tracked down what kind of encoding gets applied or where the encoding happens if you pass in a Unicode string. It seems to be unencoded all the way down to where it gets passed to urllib3, and I'm not sure what urllib3 does with the data. Commented Apr 28, 2019 at 7:57
  • Possible duplicate of stackoverflow.com/questions/708915/… Commented Apr 28, 2019 at 8:41

3 Answers

As per the latest JSON spec (RFC 8259), JSON payloads exchanged with external services must be encoded as UTF-8. Here is a quick solution:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'))

requests uses httplib, which defaults to Latin-1 encoding. Byte arrays aren't automatically encoded, so it is always better to encode your text data yourself and pass a bytes object.

I'd also recommend setting the charset using the headers parameter:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'),
                  headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'})
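The encode-it-yourself step can be tried without any network access. The following is only a minimal sketch: encode_text_body is a hypothetical helper (not part of Requests), and text/plain is an assumed media type chosen because the payload in the question is plain text rather than form data.

```python
def encode_text_body(text, charset="utf-8"):
    """Hypothetical helper: encode text and build a matching Content-Type header."""
    body = text.encode(charset)
    # Assumption: the payload is plain text, so text/plain is used here.
    headers = {"Content-Type": f"text/plain; charset={charset}"}
    return body, headers


payload = "\n 工作报告 \n 总体情况:良好 \n"
body, headers = encode_text_body(payload)

# The bytes decode back to the original text when read as UTF-8.
assert body.decode("utf-8") == payload

# The same text cannot be represented in Latin-1 at all.
try:
    payload.encode("latin-1")
except UnicodeEncodeError:
    print("payload is not encodable as Latin-1")
```

The resulting pair can then be passed on as requests.post(url, data=body, headers=headers).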

1 Comment

"Byte arrays aren't automatically encoded" => well, that’s normal because byte arrays are the result of encoding a string; it doesn’t make sense and there’s no way to "encode" a byte array.

If you actually try your example you will find:

$ python
Python 3.7.2 (default, Jan 29 2019, 13:41:02) 
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> payload = '''
...  工作报告 
...  总体情况:良好 
... '''
>>> r = requests.post("http://127.0.0.1:8888/post", data=payload)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "/tmp/venv/lib/python3.7/http/client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 2-5: Body ('工作报告') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

As described in "Detecting the character encoding of an HTTP POST request", the default encoding for HTTP POST is ISO-8859-1, aka Latin-1. And as the error message right at the end of the traceback tells you, you can force UTF-8 by encoding the string to a UTF-8 byte string yourself; but then of course your server needs to be expecting UTF-8, too, or it will read the bytes as Latin-1 and end up with useless mojibake.
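To see what that mojibake looks like, here is a small sketch using only built-in codecs. misread_as_latin1 is a hypothetical name; it simulates a receiver that assumes Latin-1 for bytes that were actually encoded as UTF-8.

```python
def misread_as_latin1(text):
    """Hypothetical helper: encode as UTF-8, then decode the bytes as Latin-1."""
    utf8_bytes = text.encode("utf-8")
    # Latin-1 maps every byte value to a character, so this never raises;
    # it silently produces the wrong text instead.
    return utf8_bytes.decode("latin-1")


original = "工作报告"
garbled = misread_as_latin1(original)

# Each of the four characters takes three bytes in UTF-8,
# so the misread text is twelve characters long.
assert garbled != original
assert len(garbled) == 12

# Nothing is lost on the wire: reversing the wrong decode recovers the text.
assert garbled.encode("latin-1").decode("utf-8") == original
```

The bytes themselves are intact; the damage comes entirely from the two sides disagreeing about the encoding.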

There is no way in the POST interface itself to enforce this, but your server could in fact require clients to explicitly specify their content encoding by using the charset parameter of the Content-Type header; it could return a specific 4xx error code (for example 400 or 415, since this is a client-side problem) with an explicit error message if it's missing.

A somewhat less strict alternative is to have your server attempt to decode incoming POST requests as UTF-8, and reject the POST if that fails.
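Both server-side policies could look roughly like the sketch below. It is framework-agnostic and uses only the standard library; declared_charset and decode_post_body are hypothetical names, and how a ValueError gets turned into an HTTP error response is left to whatever server framework is in use.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_post_body(content_type, raw, require_charset=True):
    """Hypothetical helper: decode a POST body or raise ValueError to reject it."""
    charset = declared_charset(content_type)
    if charset is None:
        if require_charset:
            # Strict policy: the client must state its encoding explicitly.
            raise ValueError("missing charset parameter in Content-Type")
        # Lenient policy: assume UTF-8 and let the decode step validate it.
        charset = "utf-8"
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError) as exc:
        raise ValueError(f"body is not valid {charset}") from exc


body = "工作报告".encode("utf-8")
print(decode_post_body("text/plain; charset=utf-8", body))
```

With require_charset=False, a Latin-1 body such as b"caf\xe9" is rejected because it is not valid UTF-8, which is the "try UTF-8 and reject on failure" behaviour.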

Requests (through urllib3) uses* the standard library's http.client.HTTPConnection.request to send requests. This method will encode str data as Latin-1 but will not encode bytes.

If you provide encoded input you should add a content-type header specifying the encoding used; conversely, if you provide a content-type header you should ensure that the encoding of the body matches that specified.

From the docs for HTTPConnection.request:

If body is specified, the specified data is sent after the headers are finished. It may be a str, a bytes-like object, an open file object, or an iterable of bytes. If body is a string, it is encoded as ISO-8859-1, the default for HTTP. If it is a bytes-like object, the bytes are sent as is. If it is a file object, the contents of the file is sent; this file object should support at least the read() method. If the file object is an instance of io.TextIOBase, the data returned by the read() method will be encoded as ISO-8859-1, otherwise the data returned by read() is sent as is. If body is an iterable, the elements of the iterable are sent as is until the iterable is exhausted.

* httplib was renamed to http.client in Python 3
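The behaviour in the quoted documentation can be observed without a server. This is only a sketch: CapturingConnection and wire_body are hypothetical names, and the subclass overrides the public send() method so that the bytes http.client would have written to the socket are recorded instead. The host name example.invalid is a placeholder that is never contacted.

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical helper: records outgoing bytes instead of opening a socket."""

    def __init__(self):
        super().__init__("example.invalid")
        self.sent = []

    def send(self, data):
        # Called by http.client for the header block and then for the body.
        self.sent.append(bytes(data))


def wire_body(body):
    """Return the request body exactly as http.client would put it on the wire."""
    conn = CapturingConnection()
    conn.request("POST", "/post", body=body)
    raw = b"".join(conn.sent)
    # Headers and body are separated by a blank line.
    return raw.split(b"\r\n\r\n", 1)[1]


print(wire_body("café"))                  # str body: encoded as ISO-8859-1
print(wire_body("café".encode("utf-8")))  # bytes body: sent as is

try:
    wire_body("工作报告")
except UnicodeEncodeError:
    print("str body outside Latin-1 is rejected before anything is sent")
```

The first call yields the single byte 0xE9 for "é", the second yields the two-byte UTF-8 sequence 0xC3 0xA9, and the third reproduces the error from the question.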
