BLUF: Why is the decode() method on a bytes object failing to decode ç?

I am receiving a UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position..... After tracking down the offending character, I found it is ç. The error occurs when I read the response from the server:

import http.client

conn = http.client.HTTPConnection(host='something.com')
conn.request('GET', url='/some/json')
resp = conn.getresponse()
content = resp.read().decode()  # decode() defaults to UTF-8; throws error

I am unable to get the content. If I just do content = resp.read(), it succeeds, and I can write the bytes to a file opened in wb mode, but wherever the ç should be, the file contains 0xE7 instead. Even if I open the file in Notepad++ and set the encoding to UTF-8, the character only shows as the hex value.

Why am I not able to decode this UTF-8 character from an HTTPResponse? Am I not correctly writing it to file either?

  • Have you considered using requests? Commented Nov 6, 2017 at 18:08
  • @kichik No need. requests is just a high level API for making the same type of requests. It relies on http.client to make the socket connections anyhow. The example I have shown is somewhat false, as I am really making HTTPS connections and requests does not support SSL. Commented Nov 6, 2017 at 18:11
  • @kichik Further, the real question is why does decode() not work on a valid UTF-8 character? Commented Nov 6, 2017 at 18:12
  • The server doesn't seem to send you actual UTF-8. I was hoping requests would do better at detecting that. The actual UTF-8 representation for ç is b'\xc3\xa7'. The server is sending you CP1252. Commented Nov 6, 2017 at 18:18
  • What does resp.getheaders() return? Commented Nov 6, 2017 at 18:26

1 Answer

When you have issues with encoding/decoding, you should take a look at the UTF-8 Encoding Debugging Chart.
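What such a chart shows for ç can be reproduced in the interpreter. This is a minimal sketch using only built-in codecs; the variable names are illustrative and not from the question.

```python
# Compare how the same character is represented in the two encodings.
char = "ç"
utf8_bytes = char.encode("utf-8")      # two bytes: 0xC3 0xA7
cp1252_bytes = char.encode("cp1252")   # one byte: 0xE7

# A lone 0xE7 is not valid UTF-8, so decoding it as UTF-8 fails.
try:
    cp1252_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = cp1252_bytes.decode("cp1252")

print(utf8_bytes, cp1252_bytes, decoded)
print(error_message)
```

The same byte 0xE7 also maps to ç in ISO-8859-1 (Latin-1), so the byte alone does not prove which of the two single-byte encodings the server used.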

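If you look up the Windows-1252 code point 0xE7 in the chart, you find the expected character is ç, which shows that the response is encoded as CP1252 rather than UTF-8. In UTF-8, ç is the two-byte sequence 0xC3 0xA7, so a lone 0xE7 byte cannot be decoded with the default codec. Decoding with resp.read().decode('cp1252') should therefore return the expected text.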
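One way to apply this is to stop relying on the UTF-8 default. The sketch below is a hypothetical helper (decode_body is not part of http.client), and the CP1252 fallback is an assumption based on the byte seen in the question. It tries the charset the server declares, then UTF-8, then the fallback.

```python
from typing import Optional


def decode_body(raw: bytes,
                declared_charset: Optional[str] = None,
                fallback: str = "cp1252") -> str:
    """Decode raw bytes with the declared charset, then UTF-8, then a fallback.

    `fallback` is an assumption for this example; pick whatever the server
    really uses.
    """
    for codec in (declared_charset, "utf-8"):
        if not codec:
            continue
        try:
            return raw.decode(codec)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue
    # CP1252 leaves a few byte values undefined, so replace rather than raise.
    return raw.decode(fallback, errors="replace")


# Sample bytes standing in for resp.read(); 0xE7 is ç in CP1252.
sample = b'{"name": "Fran\xe7ois"}'
text = decode_body(sample)
print(text)
```

With a real response, the declared charset can be read from the Content-Type header via resp.headers.get_content_charset(), which returns None when the header names no charset. To store the result as real UTF-8, open the output file in text mode with encoding='utf-8' and write the decoded string instead of writing the raw bytes in wb mode.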
