How to convert byte string to character with correct escaping?

Question

I can't figure out why decoding fails when the byte string starts with a hex digit a, b, c, d, e, or f instead of a number. In that case the result always has two backslashes instead of one.

>>> bstr = b'\xfb'
>>> bstr.decode('utf8', 'backslashreplace')
'\\xfb'

What I want is '\xfb' instead.

but,

>>> bstr = b'\x1f'
>>> bstr.decode('utf8', 'backslashreplace')
'\x1f'

works as expected. Do you know what is wrong?

b'\xfb' is not the UTF-8 encoding of '\xfb'. Decoding that bytestring in UTF-8 should not result in '\xfb'. You're handling encoding fundamentally wrong.
– user2357112, Commented Feb 19, 2019 at 22:01
May I ask how to decode the bytestring to get the expected result?
– john s., Commented Feb 19, 2019 at 22:04
Your expectations are wrong. You can do a thing that will get the result you were expecting, but doing that thing is most likely wrong. You need to figure out what you should be doing, even if it turns out that the thing you should be doing doesn't produce the results you currently expect.
– user2357112, Commented Feb 19, 2019 at 22:06

user2357112 · Accepted Answer · 2019-02-19 22:11:06Z

b'\xfb' is a bytestring containing a single byte. That byte has hex value FB, or 251 in decimal.

'\xfb' is a string containing a single Unicode code point. That code point is U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX, or û.
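
As a small illustration of the difference between the two literals (a sketch only; the variable names `raw` and `text` are made up for this example), the same escape sequence yields a byte in one case and a code point in the other:

```python
import unicodedata

# The same escape sequence means different things in bytes and str literals.
raw = b'\xfb'    # bytes object: one byte with value 0xFB
text = '\xfb'    # str object: one code point, U+00FB

print(len(raw), raw[0])         # 1 251  (indexing bytes gives an int)
print(len(text), ord(text))     # 1 251  (ord gives the code point number)
print(unicodedata.name(text))   # LATIN SMALL LETTER U WITH CIRCUMFLEX
```

The numbers happen to match, but one is a byte value and the other is a code point; which bytes represent that code point depends on the encoding.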

b'\xfb' is not the UTF-8 encoding of '\xfb'. The UTF-8 encoding of '\xfb' is b'\xc3\xbb':

>>> '\xfb'.encode('utf-8')
b'\xc3\xbb'

In fact, b'\xfb' is not the UTF-8 encoding of anything at all, and trying to decode it as UTF-8 is an error. 'backslashreplace' specifies a way of handling that error, where the FB byte is replaced with the character sequence backslash-x-f-b.
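
To make the error handling concrete, here is a short sketch (variable names are illustrative) showing that the decoded result really is four characters with a single backslash, and that the doubled backslash only appears in the repr that the interactive prompt prints. It also shows why b'\x1f' behaved differently in the question: 0x1F is in the ASCII range, so it is valid UTF-8 on its own.

```python
raw = b'\xfb'

# Strict decoding (the default) raises, because a lone 0xFB byte is not valid UTF-8.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'backslashreplace' substitutes the four characters \, x, f, b for the bad byte.
replaced = raw.decode('utf-8', 'backslashreplace')

print(strict_failed)     # True
print(len(replaced))     # 4
print(replaced)          # \xfb   (one backslash in the actual string)
print(repr(replaced))    # '\\xfb' (repr escapes the backslash, as the REPL does)

# 0x1F is an ASCII control character, so it decodes without any error handling.
control = b'\x1f'.decode('utf-8')
print(len(control), ord(control))   # 1 31
```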

While it is possible to do a thing that will convert b'\xfb' to '\xfb', that conversion has nothing to do with UTF-8, and applying that conversion without getting your requirements straight will only cause more problems. You need to figure out what your program actually needs to be doing. Most likely, the right path forward doesn't involve any b'\xfb' to '\xfb' conversion. We can't tell what you need to do, since we're missing so much context.
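
For reference, one conversion with that property is decoding as Latin-1 (ISO-8859-1), which maps every byte value to the code point with the same number. This is only a sketch, and it rests on a loud assumption: that the bytes really are Latin-1 text. If they are something else (binary data, another encoding, truncated UTF-8), this produces the output the question asked for while silently giving wrong characters.

```python
raw = b'\xfb'

# ASSUMPTION: the data is Latin-1 text. Latin-1 maps byte N to code point U+00NN,
# so every possible byte decodes without error, whether or not the result is meaningful.
text = raw.decode('latin-1')

print(text == '\xfb')         # True
print(text.encode('utf-8'))   # b'\xc3\xbb' (the UTF-8 bytes differ from the input)
```

Because Latin-1 decoding never fails, it cannot signal that the input was in a different encoding, which is why the actual source of the bytes matters before choosing it.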
