
I get a string from a function that is represented like u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0', but to process it I need it to be a byte string (like '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0').

How do I convert it without changing the underlying values?

My best guess so far is to take s.encode('unicode_escape'), which returns '\\xd0\\xbc\\xd0\\xb0\\xd1\\x80\\xd0\\xba\\xd0\\xb0', and then convert each four-character escape sequence (a backslash, x, and two hex digits) back into the single byte it names, such as '\xd0'.

1 Answer


ISO 8859-1 (aka Latin-1) maps the first 256 Unicode code points (U+0000 to U+00FF) to the bytes with the same values, so encoding with it leaves every value unchanged.

>>> u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'.encode('latin-1')
'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
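For readers on Python 3 (an assumption; the session above looks like Python 2), here is a minimal sketch of the same idea. The helper name `to_raw_bytes` is hypothetical, and the final UTF-8 decode assumes the bytes were originally UTF-8 text, which the question does not state.

```python
def to_raw_bytes(text):
    """Map each code point U+0000..U+00FF to the byte with the same value."""
    # Hypothetical helper name; str.encode('latin-1') does the actual work.
    return text.encode('latin-1')


# In Python 3 this literal has the same value as the u'...' literal in the question.
s = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

raw = to_raw_bytes(s)          # ten bytes, one per original code point
decoded = raw.decode('utf-8')  # assumption: the bytes are UTF-8 encoded text
```

The encode step is one-to-one, so `len(raw) == len(s)`. If the UTF-8 assumption holds, `decoded` is the five-letter Cyrillic word those bytes spell.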

5 Comments

That's smart. My first option would be bytes(map(ord, x)) but it may be much slower...
confirmed this produces the desired result
@JBernardo That only works in Python 3; it's not clear from the text, but the odds are the OP is on Python 2.
@Zack maybe you could use str(bytearray(...)) instead of bytes. Or even worse: ''.join(map(chr, ...))
throws 'ordinal not in range'
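The alternatives raised in the comments can be compared in a short Python 3 sketch. It also shows one likely cause of the 'ordinal not in range' error, namely a code point above U+00FF, though the comment does not say which input triggered it. The variable names are illustrative only.

```python
s = '\xd0\xbc\xd0\xb0'

# Python 3 only: build bytes from the code point values directly.
via_ord = bytes(map(ord, s))

# The codec-based approach from the answer gives the same bytes.
via_codec = s.encode('latin-1')

# Latin-1 covers only U+0000..U+00FF, so a real Cyrillic letter cannot be encoded.
try:
    '\u043c'.encode('latin-1')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc.reason  # short description of why encoding failed
```

Both approaches agree on input limited to U+0000..U+00FF. On anything above that range the codec raises `UnicodeEncodeError`, while `bytes(map(ord, ...))` raises `ValueError`.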
