Convert unicode string to byte string

Question

I get a string from a function that is represented like u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0', but to process it I need it to be a byte string (like '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0').

How do I convert it without changes?

My best guess so far is to take s.encode('unicode_escape'), which returns '\\xd0\\xbc\\xd0\\xb0\\xd1\\x80\\xd0\\xba\\xd0\\xb0', and then process it in groups of 4 characters so that each escaped sequence '\\xd0' becomes the single character '\xd0'.

Ignacio Vazquez-Abrams · Accepted Answer · 2012-06-24 03:46:26Z

23

ISO 8859-1 (aka Latin-1) maps the first 256 Unicode codepoints to their byte values.

>>> u'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'.encode('latin-1')
'\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'
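
The session above is Python 2. As a rough sketch of the same idea in Python 3, where str takes the place of unicode and bytes the place of the Python 2 str, something like the following should behave the same way. The variable names are illustrative, and the final decode step assumes the underlying bytes were UTF-8, which the question does not state.

```python
# Python 3 sketch; `mangled`, `raw` and `text` are illustrative names.
mangled = '\xd0\xbc\xd0\xb0\xd1\x80\xd0\xba\xd0\xb0'

# Latin-1 maps code points U+0000..U+00FF onto the single bytes 0x00..0xFF,
# so every character becomes the byte with the same numeric value.
raw = mangled.encode('latin-1')

# Assumption: the original data was UTF-8. If so, decoding recovers the text.
text = raw.decode('utf-8')

print(raw)
print(text)
```

Under that UTF-8 assumption, the ten bytes decode to a five-letter Cyrillic word.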

5 Comments

JBernardo Over a year ago

That's smart. My first option would be bytes(map(ord, x)) but it may be much slower...

bryce Over a year ago

confirmed this produces the desired result

zwol Over a year ago

@JBernardo That only works in Python 3; it's not clear from the text, but the odds are the OP is on Python 2.

JBernardo Over a year ago

@Zack maybe you could use str(bytearray(...)) instead of bytes. Or even worse: ''.join(map(chr, ...))

Nathan G Over a year ago

throws 'ordinal not in range'
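
The 'ordinal not in range' error in the last comment is presumably what happens when the string contains a character above U+00FF, which Latin-1 cannot represent. A minimal Python 3 sketch of that failure mode follows; to_bytes_latin1 is a hypothetical helper name, not part of any library.

```python
def to_bytes_latin1(s):
    """Hypothetical helper: turn a str whose characters are all
    below U+0100 into the bytes with the same numeric values."""
    try:
        return s.encode('latin-1')
    except UnicodeEncodeError as exc:
        # exc.start and exc.end delimit the slice that could not be encoded.
        bad = s[exc.start:exc.end]
        raise ValueError(
            'character(s) %r are outside the Latin-1 range' % bad
        ) from exc


ok = to_bytes_latin1('\xd0\xbc')  # both code points are below 256

# In Python 3, the bytes(map(ord, x)) variant from the comments gives the
# same result for in-range input.
same = bytes(map(ord, '\xd0\xbc'))

try:
    to_bytes_latin1('\u043c')  # an actual Cyrillic letter, code point 0x43C
    failed = False
except ValueError:
    failed = True
```

If this error shows up, the string most likely already holds properly decoded text, in which case encoding it with the intended target encoding (for example s.encode('utf-8')) may be what is needed rather than Latin-1.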