Unicode string in Python

Question

I have the following value:

(Pdb) email
'\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
(Pdb) print email
test@gmail.com

I need to validate whether this value is in a valid email format. How can I convert this string to an actual ASCII string?

falsetru · Accepted Answer · 2013-11-09 10:00:35Z

2

The string appears to be UTF-16 encoded, but it is 29 bytes long, and UTF-16 needs an even number of bytes, so decoding it as-is fails:

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 28: truncated data

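Dropping the first byte leaves valid little-endian UTF-16. Alternatively, decode the value as big-endian and ignore the dangling last byte: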

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16')
u'test@gmail.com'

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16-le')
u'test@gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16-be', 'ignore')
u'test@gmail.com'
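The session above is Python 2. As a rough Python 3 sketch of the same idea, the helper below tries each plausible alignment and byte order and keeps the first result that is pure ASCII. The name `decode_misaligned_utf16` and the "result must be ASCII" heuristic are assumptions made for illustration; they fit an email address but not arbitrary text.

```python
def decode_misaligned_utf16(raw: bytes) -> str:
    """Decode BOM-less UTF-16 that may carry one stray byte at either end.

    Hypothetical helper: it assumes the real payload is plain ASCII text,
    which is how it tells a correct alignment from a wrong one.
    """
    if len(raw) % 2 == 0:
        # Even length: only the byte order is unknown.
        slices = [raw]
    else:
        # Odd length: the stray byte is either the first or the last one.
        slices = [raw[1:], raw[:-1]]

    for data in slices:
        for codec in ("utf-16-le", "utf-16-be"):
            try:
                text = data.decode(codec)
            except UnicodeDecodeError:
                continue
            # A wrong byte order turns ASCII letters into non-ASCII code points.
            if text.isascii():
                return text
    raise ValueError("no alignment produced ASCII text")


raw = b'\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
print(decode_misaligned_utf16(raw))  # test@gmail.com
```

Guessing the alignment like this only papers over the problem. If you control the code that produces the bytes, fixing whatever truncates them is the more reliable route.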

edited Nov 9, 2013 at 10:00

answered Nov 9, 2013 at 9:24

falsetru


2 Comments

Martijn Pieters Over a year ago

Truncated in one or the other direction; little endian or big endian.

Peter DeGlopper Over a year ago

If I had to guess about the truncation, I'd blame something splitting on whitespace. The UTF-16 encoding of the space character is '\x00 ' or ' \x00' depending on byte order, which a plain split will mangle. The same goes for any other ASCII whitespace, of course. You can't safely split an encoded bytestring, especially not UTF-16, and strip has similar potential to mangle things.
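To make that guess concrete, here is a small Python 3 sketch (the sample value `"alice bob"` is made up for illustration). Splitting the encoded bytes leaves half of the space character attached to the second piece, which produces the same shape as the value in the question: a leading `\x00` and an odd length.

```python
text = "alice bob"  # hypothetical sample value
encoded = text.encode("utf-16-le")

# bytes.split() cuts at the 0x20 byte and leaves the space's 0x00 half behind.
pieces = encoded.split()
print(pieces[1])  # b'\x00b\x00o\x00b\x00'

try:
    pieces[1].decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("mangled:", exc.reason)

# Decoding first and splitting the resulting text keeps every character intact.
words = encoded.decode("utf-16-le").split()
print(words)  # ['alice', 'bob']
```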

Asya Olshansky · Answer · 2019-02-27 13:50:07Z

0

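Once the bytes are valid little-endian UTF-16 (for the value in the question, that means dropping the stray leading byte first), you can convert them to an ASCII string like this:

str(email[1:].decode('utf-16le'))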
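In Python 3 there is no implicit `str()` conversion to ASCII, so the equivalent is an explicit decode followed by an ASCII check. The sketch below also covers the validation step the question asks about. `looks_like_email` and `EMAIL_RE` are hypothetical names, and the pattern is deliberately loose (something, `@`, something, a dot, something), not a full RFC 5322 check.

```python
import re

# Loose shape check only: local part, "@", domain containing a dot.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(raw: bytes, codec: str = "utf-16-le") -> bool:
    """Return True if raw decodes cleanly, is pure ASCII, and is email-shaped."""
    try:
        text = raw.decode(codec)
        text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
    return EMAIL_RE.fullmatch(text) is not None


email = b'\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
print(looks_like_email(email))      # False: odd length, cannot be decoded as-is
print(looks_like_email(email[1:]))  # True: stray leading byte removed
```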

edited Feb 27, 2019 at 13:50

Sssssuppp


answered Feb 27, 2019 at 13:13

Asya Olshansky

