1

I have this value in pdb:

(Pdb) email
'\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'
(Pdb) print email
test@gmail.com

I need to validate whether this value is in email format. How can I convert this string to an actual ASCII string?

2 Answers

2

It looks like the value is UTF-16 encoded, but it has an odd number of bytes (29), so decoding it directly fails:

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x00 in position 28: truncated data

Decoding works once the stray byte at either end is dropped or ignored:

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16')
u'test@gmail.com'

>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'[1:].decode('utf-16-le')
u'test@gmail.com'
>>> '\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00'.decode('utf-16-be', 'ignore')
u'test@gmail.com'
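
The same idea can be wrapped in a small function. This is only a sketch in Python 3 syntax (the transcripts above are Python 2), and `decode_misaligned_utf16` is a hypothetical name, not a library function. It assumes the text is pure ASCII, so a wrong byte order or wrong alignment shows up as non-ASCII characters and can be rejected.

```python
def decode_misaligned_utf16(raw):
    """Hypothetical helper: decode ASCII-only UTF-16 bytes that may
    carry one stray byte at the start or the end."""
    # With an odd byte count, one byte at either end does not belong.
    chunks = [raw] if len(raw) % 2 == 0 else [raw[1:], raw[:-1]]
    for chunk in chunks:
        for codec in ("utf-16-le", "utf-16-be"):
            try:
                text = chunk.decode(codec)
                # A wrong byte order turns ASCII text into non-ASCII
                # characters, so this encode fails and we try the next one.
                text.encode("ascii")
            except UnicodeError:
                continue
            return text
    raise ValueError("no ASCII-only UTF-16 reading found")


raw = b"\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00"
email = decode_misaligned_utf16(raw)
print(email)  # test@gmail.com
```

The ASCII check is what picks the byte order here. For text that may legitimately contain non-ASCII characters this heuristic would not be reliable.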

2 Comments

The data is truncated at one end or the other, depending on whether it is little-endian or big-endian.
If I had to guess about the truncation, I'd blame something splitting on whitespace. The UTF-16 encoding of the space character is '\x00 ' or ' \x00' depending on byte order, and a plain split will mangle it. The same goes for any other ASCII whitespace. You can't safely split an encoded bytestring, especially not UTF-16. strip can mangle things in the same way.
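
A minimal sketch of that guess, in Python 3 syntax. The input line `"name test@gmail.com"` is made up for illustration. Splitting the encoded bytes leaves the NUL half of the space attached to the next field, which reproduces the 29-byte value from the question. Decoding first and splitting afterwards avoids the problem.

```python
# Hypothetical input: two fields separated by a space, encoded as UTF-16-LE.
line = "name test@gmail.com".encode("utf-16-le")

# Splitting the raw bytes removes b" " but leaves its b"\x00" partner behind,
# so the second field starts with a stray NUL and has an odd length.
mangled = line.split()
print(mangled[1])

# Decoding first and then splitting keeps every character intact.
safe = line.decode("utf-16-le").split()
print(safe)  # ['name', 'test@gmail.com']
```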
0

You can convert your email to an ASCII string like this. The stray leading byte has to be dropped first, otherwise the decode fails with "truncated data":

str(email[1:].decode('utf-16-le'))
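
For the validation the question asks about, a rough sketch in Python 3 syntax could look like the following. `looks_like_email` and `EMAIL_RE` are hypothetical names, the pattern is deliberately loose rather than a complete address grammar, and the code assumes the value is UTF-16-LE with one stray leading byte, as in the question.

```python
import re

# Deliberately loose pattern (an assumption, not a full address grammar):
# something, "@", something, ".", something, with no whitespace or extra "@".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(raw):
    """Hypothetical helper: raw is UTF-16-LE with one stray leading byte."""
    try:
        text = raw[1:].decode("utf-16-le")
        text.encode("ascii")  # reject anything that is not plain ASCII
    except UnicodeError:
        return False
    return EMAIL_RE.fullmatch(text) is not None


raw = b"\x00t\x00e\x00s\x00t\x00@\x00g\x00m\x00a\x00i\x00l\x00.\x00c\x00o\x00m\x00"
print(looks_like_email(raw))  # True
```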
