I have the following string: u'\xe4\xe7\xec\xf7 \xe4\xf9\xec\xe9\xf9\xe9' encoded in windows-1255 and I want to decode it into Unicode code points (u'\u05d4\u05d7\u05dc\u05e7 \u05d4\u05e9\u05dc\u05d9\u05e9\u05d9').
>>> u'\xe4\xe7\xec\xf7 \xe4\xf9\xec\xe9\xf9\xe9'.decode('windows-1255')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\cp1255.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
However, if I try to decode the string: '\xe4\xe7\xec\xf7 \xe4\xf9\xec\xe9\xf9\xe9' I don't get the exception:
>>> '\xe4\xe7\xec\xf7 \xe4\xf9\xec\xe9\xf9\xe9'.decode('windows-1255')
u'\u05d4\u05d7\u05dc\u05e7 \u05d4\u05e9\u05dc\u05d9\u05e9\u05d9'
How do I decode the Unicode hex string (the one that gets the exception) or convert it to a regular string that can be decoded?
Thanks for the help.