0

I have a very simple environment:

In [64]: aa = '\xe1'

In [65]: aa
Out[65]: '\xe1'

In [66]: type(aa)
Out[66]: str

In [67]: u'\xe1'
Out[67]: u'\xe1'

In [68]: u'%s' % aa
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/django/core/management/commands/shell.pyc
in <module>()
----> 1 u'%s' % aa

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0:
ordinal not in range(128)

All I'd like to do is convert this aa string to unicode. How can I do this?

In the database I have unicode strings, and with a str I can't build a Django query if it contains special characters. Using .encode('utf-8') or unicode(aa), I get the same UnicodeDecodeError.

I also tried playing with sys.setdefaultencoding, which might work, but it breaks everything else.

Python version: 2.7.3

1
  • Whenever I am messing with unicode in Python, I tend to refer back to farmdev.com/talks/unicode. It is the best explanation of how to deal with unicode in Python 2.7 that I have seen to date. Good luck. Commented Apr 7, 2014 at 9:39

2 Answers

2

When you combine a byte string (str) with a unicode string, Python 2 implicitly decodes the byte string using the default encoding, which is ASCII. That is the wrong encoding in your case, because the byte 0xe1 is outside the ASCII range.
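A minimal sketch of roughly what that implicit step amounts to, written with a b'' literal so it behaves the same on Python 2.7 and Python 3. The variable names are illustrative only.

```python
# The byte string from the question; on Python 2, b'\xe1' is the same as '\xe1'.
aa = b'\xe1'

try:
    # Roughly the conversion Python 2 attempts for u'%s' % aa.
    aa.decode('ascii')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    reason = exc.reason  # the message fragment seen in the traceback
    print(exc)
```

The explicit decode fails for the same reason as the formatting expression: 0xe1 is not in the ASCII range.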

As a unicode literal, the value in your aa is á:

>>> print u'\xe1'
á

If you define aa as unicode, it will work:

>>> aa = u'\xe1'
>>> u'%s' % aa
u'\xe1'

>>> print u'%s' % aa
á

You cannot treat '\xe1' as UTF-8, because the single byte 0xe1 is not a valid UTF-8 sequence, so decoding it fails:

>>> '\xe1'.decode('utf-8')
...

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data

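The important part of the traceback is "unexpected end of data": in UTF-8, 0xe1 is the lead byte of a three-byte sequence, so the decoder expects two more bytes that never arrive.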
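To illustrate with the character á, the sketch below compares its UTF-8 and latin-1 encodings. It assumes á is the intended character and uses only built-in codecs, so it should behave the same on Python 2.7 and Python 3.

```python
a_acute = u'\xe1'                          # U+00E1, the character á

utf8_bytes = a_acute.encode('utf-8')       # two bytes: 0xc3 0xa1
latin1_bytes = a_acute.encode('latin-1')   # one byte: 0xe1

# Round trips succeed when the codec matches the bytes.
assert utf8_bytes.decode('utf-8') == a_acute
assert latin1_bytes.decode('latin-1') == a_acute

try:
    latin1_bytes.decode('utf-8')           # a lone 0xe1 is incomplete in UTF-8
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

So the byte in the question matches the latin-1 encoding of á, not the UTF-8 one.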

1

Try this:

aa.decode('latin-1')

or

unicode('\xe1', 'latin-1')

'\xe1' (225 as an integer) is not part of ASCII, so to turn your string into a unicode instance you have to specify which encoding the original string uses.

My examples assume your original string is in latin-1. If you're using another encoding, you'll have to find out what it is.
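If the encoding is uncertain, one possible approach is to try a short list of candidates in order. The helper below is a hypothetical sketch, not a Django or standard-library function; the name to_text and the candidate list are assumptions. Note that latin-1 accepts every byte value, so it has to come last and can silently produce the wrong characters if the data is really in some other single-byte encoding.

```python
def to_text(raw, encodings=('utf-8', 'latin-1')):
    """Decode a byte string by trying each candidate encoding in turn.

    Hypothetical helper: the candidate list is an assumption and should
    be replaced by whatever encoding the data source actually uses.
    """
    if not isinstance(raw, bytes):   # already unicode (bytes is str on Python 2)
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue                 # try the next candidate
    raise ValueError('none of %r could decode the input' % (encodings,))
```

With these defaults, to_text(b'\xe1') falls back to latin-1, while to_text(b'\xc3\xa1') succeeds as UTF-8; both yield u'\xe1'.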
