
I have this list:

l = [u'\xf9', u'!']

And I want to convert it into this list:

l2 = ['ù','!']

How can I do it? And why does l.encode() not work?

  • Er, because encode is a method on strings, not on lists. Commented Apr 13, 2015 at 20:39
  • What do you mean by convert? 'ù' is just one representation of your character. Do you mean that you want to print it like that? Commented Apr 13, 2015 at 20:39
  • Sorry, I wanted to say l[0].encode() Commented Apr 13, 2015 at 20:40
  • [u.encode('u8') for u in l]. l[0].encode() won't work because the character is outside the ASCII range (128). Commented Apr 13, 2015 at 20:42
  • That's what I did, Shashank, but why is 'ù' converted to '\xc3\xb9'? This should have been my question... sorry. Commented Apr 13, 2015 at 20:44

1 Answer


Are you using Python 2? If so, you might be fooled by the way Python displays strings.

As you noticed, '\xc3\xb9' is the UTF-8 encoded representation of code point U+00F9 ('ù'). So:

# code point
>>> u'ù'
u'\xf9'

# seems wrong ?
>>> u'ù'.encode('utf-8')
'\xc3\xb9'

# No, not at all (at least on my UTF-8 terminal)
>>> print(u'ù'.encode('utf-8'))
ù
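
To make the relationship between the code point and its UTF-8 bytes concrete, here is a minimal sketch. It assumes Python 3 (the session above is Python 2), where text and bytes are separate types, and the variable names are only illustrative:

```python
# Python 3 sketch: one code point, two UTF-8 bytes
ch = '\xf9'                       # the character 'ù' (U+00F9)
encoded = ch.encode('utf-8')      # a bytes object

print(hex(ord(ch)))               # 0xf9
print(encoded)                    # b'\xc3\xb9'
print(len(ch), len(encoded))      # 1 2

# Decoding the bytes gives back the original character.
print(encoded.decode('utf-8') == ch)   # True
```

So '\xc3\xb9' is not a corrupted 'ù'; it is the same character written as the two bytes that UTF-8 uses for it.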

Given your example:

>>> l = [u'\xf9', u'!']
>>> print(l)
[u'\xf9', u'!']
>>> l[0]
u'\xf9'
>>> print(l[0])
ù

>>> l2 = [u.encode('utf-8') for u in l]
>>> l2
['\xc3\xb9', '!']
>>> print(l2)
['\xc3\xb9', '!']
>>> print(l2[0])
ù
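
The two displays differ because printing a list shows the repr() of each element, while printing a single string writes its characters. The sketch below illustrates this under the assumption of Python 3, with bytes objects standing in for Python 2's byte strings:

```python
# Python 3 sketch; bytes stand in for Python 2 byte strings (assumption)
items = ['\xf9'.encode('utf-8'), b'!']

# A list's str() is built from the repr() of each element,
# so non-ASCII bytes show up as \x escapes.
as_list = str(items)
print(as_list)                      # [b'\xc3\xb9', b'!']

# Decoding each element turns the bytes back into text.
decoded = [b.decode('utf-8') for b in items]
print(decoded == ['\xf9', '!'])     # True
```

In other words, the escapes are only a display form chosen by the list; the data inside is unchanged.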

I agree that all of this is rather inconsistent and a source of frustration. That's why string/Unicode support received a major rewrite in Python 3. There:

# Python 3
>>> l = [u'\xf9', u'!']
>>> l
['ù', '!']
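
As for why l[0].encode() fails: in Python 2, calling encode() with no argument uses the default encoding, which is ASCII unless it has been reconfigured, and ASCII cannot represent U+00F9. The following sketch reproduces the failure in Python 3 by naming the ASCII codec explicitly, since Python 3's own default for encode() is UTF-8:

```python
# Python 3 sketch: ASCII cannot encode 'ù', UTF-8 can
l = ['\xf9', '!']

try:
    l[0].encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print(exc.reason)               # ordinal not in range(128)

# Naming a codec that covers the character works for every element.
l2 = [s.encode('utf-8') for s in l]
print(l2)                           # [b'\xc3\xb9', b'!']
```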