Python unicode error. UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a'

Question

So, I have this code to fetch a JSON string from a URL:

import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()
data = json.loads(string)

for x in data: 
    print x['foo']

The problem is x['foo']: if I try to print it as shown above, I get this error.

Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8") I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try encode('ascii', 'ignore').decode('ascii'), then I get this error:

x['foo'].encode('ascii', 'ignore').decode('ascii')
AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

Read bit.ly/unipain

– Daenyth, Commented Oct 7, 2015 at 12:46

Daenyth · Accepted Answer · 2015-10-07 13:12:00Z

2

x['foo'].decode("utf-8") resulting in UnicodeEncodeError means that x['foo'] is of type unicode. decode is meant for the str type, which it translates to the unicode type. Python 2 is trying to be helpful here and attempts to implicitly convert your unicode to str so that you can call decode on it. It does this with the default encoding (sys.getdefaultencoding()), which is ascii, which can't encode all of Unicode, hence the exception.

The solution here is to remove the decode call - the value is already unicode.
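A minimal sketch of that implicit step, written so it runs on both Python 2 and Python 3. The name implicit_decode is hypothetical; it only spells out by hand roughly what Python 2 does when decode is called on a unicode object, and the sample value stands in for x['foo'].

```python
# -*- coding: utf-8 -*-

# Stand-in for x['foo']: a text (unicode) value, not bytes.
text = u'\u4e3a'


def implicit_decode(value):
    # Hypothetical helper. Approximates Python 2's behaviour for
    # unicode.decode("utf-8"): encode with the default codec (ascii)
    # first, then decode the resulting bytes.
    return value.encode('ascii').decode('utf-8')


try:
    implicit_decode(text)
    error = None
except UnicodeEncodeError as exc:
    # The failure comes from the hidden encode step, not from decode.
    error = exc
```

The exception is raised before any decoding happens, which is why an "Encode" error shows up on a decode call.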

Read Ned Batchelder's presentation - Pragmatic Unicode - it will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting here that every string returned by json.load or json.loads will be unicode and not str.
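A small sketch to check this yourself, assuming a made-up payload (the real response body is not shown in the question). It also shows that a JSON null comes back as None, which might be where the AttributeError on 'NoneType' in the question comes from; that part is a guess about the data.

```python
import json

# Hypothetical payload standing in for the response body.
raw = u'[{"foo": "\\u4e3a Co"}, {"foo": null}]'

data = json.loads(raw)

# unicode on Python 2, str on Python 3.
text_type = type(u'')

# Collect the type of each 'foo' value to see what json.loads produced.
kinds = [type(item['foo']) for item in data]
```

Here the first value is already text, so no decode is needed, and the second is None, so any method call on it needs a guard.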

Addressing the new question after edits:

When you print, you need bytes - unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes - in python terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate from the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.

This should work:

print x['foo'].encode('utf-8')
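As a rough sketch of the same idea with a guard for missing values: to_utf8 is a hypothetical helper name, and the sample string is an assumption based on the bytes shown in the warning from the question.

```python
def to_utf8(value):
    # Hypothetical helper: encode text to UTF-8 bytes and pass None
    # through unchanged, since some entries may have no 'foo' value.
    if value is None:
        return None
    return value.encode('utf-8')


# u'\u4e3a' encodes to the three bytes E4 B8 BA in UTF-8.
encoded = to_utf8(u'\u4e3a Co')
missing = to_utf8(None)
```

The encoded bytes begin with '\xe4\xb8\xba', the same sequence that appears in the "Incorrect string value" warning.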

edited Oct 7, 2015 at 13:12

answered Oct 7, 2015 at 12:49

Daenyth


12 Comments

arbi-g11324115 Over a year ago

Thanks for the link. I'll take a look at it, but right now it would be nice to have an answer, as I am sure I am getting errors even after removing decode. All the other lists in the array are fine, but sometimes x['foo'] contains emoji or other non-ASCII characters, and that is causing the issue

Daenyth Over a year ago

This answers the question you posted; if you're getting other errors in your app it's because you're still making the same mistake (mixing up unicode and str types). Reading the link presents some very in-depth explanation and guidance on preventing it from happening at all

arbi-g11324115 Over a year ago

I'm sorry. I thought that would do it, but I am getting Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'foo' at row 1 using as you suggested x['foo'].encode('utf-8'). Do I really need to import some library for the encoding to work?

Daenyth Over a year ago

@arbi-g11324115 Is print giving you that error or something else? I'd open a new question with that specifically, because it sounds like you're using some database library that isn't handling unicode well. Off the top of my head, because you mentioned emoji, might you be running on a somewhat old version of mysql? Some emoji are new to the unicode standard, and older versions of mysql don't support them yet

arbi-g11324115 Over a year ago

Maybe that's it. I really can't tell. I am using mysql 5.4. Here is how I am setting the unicode:

conn = MySQLdb.connect("****","root","****","****")
conn.set_character_set('utf8')
cursor = conn.cursor()
cursor.execute('SET NAMES utf8;')
cursor.execute('SET CHARACTER SET utf8;')
cursor.execute('SET character_set_connection=utf8;')
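One way to test the emoji hypothesis without touching the database, as a sketch: MySQL's utf8 character set stores at most three bytes per character, and characters that need four bytes in UTF-8 (most emoji) require utf8mb4 instead. The function name needs_utf8mb4 is hypothetical and the sample strings are assumptions.

```python
def needs_utf8mb4(text):
    # Hypothetical helper. In UTF-8, a lead byte of 0xF0 or above starts a
    # four-byte sequence, which MySQL's three-byte utf8 cannot store.
    encoded = bytearray(text.encode('utf-8'))
    return any(byte >= 0xF0 for byte in encoded)


# The character from the warning needs only three bytes.
sample_cjk = needs_utf8mb4(u'\u4e3a Co')

# An emoji outside the Basic Multilingual Plane needs four bytes.
sample_emoji = needs_utf8mb4(u'\U0001F600')
```

If this returns False for the value that triggers the warning, emoji are less likely to be the cause, and the character set of the column or table itself would be the next thing to check.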
