I have this code to fetch a JSON string from a URL:

import json
import urllib2

url = 'http://....'
response = urllib2.urlopen(url)
string = response.read()
data = json.loads(string)

for x in data: 
    print x['foo']

The problem is x['foo']: when I try to print it as shown above, I get this error:

Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'description' at row 1

If I use x['foo'].decode("utf-8"), I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u4e3a' in position 0: ordinal not in range(128)

If I try x['foo'].encode('ascii', 'ignore').decode('ascii'), I get this error:

AttributeError: 'NoneType' object has no attribute 'encode'

Is there any way to fix this problem?

1 Answer

x['foo'].decode("utf-8") raising UnicodeEncodeError means that x['foo'] is already of type unicode. str.decode takes a str (bytes) and translates it to unicode. Python 2 tries to be helpful here and implicitly converts your unicode to str first so that it can call decode on it. It does this with the default encoding (see sys.getdefaultencoding()), which is ascii and cannot encode most of Unicode, hence the exception.

The solution here is to remove the decode call - the value is already unicode.

Read Ned Batchelder's presentation - Pragmatic Unicode - it will greatly enhance your understanding of this and help prevent similar errors in the future.

It's worth noting here that the strings returned by json.loads will generally be unicode and not str.
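
As a small illustration of the point above, here is a sketch using a made-up payload (the JSON string and its contents are assumptions, not the asker's real data). It is written so that it runs on both Python 2 and Python 3; on Python 2 the text type it checks for is unicode.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical payload standing in for the response body from the question.
payload = '[{"foo": "\\u4e3a Co"}]'
data = json.loads(payload)
value = data[0]['foo']

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = type(u'')

# json.loads has already produced decoded text, so no .decode() call is needed.
is_text = isinstance(value, text_type)
first_char = value[0]
```

Because the value is already decoded text, calling .decode("utf-8") on it in Python 2 only triggers the implicit ASCII encode described above.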


Addressing the new question after edits:

When you print, you need bytes - unicode is an abstract concept. You need a mapping from the abstract unicode string into bytes - in Python 2 terms, you must convert your unicode object to str. You can do this by calling encode with an encoding that tells it how to translate the abstract string into concrete bytes. Generally you want to use the utf-8 encoding.

This should work:

print x['foo'].encode('utf-8')
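
A slightly fuller sketch of the same idea, assuming (as the AttributeError in the question suggests) that some entries have no value for 'foo'. The payload and the helper name to_utf8 are made up for illustration; the code runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical payload: one entry with non-ASCII text, one with a null value.
payload = '[{"foo": "\\u4e3a Co"}, {"foo": null}]'
data = json.loads(payload)


def to_utf8(value):
    """Return UTF-8 bytes for a text value, or None if the value is missing."""
    if value is None:
        return None
    return value.encode('utf-8')


encoded = [to_utf8(item['foo']) for item in data]
```

The first entry encodes to the bytes \xe4\xb8\xba followed by " Co", which is the same byte sequence that appears in the warning quoted in the question. On Python 2, each non-None result is a str that can be passed to print.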

12 Comments

Thanks for the link. I'll take a look at it, but right now it would be nice to have an answer, as I am sure I am getting errors even after removing decode. All the other lists in the array are fine, but sometimes x['foo'] contains emoji or other non-ASCII characters, and that is causing the issue.
This answers the question you posted; if you're getting other errors in your app, it's because you're still making the same mistake (mixing up unicode and str types). The link gives a very in-depth explanation and guidance on preventing it from happening at all.
I'm sorry. I thought that would do it, but I am getting Warning: Incorrect string value: '\xE4\xB8\xBA Co...' for column 'foo' at row 1 when using x['foo'].encode('utf-8') as you suggested. Do I really need to import some library for the encoding to work?
@arbi-g11324115 Is print giving you that error, or something else? I'd open a new question about that specifically, because it sounds like you're using some database library that isn't handling unicode well. Off the top of my head, because you mentioned emoji, might you be running a somewhat old version of MySQL? Some emoji are new to the Unicode standard, and older versions of MySQL don't support them yet.
Maybe that's it. I really can't tell. I am using MySQL 5.4. Here is how I am setting the encoding: conn = MySQLdb.connect("****","root","****","****") conn.set_character_set('utf8') cursor = conn.cursor() cursor.execute('SET NAMES utf8;') cursor.execute('SET CHARACTER SET utf8;') cursor.execute('SET character_set_connection=utf8;')