
I'm writing a Python script that reads tweets and inserts them into MySQL. Depending on the attributes of each tweet, I need to insert different fields. For that reason, I'm building the fields and values section of the query string as I go, using Python string formatting for convenience:

values = """%s, %s, '%s','%s','%s','%s',%s,'%s','%s','%s'""" % (
    url_id, tweet['from_user_id'], conn.escape_string(tweet['location']),
    conn.escape_string(tweet['profile_image_url']),
    tweet['created_at'], tweet['from_user'], tweet['id'],
    conn.escape_string(tweet['text']),
    conn.escape_string(tweet['iso_language_code']), conn.escape_string(tweet['source'])
)

When I do this with tweets that contain UTF-8 characters, though, I get an error like this:

values = """%s, %s, '%s','%s','%s','%s',%s,'%s','%s','%s'""" % (
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 117: ordinal not in range(128)

I think that the format string (the one with all the "%s"s) is interpreted as ASCII by default, and that's clashing with the UTF-8 characters. I need to keep everything in UTF-8, since this code has to work with any possible language.

So how do I specify that the formatting string is UTF-8? I thought I could change the default encoding for the entire script, but I'm using Python 2.4 and sys.setdefaultencoding doesn't exist in that version. Right now, I'm just not sure how to do that, or if that's even the right thing to do.

  • Yeah... that's a unicode, not UTF-8. Commented Jun 15, 2011 at 3:12
  • I hope Python 3 becomes the status quo soon... Unicode FTW. BTW, Python 2.4 is too damn old. Commented Jun 15, 2011 at 3:21
  • Amen to that ... I'm not on 2.4 by choice. All the servers I work on are still stuck on it. Commented Jun 15, 2011 at 3:23
  • I also share this problem sometimes... CentOS and RHEL are stuck on 2.4.3... But it's easier to install 3.2 (as python3) than to update to 2.5+ Commented Jun 15, 2011 at 3:29

1 Answer

Change:

"""%s, %s, '%s','%s','%s','%s',%s,'%s','%s','%s'"""

to:

u"""%s, %s, '%s','%s','%s','%s',%s,'%s','%s','%s'"""

And then if you want to encode it to UTF-8, do:

values.encode('utf8')
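
As a minimal sketch of the two steps above: format with a unicode template, then encode the result to UTF-8 bytes. The sample id (42) and the `text` value are made up for illustration and are not from the question.

```python
# Illustrative sample data (not from the question): a string containing
# U+2019, the same character that appears in the reported error.
text = u"It\u2019s here"

# A unicode template keeps the whole formatted result as unicode,
# so no implicit ASCII encoding of the arguments is needed here.
values = u"%s, '%s'" % (42, text)

# Encode explicitly, and only once, when bytes are actually required.
encoded = values.encode('utf8')
```

This only covers the formatting step. If any argument is first passed through a function that expects byte strings, that call can still trigger its own implicit ASCII conversion.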

But it looks like you're using the wrong approach anyway; see "Escape string Python for MySQL".
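
One common alternative to hand-escaping is to keep the SQL text and the values separate and let the driver do the quoting. Here is a rough sketch, assuming a DB-API driver that uses `%s` placeholders (MySQLdb does). The `build_insert` helper, the `tweets` table name, and the sample row are hypothetical and only there to show how the field list can still vary per tweet.

```python
def build_insert(table, row):
    """Return (sql, params) for a parameterized INSERT (hypothetical helper).

    The table name and the keys of row are interpolated into the SQL text,
    so they must come from trusted code, never from tweet data.
    """
    columns = sorted(row)  # fixed order so columns and params line up
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)
    params = tuple([row[c] for c in columns])
    return sql, params


# Sample data for illustration only.
row = {"from_user": u"alice", "text": u"It\u2019s here"}
sql, params = build_insert("tweets", row)

# With a real connection this would be run as:
#     cursor.execute(sql, params)
```

Because the values never pass through Python string formatting, there are no quotes to add by hand and no `escape_string` calls. How non-ASCII values are sent still depends on the connection's character set, which has to be configured separately.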

3 Comments

Thanks! Actually, I tried that earlier, and I still get the same error. It still tries to encode everything in ASCII.
No -- good point. Now it's values = u"""%s, %s, '%s','%s','%s','%s',%s,'%s','%s','%s'""" % ( UnicodeEncodeError: 'ascii' codec can't encode characters in position 81-82: ordinal not in range(128)
That worked! Thank you so much. I knew I was doing something basic that was wrong.
