I am using BeautifulSoup to scrape Chinese text from a Chinese website, and I am trying to insert the scraped string into a MySQL database through MySQLdb in Python. However, I get a UnicodeEncodeError when I execute the query. The code is as follows:

movie_name_fail = my_beautifulsoup_object.find("div").text
my_cursor.execute("INSERT INTO MOVIE_TABLE VALUES(%s)",movie_name_fail)

It gives me the error:

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-7: ordinal not in range(256)
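The error says the text was being encoded with Latin-1, which can only represent code points 0 to 255, and every Chinese character lies far above that range. A small Python 3 sketch (the title string is just an example, not the scraped value) reproduces the same failure outside the database and shows that UTF-8 has no such limit:

```python
# Python 3 sketch: Latin-1 covers only code points 0-255, so Chinese text
# cannot be encoded with it, while UTF-8 can encode any code point.
title = "超人总动员"

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        print(f"{codec}: {exc}")
        return None

latin1_bytes = try_encode(title, "latin-1")  # prints the same kind of error
utf8_bytes = try_encode(title, "utf-8")
print([hex(ord(ch)) for ch in title])  # every code point is far above 0xFF
print(len(utf8_bytes))  # 15: each of these five characters takes 3 bytes in UTF-8
```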

But when I do

print movie_name_fail

the Chinese characters are printed correctly. I have also already declared

#!/usr/bin/python
# -*- coding: utf-8 -*-

as the encoding of my Python source file, but it did not help. However, when I type the same Chinese characters directly into my text editor (I am using Sublime Text), it works fine: I can insert the string into MySQL and it displays correctly in the MySQL console (I have already set the CHARACTER SET of the table in MySQL to utf8):

movie_name_success = "超人总动员"
my_cursor.execute("INSERT INTO MOVIE_TABLE VALUES(%s)",movie_name_success)
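A likely explanation for why the literal works, assuming the usual MySQLdb behavior: in Python 2 a plain "..." literal in a UTF-8 source file is a byte string (str) that already holds UTF-8 bytes, while BeautifulSoup's .text returns a unicode object. The driver encodes unicode arguments with the connection character set but sends byte strings as they are. The hypothetical to_wire_bytes function below is a simplified Python 3 model of that rule, not MySQLdb's actual code:

```python
# Hypothetical model of how a MySQLdb-style driver turns a query argument into
# bytes. NOT MySQLdb's real code: it only mirrors the rule that text is encoded
# with the connection character set while byte strings are sent unchanged.

def to_wire_bytes(value, connection_charset="latin-1"):
    if isinstance(value, bytes):
        return value  # already bytes: nothing to encode, so no error
    return value.encode(connection_charset)  # text: may raise UnicodeEncodeError

# In Python 2, a plain "..." literal in a UTF-8 source file is a byte string;
# this models it with an explicit .encode("utf-8").
literal_from_source = "超人总动员".encode("utf-8")
scraped_text = "超人总动员"  # like BeautifulSoup's .text (unicode in Python 2)

print(to_wire_bytes(literal_from_source))  # passes straight through
try:
    to_wire_bytes(scraped_text)
except UnicodeEncodeError as exc:
    print("scraped text fails:", exc)
print(to_wire_bytes(scraped_text, "utf-8"))  # works once the connection uses UTF-8
```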

I cannot figure out why the scraped string fails while the typed-in one works. I would really appreciate any help.

Update

My Python version is 2.7.8, and the MySQL version is 5.7.11.

I pushed my source code to GitHub; it should reproduce the error on line 117: "db_cursor.executemany(insert_sql,movie_tuple_list)"

https://github.com/shawnli2010/JHSaver/blob/master/LeTV_scraper.py

  • This answer suggests decoding your page first, before passing it into Beautiful Soup (e.g. html = html.decode('utf-8')) Commented Apr 14, 2016 at 23:07
  • I tried it. It gave me another error: "UnicodeEncodeError: 'ascii' codec can't encode characters in position 531-533: ordinal not in range(128)" Commented Apr 15, 2016 at 0:22
  • Please update with Python version and small, complete example that actually reproduces the error. Commented Apr 15, 2016 at 1:44
  • @MarkTolonen I just updated with the information you asked for, thanks for helping me! Commented Apr 15, 2016 at 2:13
  • Please provide SHOW CREATE TABLE. Commented Apr 15, 2016 at 20:12

1 Answer

Does that Python construct add quotes when doing the substitution? It needs to.

Did you establish utf8mb4 for the connection?

Is the table/column CHARACTER SET utf8mb4?
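To check or change the table side, one option is to convert the table and its text columns to utf8mb4 and then read back SHOW CREATE TABLE. A minimal sketch, assuming a DB-API cursor obtained from MySQLdb and a trusted table name (identifiers cannot be passed as %s parameters, so the hypothetical helper formats the name into the SQL itself):

```python
# Sketch: convert an existing table (and its text columns) to utf8mb4, then
# confirm via SHOW CREATE TABLE. `cursor` is any DB-API cursor, for example
# MySQLdb.connect(...).cursor(). The table name is formatted directly into the
# SQL, so it must come from trusted code, never from user input.

def convert_table_to_utf8mb4(cursor, table):
    """Hypothetical helper: returns True if the new DDL mentions utf8mb4."""
    cursor.execute(
        "ALTER TABLE `{}` CONVERT TO CHARACTER SET utf8mb4 "
        "COLLATE utf8mb4_unicode_ci".format(table)
    )
    cursor.execute("SHOW CREATE TABLE `{}`".format(table))
    _table_name, ddl = cursor.fetchone()  # row is (Table, Create Table)
    return "utf8mb4" in ddl

# Usage (needs a live connection):
# ok = convert_table_to_utf8mb4(db.cursor(), "MOVIE_TABLE")
```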

More Python notes

I suggest utf8mb4 instead of utf8 because Chinese has some characters that need 4 bytes.
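The 3-byte versus 4-byte distinction can be checked directly: characters above U+FFFF (such as CJK Unified Ideographs Extension B) take 4 bytes in UTF-8, which MySQL's 3-byte utf8 cannot store. The helper needs_utf8mb4 is a hypothetical name used for illustration:

```python
# MySQL's "utf8" stores at most 3 bytes per character. Characters outside the
# Basic Multilingual Plane (code points above U+FFFF) need 4 bytes in UTF-8,
# which only the utf8mb4 character set can hold.

def needs_utf8mb4(text):
    """True if any character in text encodes to 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)

common = "超人总动员"   # all in the BMP, 3 bytes each in UTF-8
rare = "\U00020000"    # U+20000, first character of CJK Extension B
print(len(common.encode("utf-8")), needs_utf8mb4(common))  # 15 False
print(len(rare.encode("utf-8")), needs_utf8mb4(rare))      # 4 True
```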

1 Comment

I added the "charset" and "use_unicode" arguments to "db = MySQLdb.connect()", based on the Python notes you provided, and it worked! Thank you so much!!
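A sketch of the fix described in this comment, assuming MySQLdb's connect() keyword arguments charset and use_unicode. The helper names and credentials are hypothetical placeholders; charset="utf8mb4" follows the answer's advice and requires a driver and client library that support it (if yours rejects it, "utf8" is the fallback, without 4-byte characters):

```python
# Sketch: open the connection with an explicit character set so that text
# arguments are encoded as UTF-8 instead of the server default (Latin-1 in
# the question). Host, user, password and database name are placeholders.

def connection_settings(host, user, passwd, db, charset="utf8mb4"):
    """Hypothetical helper: keyword arguments for MySQLdb.connect()."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": charset,   # character set used for this connection
        "use_unicode": True,  # return text columns as unicode (str in Python 3)
    }

def open_connection(**settings):
    import MySQLdb  # from mysqlclient / MySQL-python; imported only when connecting
    return MySQLdb.connect(**settings)

# Usage (needs a running MySQL server):
# db = open_connection(**connection_settings("localhost", "user", "secret", "movies"))
# cursor = db.cursor()
# cursor.execute("INSERT INTO MOVIE_TABLE VALUES (%s)", (movie_name_fail,))
# db.commit()  # MySQLdb does not autocommit by default
```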
