1

I get the following warnings, when trying to save a simple dataframe to mysql.:

C:...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 518") result = self._query(query)

And

C:...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (3719, "'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.") result = self._query(query)

Environment info : I use Mysql8, python3.6 (pymysql 0.9.2, sqlalchemy 1.2.1)

I visited posts like the one linked bellow, none of which seem to give a solution as to how to avoid this warning.

N.B : The Collation in the table within mysql doesn't seem to be set to the one I specified in the create_db function within the Connection class.

The executable code:

import DataEngine.db.Connection as connection
import random
import pandas as pd

if __name__ == "__main__":
    conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
    conn.create_db("raw_data")
    conn.establish("raw_data")
    l1 = []
    for i in range(10):
        l_nested = []
        for j in range(10):
            l_nested.append(random.randint(0, 100))
        l1.append(l_nested)
    df = pd.DataFrame(l1)

    conn.save(df, "random_df")
    df2 = conn.retrieve("random_df")
    print(df2)

So the dataframe that is persisted in the database is :

   index   0   1   2   3   4   5   6   7   8   9
0      0  11  57  75  45  81  70  91  66  93  96
1      1  51  43   3  64   2   6  93   5  49  40
2      2  35  80  76  11  23  87  19  32  13  98
3      3  82  10  69  40  34  66  42  24  82  59
4      4  49  74  39  61  14  63  94  92  82  85
5      5  50  47  90  75  48  77  17  43   5  29
6      6  70  40  78  60  29  48  52  48  39  36
7      7  21  87  41  53  95   3  31  67  50  30
8      8  72  79  73  82  20  15  51  14  38  42
9      9  68  71  11  17  48  68  17  42  83  95

My Connection class

import sqlalchemy
import pymysql
import pandas as pd


class Connection:
    def __init__(self: object, host: str, port: str, user: str, password: str):
        self.host = host
        self.port = port
        self.user = user
        self.password = password
        self.conn = None

    def create_db(self: object, db_name: str, charset: str = "utf8mb4", collate:str ="utf8mb4_unicode_ci",drop_if_exists: bool = True):
        c = pymysql.connect(host=self.host, user=self.user, password=self.password)
        if drop_if_exists:
            c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
        c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
        c.close()
        print("Database %s created with a %s charset" % (db_name, charset))

    def establish(self: object, db_name: str, charset: str = "utf8mb4"):
        self.conn = sqlalchemy.create_engine(
            "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
            "?charset=" + charset)
        print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
        print("Charset : %s" % charset)

    def retrieve(self, table):
        df = pd.read_sql_table(table, self.conn)
        return df

    def save(self: object, df: "Pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
        df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)

Some elements that might help: Database is of charset utf8mb4 Table columns don't seem to have the correct collation even though I specified it when creating the db

1 Answer 1

1

Well, hex 92 and e9 is not valid utf8mb4 (UTF-8). Perhaps you were expecting ’été, assuming CHARACTER SETs cp1250, cp1256, cp1257, or latin1.

Find out where that text is coming from, and let's decide whether it is valid latin1. Then we can fix the code to declare that the client is really using latin1, not utf8mb4? Or we can fix the client to use UTF-8, which would probably be better in the long run.

Sign up to request clarification or add additional context in comments.

7 Comments

But there is no éé, they're just numbers as shown in the main.
@Aetos - Is it in French, and talking about "summer"?
There are no characters at all in the dataset. If the data isn't in the dataset where else would it come from?
@Aetos - "column 'VARIABLE_VALUE' at row 518" -- hmmmm... Was this a fresh install of MySQL? Or an Upgrade? Please provide SHOW VARIABLES; and SHOW GLOBAL STATUS; I'm wondering what is at line 518 of those.
This was a fresh install. I did a mysql -u"UserName" -p -h "HOST" -A -e"SHOW GLOBAL STATUS;" > MySQLCurrentStatus.txt, I got an output with 429 lines. As for SHOW GLOBAL STATUS; The line 518 contains sync_relay_log_info 10000.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.