Does anyone know how to set up a MySQL table column to hold the utf8mb4 charset with sqlalchemy, or specifically with flask_sqlalchemy? The connection setup is described here https://docs.sqlalchemy.org/en/14/dialects/mysql.html but I can't find out how to change the table properties so that the Flask migrations create the table properly.
The problem I'm seeing is that some email content (which I'm trying to save) can contain multi-byte Unicode characters and will fail to insert into the table with this error:
(pymysql.err.DataError) (1366, "Incorrect string value: '\\xEF\\xBB\\xBF\\x0D\\x0A>...' for column `test`.`email`.`content_text` at row 1")
which corresponds to this string snippet in the email:
>=20
> =EF=BB=BF
>=20
and can be replicated with a test string like this, which will recreate the error if sent to the db:
print(f'bad chr >> {chr(65279)} <<')
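A quick way to see what this character is, as a minimal sketch using only the standard library: `chr(65279)` is U+FEFF (the byte order mark), and its UTF-8 encoding matches the `\xEF\xBB\xBF` bytes in the error message. The `cp1252` codec is used here only as an approximation of MySQL's `latin1`; that mapping is an assumption of this sketch, not something taken from the script.

```python
# chr(65279) is U+FEFF, the byte order mark.
bom = chr(65279)

# Its UTF-8 encoding is the byte sequence reported in the MySQL error.
utf8_bytes = bom.encode("utf-8")
print(hex(ord(bom)), utf8_bytes.hex(), len(utf8_bytes))

# Assumption: Python's cp1252 codec stands in for MySQL's latin1 here.
try:
    bom.encode("cp1252")
    fits_single_byte_charset = True
except UnicodeEncodeError:
    fits_single_byte_charset = False

print("fits in cp1252:", fits_single_byte_charset)
```

So the character has no representation in a single-byte charset like the one the table was created with, which would explain why the insert is rejected even though the connection itself uses utf8mb4.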
To enable the table to store this character you can use ALTER TABLE, which is not great because it happens after the table declaration:
ALTER TABLE email
DEFAULT CHARACTER SET utf8mb4,
MODIFY content_text MEDIUMTEXT
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
So I would like to have that in the declaration of the table, but I cannot figure out how this is done with the Flask ORM interface.
Here is a test script to recreate the problem (it assumes a test db exists and the test user has GRANT ALL on it to create the table):
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate
from sqlalchemy import text
from sqlalchemy.dialects.mysql import MEDIUMTEXT

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql+pymysql://test:test@localhost/test?charset=utf8mb4'
db = SQLAlchemy(app)
Migrate(app, db)

class Email(db.Model):
    __tablename__ = 'email'
    id = db.Column(db.Integer, primary_key=True)
    content_text = db.Column(MEDIUMTEXT)

@app.cli.command("test_schema")
def test_schema():
    # Print each column description returned by DESC, one field per line.
    for i, row in enumerate(db.session.execute(text('desc email')), 1):
        for y, col in enumerate(row):
            print(f'row {i} {row._fields[y]:<10s} : {str(col):<20s}')
        print()

@app.cli.command("test_problem")
def test_problem():
    # Insert a row containing chr(65279), which triggers error 1366 on the latin1 table.
    new_mail = Email(content_text=f'bad chr >> {chr(65279)} <<')
    db.session.add(new_mail)
    db.session.commit()

@app.cli.command("drop")
def drop():
    db.session.execute(text('drop table email'))
I call it several times, first to create the schema and then to test the problem with the command-line functions I added:
# creates the table provided test db is created and user has grant all set to that test db
FLASK_APP=test_schema.py flask db init
FLASK_APP=test_schema.py flask db migrate
FLASK_APP=test_schema.py flask db upgrade
# test the problem
FLASK_APP=test_schema.py flask test_problem
So I would like to be able to define content_text = db.Column(MEDIUMTEXT) without doing an ALTER on the db.
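For illustration, here is a minimal sketch of how the charset might be declared up front. It uses a plain SQLAlchemy `Table` instead of the Flask model (an assumption made so that it runs without a database) and only compiles the `CREATE TABLE` statement for the MySQL dialect. The `mysql_charset` / `mysql_collate` table options and the `charset` / `collation` arguments of `MEDIUMTEXT` come from SQLAlchemy's MySQL dialect; the collation value is copied from the ALTER TABLE statement above.

```python
from sqlalchemy import Column, Integer, MetaData, Table
from sqlalchemy.dialects import mysql
from sqlalchemy.dialects.mysql import MEDIUMTEXT
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# Stand-in for the Email model, with charset and collation declared
# both on the column type and as MySQL table options.
email = Table(
    "email",
    metadata,
    Column("id", Integer, primary_key=True),
    Column(
        "content_text",
        MEDIUMTEXT(charset="utf8mb4", collation="utf8mb4_unicode_ci"),
    ),
    mysql_charset="utf8mb4",
    mysql_collate="utf8mb4_unicode_ci",
)

# Render the DDL for MySQL without connecting to a server.
ddl = str(CreateTable(email).compile(dialect=mysql.dialect()))
print(ddl)
```

In the Flask-SQLAlchemy model the same table options would presumably go into `__table_args__ = {'mysql_charset': 'utf8mb4', 'mysql_collate': 'utf8mb4_unicode_ci'}`, with the column declared as `db.Column(MEDIUMTEXT(charset='utf8mb4', collation='utf8mb4_unicode_ci'))`. These options only affect the `CREATE TABLE` statement, so a table that already exists with latin1 would still need to be dropped and recreated or altered.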
These are the database objects prior to any ALTER TABLE:
> SHOW CREATE DATABASE test;
CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET latin1 */
> SHOW CREATE TABLE email;
CREATE TABLE `email` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content_text` mediumtext DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
ALTER TABLE after the table is created? Would creating the database with charset utf8mb4 be an acceptable solution?
content_text = db.Column(MEDIUMTEXT(unicode=True)) not sure if it has any side effects.