1

I have a MySQL database which is as far as I can tell utf8 through and through. It's a database originally used solely by a Ruby on Rails application which has no issue writing and reading the utf8 characters.

However, when attempting to read via a c# app using Dapper I'm seeing a lot of bad characters, for example:

Expected: FELIZ AÑO Actual: FELIZ AÑO

My connection string looks like this:

Server=;Database=;Uid=;Pwd=;Port=;SslMode=;charset=utf8;

I've tried several combinations of charset and utf8 capitalisation but none have worked so far. I read in one thread that the charset in the connection string only affects the SQL query language so if that's correct that'll be why its not helping!

Is there anything else I'm missing potentially? or anything else I can do to be able to read the characters correctly?

Update: Some info from MySQL:

mysql> SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME, COLUMN_TYPE
    -> FROM information_schema.columns
    -> WHERE TABLE_NAME = 'pages'
    -> AND COLUMN_NAME = 'title';

| TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME  | COLUMN_TYPE  |
| pages      | title       | utf8               | utf8_general_ci | varchar(255) |

Update 2: More info, it seems my character sets are a bit messed up..

mysql> show variables like "character_set_%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

mysql> select collation_name from information_schema.columns where table_name = 'pages' and column_name = 'title';
+-----------------+
| collation_name  |
+-----------------+
| utf8_general_ci |
+-----------------+

mysql> select title from pages where id = 3660;
+--------------------------------+
| title                          |
+--------------------------------+
| FELIZ AÑO  |
+--------------------------------+

mysql> set character set 'latin1';

mysql> select title from pages where id = 3660;
+-----------------------------+
| title                       |
+-----------------------------+
| FELIZ AÑO  |
+-----------------------------+

So, database is latin1, default connection is utf8, default client is utf8, column is utf8.

If I query the column directly in mysql, it comes out wrong until I set the character set to latin1.

Doing the exact same thing in C# with Execute to set the character set and ExecuteReader to read the column is still producing the bad characters. Trying to work out the difference.

Update 3 - Screenshot for @BradleyGrainger, this is from SequelPro

enter image description here

9
  • 2
    Dapper just asks ADO.NET for the value from the data-reader, which is MySQL-specific - it doesn't do any of the interpreting, so: if the specific provider's data-reader is unable to interpret the data, that will impact any tool that sits on top of ADO.NET Commented Feb 12, 2019 at 15:42
  • 2
    aye; I think while you'e trying to fix the configuration, you may want to just test with a basic DbCommand that uses ExecuteScalar - until that gets the value right, all bets are off; sadly, I don't know a lot about MySQL configuration (either at the column level, or in the connection-string), so I can't advise directly Commented Feb 12, 2019 at 16:03
  • 1
    Can you paste a screenshot from MySQL Workbench that shows that the values in the table are encoded correctly? It's possible that your database actually contains FELIZ AÑO (encoded in UTF-8) and Ruby on Rails is double decoding it to get the desired string. Commented Feb 12, 2019 at 18:42
  • 1
    You are suffering from Mojibake, or possibly "double encoding". Commented Feb 13, 2019 at 5:28
  • 1
    Please provide `select HEX(title) from pages where id = 3660; Commented Feb 13, 2019 at 5:29

1 Answer 1

2

OK, it's just Mojibake.

HEX:  46 45 4C 49 5A 20 41 C383 E28098 4F
      F  E  L  I  Z (sp) A  Ã      ‘   O
Mojibaked:  FELIZ AÑO
Should be:  FELIZ AÑO

The instructions for preventing it is in "Mojibake" of Trouble with UTF-8 characters; what I see is not what I stored

After preventing it, one way to "fix" the data would involve an UPDATE to change the column contents thus:

col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)

You can see that via:

SELECT CONVERT(BINARY(CONVERT('FELIZ AÑO' USING latin1)) USING utf8mb4); -- FELIZ AÑO

(Please experiment in a test environment, not production.)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.