2

I've got a node.js script running that pulls data out of a public "database" (it's a 'blockchain') and then performs some operations on it and then inserts it into a MySQL database. I've got the MySQL database using UTF8_general_ci encoding. The vast majority of data parses fine, but every so often it hits something it can't insert. I get this error:

code: 'ER_TRUNCATED_WRONG_VALUE_FOR_FIELD',
  errno: 1366,
  sqlMessage: 'Incorrect string value: \'\\xF0\\x9F\\x8D\\x95 N...\' for column \'body\' at row 1',
  sqlState: 'HY000',
  index: 0,

It shows the bit of string that seems to be causing the error, and it always has this distinct question mark character:

This looks like Paradise for me! ����\

I'm guessing this is an encoding issue? Is there a way I can convert these before it throws an error? I'm not sure what encoding this blockchain uses, and I'm not even sure how I'd find out.

edit: here's what another example (of the error) shows on the web interface to this blockchain:

And your very welcome !👍👍👍💕💕💕

another edit: I should point out that I am using mysql.format(sql, inserts) to handle inadvertent sql problems with the data -https://github.com/mysqljs/mysql#preparing-queries

5
  • Is there error on the MySQL side or the Node side? Commented Oct 16, 2017 at 6:24
  • It shows in the node console. There's a bit more error info that I'll edit in just now. Commented Oct 16, 2017 at 6:28
  • 2
    Try utf8mb4 (mysql's actual implementation of UTF8) Commented Oct 16, 2017 at 6:54
  • I just tried utf8mb4_unicode_ci and it's throwing the same errors. Commented Oct 16, 2017 at 7:16
  • I've tried all the non-language specific (and BIN) utf8mb4 encodings in the database now, and none of them avoid that error. Commented Oct 16, 2017 at 7:34

2 Answers 2

12

The likely answer is that the MySQL connection did not specify utf8mb4. (Specifically, MySQL's utf8 will not suffice for Emoji.) Can you provide the connection code? Here is a fallback: Execute this after connecting to MySQL:

SET NAMES utf8mb4

Another possible answer is that the web server is not treating the page as UTF-8. One way to solve this is with this in the <head>:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> 

And the column needs to be CHARACTER SET utf8mb4, not ut8.

node.js connection pooling

var connection = mysql.createConnection({ ... , charset : 'utf8mb4'});

See stackoverflow

Sign up to request clarification or add additional context in comments.

4 Comments

The connection is based on the examples in the mysql nodejs docs: var mysql = require('mysql'); var pool = mysql.createPool({ connectionLimit : 10, host : 'example.org', user : 'bob', password : 'secret', database : 'my_db' }); I'm not displaying the characters, so I don't need to worry about things on the webserver side.
But you do have to worry about the "character set" -- The error is complaining that you are trying to stuff a 4-byte Emoji into a CHARACTER SET utf8 hole in MySQL. That is limited to the 3-byte subset of UTF-8.
Thanks -- "node.js" was the clue I needed. See what I added.
I've made the change to the connection charset and am just running some data through now. No errors so far!
1

I ended up solving this by using iconv-lite from npm.

So basically:

const iconv = require("iconv-lite");
var buf = iconv.encode("weird characters", "utf8");

6 Comments

This is making things worse. To demonstrate, let's see SELECT HEX(...) FROM .... If you don't see 4-byte chunks beginning with F0, it is a mess. The character you first mentioned should show F09F8D95. If you get C3B0C5B8C28DE280A2, you have "double encoding". 👍💕 is hex F09F918D F09F9295
It's literally stopped the error, so I don't see how it can be making things worse.
Another possibility is that it converted to 🌪 U+1F32A or \u1F32A type "Unicode" encoding. That is "worse" for anything other than a web page. Let's see the hex, so I am not guessing.
The data looks fine in the database tables. It's not for displaying, only searching.
Searching is even worse -- You must have either F09F918D in both the search string and the data or \u1F32A in both. Else, the tornado will not match.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.