
I am using Perl 5.20.2 and MySQL 5.5.57 on a Debian 8 machine. I recently discovered that MySQL's utf8 tables are limited to three-byte characters. As a consequence, I cannot store emojis. So I tried utf8mb4 tables, which are supposed to address the issue. I changed the table from utf8 to utf8mb4 from inside the mysql client:

ALTER DATABASE `mydb` CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE `mydb`.`mytable` CHANGE `object` `object` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Storing data in mytable seems to work; at least I can see the expected emoji in phpMyAdmin. However, when reading from the table I receive a 4-character result with 3 unprintable characters. The following program is supposed to print the same emoji twice:

#!/usr/bin/perl

use 5.10.1;
use warnings;
use strict;
use DBI;

binmode(STDOUT, ':utf8');

my $object = "\x{1F600}";
my $hd_db  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password');
$hd_db->do('SET NAMES utf8mb4');

# cleanup
my $delete = $hd_db->prepare("DELETE FROM mytable");
$delete->execute;

my $insert = $hd_db->prepare("INSERT INTO mytable (object) VALUES ('" . $object . "')");
$insert->execute;
my $select = $hd_db->prepare("SELECT * FROM mytable");
$select->execute;
my $row    = $select->fetchrow_hashref;

say $object;
say $row->{'object'};

Expected output:

😀
😀

Actual output:

😀
�

Seems like a bug to me. Any suggestion how to work around it?

EDIT: SELECTing the data from within the mysql client also shows the expected emoji:

mysql> SET SESSION CHARACTER_SET_CLIENT = utf8mb4;
mysql> SET SESSION CHARACTER_SET_RESULTS = utf8mb4;
mysql> SELECT * FROM mytable;
+--------+
| object |
+--------+
| 😀      |
+--------+
  • You should really use placeholders. Commented Oct 13, 2017 at 10:18
  • Are you referring to the prepare statements? I usually do, but that did not seem relevant here Commented Oct 13, 2017 at 10:30
  • It was relevant enough that I took the time to point it out ;-) Commented Oct 13, 2017 at 10:39
  • @simbabque, Please explain how using placeholders would resolve the issue with the utf8mb4 character set in the Perl client. Commented Oct 13, 2017 at 16:36
  • @Bill I never said it would. Just pointing out good style. We educate where we can. :) Commented Oct 13, 2017 at 16:37

3 Answers


You told MySQL to use UTF-8 for communication, but you also need to tell DBD::mysql to decode the data (or do it yourself).

You want

my $dbh = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password', {
   mysql_enable_utf8mb4 => 1,
})
   or die($DBI::errstr);

which is equivalent to

my $dbh  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password')
   or die($DBI::errstr);

$dbh->do('SET NAMES utf8mb4')
   or die($dbh->errstr);

$dbh->{mysql_enable_utf8mb4} = 1;

1 Comment

I accepted this answer as it is the way to go for DBD::mysql versions >= 4.041_01. Debian 8 ships with 3.0.17. For that version, decoding works when opting for mysql_enable_utf8 => 1; see this post
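
The version split described in the comment above could be handled with a small helper. This is only a sketch: `utf8_attr_for` is a hypothetical name, and the 4.041_01 threshold is taken from the comment rather than verified against every DBD::mysql release, so check it against the Changes file of the driver you actually run.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the connect attribute for a given DBD::mysql
# version string. The 4.041_01 threshold comes from the comment above and
# is an assumption, not a guarantee.
sub utf8_attr_for {
    my ($dbd_version) = @_;
    (my $numeric = $dbd_version) =~ tr/_//d;    # "4.041_01" -> "4.04101"
    return $numeric >= 4.04101 ? 'mysql_enable_utf8mb4' : 'mysql_enable_utf8';
}

# Possible usage (requires DBI and DBD::mysql to be installed):
#   require DBD::mysql;
#   my $attr = utf8_attr_for($DBD::mysql::VERSION);
#   my $dbh  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password',
#                           { $attr => 1 })
#       or die($DBI::errstr);
```

With the older attribute, the `SET NAMES utf8mb4` statement from the question probably still has to be issued by hand, since only `mysql_enable_utf8mb4` is described above as doing that for you.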

The workaround is to let MySQL treat everything as bytes and to do the encoding in your application.

use Encode qw(encode decode);

my $object = "\x{1F600}";
my $hd_db  = DBI->connect('DBI:mysql:mydb:localhost', 'user', 'password');
$hd_db->do('SET NAMES latin1');

...

my $insert = $hd_db->prepare("INSERT INTO mytable (object) VALUES ('" . 
    encode("UTF-8",$object) . "')"); # or equiv statement with placeholders
$insert->execute;

...

my $select = $hd_db->prepare("SELECT * FROM mytable");
$select->execute;
my $row    = $select->fetchrow_hashref;
say $object;
say decode("UTF-8",$row->{'object'});
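
The "equivalent statement with placeholders" mentioned in the code comment might look like the following sketch. `insert_object` and `fetch_objects` are hypothetical helper names, and `$dbh` is assumed to be a DBI handle on which `SET NAMES latin1` has already been run as shown above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: store one string, encoding it to UTF-8 bytes in the
# application. The placeholder takes care of quoting.
sub insert_object {
    my ($dbh, $object) = @_;
    my $sth = $dbh->prepare('INSERT INTO mytable (object) VALUES (?)');
    return $sth->execute(encode('UTF-8', $object));
}

# Hypothetical helper: read all rows back and decode the UTF-8 bytes into
# Perl character strings.
sub fetch_objects {
    my ($dbh) = @_;
    my $sth = $dbh->prepare('SELECT object FROM mytable');
    $sth->execute;
    my @objects;
    while (my ($bytes) = $sth->fetchrow_array) {
        push @objects, decode('UTF-8', $bytes);
    }
    return @objects;
}
```

Keeping the encode and decode calls in two small functions like these limits the number of places that have to know the connection carries raw bytes.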

1 Comment

Thanks for the suggestion, but unfortunately I would have to revisit more than 1k db queries in my application. And worse, they would have to be tested.

"\x{1F600}" is a Unicode code point, not UTF-8-encoded bytes. The two are related, but they are not the same thing.

You need UTF-8 (as the non-mysql world calls it) and utf8mb4 (as MySQL calls it).

😀 is hex F09F9880 (in utf8mb4); it is ðŸ˜€ if you convert through CHARACTER SET latin1 ("Mojibake").

Please run SELECT HEX(object) ... to see whether you get those 4 hex bytes or something else. Then we will know whether to focus on the INSERT or the SELECT.
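
To know what to compare the HEX() result against, the expected bytes can be computed on the Perl side. The sketch below uses only the core Encode module and assumes that MySQL's latin1 corresponds to cp1252 for these four byte values.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $emoji = "\x{1F600}";

# UTF-8 bytes of U+1F600 as uppercase hex; this is what HEX(object)
# should return if the INSERT stored the character correctly.
my $bytes = encode('UTF-8', $emoji);
my $hex   = uc(unpack('H*', $bytes));

# Reinterpreting the same bytes as cp1252 gives the four-character
# mojibake; list its code points rather than printing it raw.
my $mojibake   = decode('cp1252', $bytes);
my $codepoints = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $mojibake;

print "$hex\n";           # F09F9880
print "$codepoints\n";    # U+00F0 U+0178 U+02DC U+20AC
```

If HEX(object) shows something longer, such as C3B0C5B8CB9CE282AC, the data was double-encoded on the way in and the INSERT side is the place to look.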

You say "actual output" -- but where is this? A web page? Is it configured for UTF-8? Or something else? If it is your command-line window, then make sure it is set for UTF-8. On Windows, that is done via chcp 65001.

You mentioned

mysql> SET SESSION CHARACTER_SET_CLIENT = utf8mb4;
mysql> SET SESSION CHARACTER_SET_RESULTS = utf8mb4;

That's only 2 of the 3 session variables that need to be set; character_set_connection is missing. Better to simply do

SET NAMES utf8mb4;

2 Comments

It was console output and it worked out of the box with both Ubuntu and W10/PuTTY 0.7. Win7/PuTTY 0.7 does not work out of the box, though I have not tried running chcp
Compare my.cnf. You may find different defaults. What versions of MySQL?
