1

In Perl v5.10.1, I try to read a file and store strings in a database. Problems arise when strings contain accents and exotic characters.

On my CentOS 6, the 'locale' command indicates: LANG=en_US.UTF-8

My DB is MySQL, the field I'm writing on is varchar(64) utf8_unicode_ci.

I run my test through a Putty console, set with Window > Translation > Remote character set: UTF8, though printed characters are garbled, but this is not the main problem.

Here's my script:

#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use open ':std', ':encoding(UTF-8)';
use DBI;

# A test string
my $test = 'é';
print "- 1: $test\n";

# First string in my file, containing a single 'é'
my $string = '';
open(my $fh, '<', 'myFile');
while(my $line = <$fh>) {
  chomp $line;
  $string = $line;
  last;
}
close $fh;
print "- 2: $string\n";


# Writing test string and first string in DB
my $dbistring = 'DBI:mysql:database=xxxx;host=xxxx;port=xxxx';
my $socket = DBI->connect($dbistring, 'xxxx', 'xxxx');
my $cmd = 'UPDATE Strings SET string="'.$test.'" WHERE id=1';
my $request = $socket->prepare($cmd);
$request->execute();
$cmd = 'UPDATE Strings SET string="'.$string.'" WHERE id=2';
$request = $socket->prepare($cmd);
$request->execute();

The prints are as follows:

  • 1: ▒

  • 2: ▒

In my DB table, fields end up as:

  • id 1: é

  • id 2: é

To avoid a possible double-encoding from Perl string concatenation, I tried:

$string = Encode::decode('UTF-8', $string);

giving me the same result. Same if I indicate '<:encoding(UTF-8)' when opening the file.

I am much confused, as my process chain seems to be all set in UTF8. Suggestions greatly appreciated.

1
  • At the very least, you need to add use open ':std', ':encoding(UTF-8)'; to add an encoding/decoding layer to STDIN/STDOUT/STDERR and to files opened in scope. Commented May 3, 2018 at 20:22

2 Answers 2

1

This valuable article provided the solution:

The problem arises during the communication between DBI and the DB, and is solved by adding the mysql_enable_utf8 flag during the connection:

DBI->connect($dbistring, 'xxxx', 'xxxx', { mysql_enable_utf8 => 1 });
Sign up to request clarification or add additional context in comments.

1 Comment

I forgot it effectively does SET NAMES utf8 when passed to connect, in addition to the decoding effect obtained from $dbh->{mysql_enable_utf8} = 1;
1

Some issues in Perl

use utf8;
use open ':std', ':encoding(UTF-8)';

my $dbh = DBI->connect("dbi:mysql:".$dsn, $user, $password, {
       PrintError => 0,
       RaiseError => 1,
       mysql_enable_utf8 => 1,  # Switch to UTF-8 for communication and decode.
});
# or {mysql_enable_utf8mb4 => 1} if using utf8mb4

Mojibake

See "Mojibake" in Trouble with UTF-8 characters; what I see is not what I stored for other issues to check on.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.