I have a very annoying issue when transferring data from PHP (which is set up with default_charset = 'UTF-8') to MySQL, on an IIS environment with web.config set to:
<globalization requestEncoding="UTF-8" responseEncoding="UTF-8" />
Also tested on an Apache Environment with same default charset with same results. I also set MySQLi to set charset before any query:
$mysqli->set_charset('UTF-8');
I also have a set of prepared statements that search each character in a string to convert any none UTF-8 encoding to UTF-8 using:
mb_convert_encoding($char, 'UTF-8', mb_detect_encoding($char));
After trying and failing by parsing the whole string with the same process:
mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string));
The content type of the file with the script is set to content-type charset UTF-8 on the header. And still none of this has fixed the issue.
I also, have set all collations of the MySQL tables and schemas to utf8mb4, and of course using `utf8mb4_unicode_520_ci'. Still, issue persists.
The function I wrote to split the string is as follows:
static private function split_and_convert($value, $encoding = 'UTF-8') {
$split = mb_str_split($value);
$filter = [];
foreach($split as $chr) {
$from = mb_detect_encoding($chr);
$encoded = ($from !== $encoding)
? mb_convert_encoding($chr, $encoding, $from)
: $chr;
$filter[] = $encoded;
// echoing for testing
echo 'char ' . $encoded . ' (' . $from . ')<br>';
}
return join('', $filter);
}
Which echo the following trying to save Martínez:
char M (ASCII)
char a (ASCII)
char r (ASCII)
char t (ASCII)
char � (UTF-8)
char n (ASCII)
char e (ASCII)
char z (ASCII)
With MySQL throwing the following error:
Error No: 1366 - Incorrect string value: '\xEDnez' for column 'contactLast' at row 1
Still no joy. But if I do this, it works like a charm:
static private function split_and_convert($value, $encoding = 'UTF-8') {
$split = mb_str_split($value);
$filter = [];
foreach($split as $chr) {
$from = mb_detect_encoding($chr);
$encoded = ($from !== $encoding)
? mb_convert_encoding($chr, $encoding, $from)
// set to actually encode the 'UTF-8' encoded char
// I'm supposed not to do this, but is what works
// which makes no sense
: utf8_encode($chr);
$filter[] = $encoded;
// echoing for testing
echo 'char ' . $encoded . ' (' . $from . ')<br>';
}
return join('', $filter);
}
With the following results:
char M (ASCII)
char a (ASCII)
char r (ASCII)
char t (ASCII)
char í (UTF-8)
char n (ASCII)
char e (ASCII)
char z (ASCII)
And MySQL not throwing any error. If I replace the utf8_encode with mb_convert_encoding($encoded, 'UTF-8'), I get the same errors as not filtering the string.
I'm suspecting mb_detect_encoding is either detecting erroneously the encoding of the characters or simply the mb_convert_encoding is not doing it's job.
If anyone can help I will greatly appreciate it. I'm been searching for hours for a solution and none of the solutions offered have worked. And I find myself in a pickle because what works has been deprecated.
So, these are the list of the issues I'm facing:
- mb_convert_encoding -> not encoding or encoding to something other than UTF8
- mb_detect_encoding -> not detecting correctly
- Strings is encoded as ASCII not UTF-8 as charset in header requested.
- Confusion: Why I need to encode to UTF-8 an UTF-8 encoded character to work?
mb_convert_encoding()andmb_detect_encoding(). You shouldn't need to convert anything.str_split()splits on bytes, not on chars.utf8_encode()strictly converts from ISO-8859-1 to UTF-8.mb_convert_encoding()without the optional 3rd parameter specifying the source encoding will use the currently-configured default encoding from the PHP config. So if the former works, but not the latter, then it's likely that the configured default encoding doesn't match your input string, and you should be explicitly defining the input encoding. Additionally, text encoding cannot be reliably detected, and functions that purport to do so are guessing. Text encoding is metadata that you must be aware of at all times.