I have had a weird issue for several years now. Here's the thing.

I run Rocky Linux (this also happens on CentOS) with Apache 2.4.53 and MariaDB (mysql Ver 8.0.30 for Linux on x86_64 (Source distribution)).

I have a Tcl script which executes a "curl" to retrieve data from another site. It comes in JSON format which I then parse (using the JSON package). I then insert data into a database, such as:

insert into table set name='Mário Flores';

As you can see, there is a non-ASCII character (á). The database uses the utf8mb4 charset, everything is set correctly, and the system locale is "en_US.UTF-8".

Now... if I run the script directly on my Linux box, there are no issues. If I use my website, I click a button which does a POST to my web server (index.cgi); that then runs the "curl" to get the data, parses the JSON and inserts into the database. The code is the same, called the same way. But in that case I get an error:

Error: mysqlexec/db server: Incorrect string value: '\xE1rio...' for column 'name' at row 1

What could be the issue here? The only way I can solve the problem, when the script is run via the web, is to do:

set name [encoding convertto utf-8 $name]

And then insert into the DB.

I tried running it both directly on Linux and via the web, with different results. I expected everything to already be UTF-8 compatible, with no conversion needed.

  • "mysql Ver 8.0.30 for Linux" looks like a client version, while you mention MariaDB. If it's really MariaDB, include the MariaDB version from select version(). The general problem is that the Tcl side needs to connect using a utf8mb4 character set, set through the connection options in some way. Maybe run set names utf8 as SQL. Commented Apr 17, 2023 at 0:26
  • This is not an issue. It is in fact MySQL, version 8.0.32 now. But the table and database are in utf8mb4. The problem seems to be in the JSON package (described below). Commented Dec 30, 2023 at 1:42

1 Answer

\xE1 sounds like latin1, definitely not utf8. When connecting, set the client's character set to match the encoding the client is actually sending. Alternatively, use SET NAMES latin1; after connecting.

E1 is the hex for á in any of these: cp1250, dec8, latin1, latin2, latin5.

C3A1 is the hex for the same character in utf8 / utf8mb4.
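
To see the two byte sequences side by side, a minimal Tcl sketch (not taken from the original script) can encode the same character both ways. The variable names here are illustrative only, and "iso8859-1" is Tcl's name for latin1.

```tcl
# U+00E1 (á), written as an escape so the script file's own encoding does not matter.
set ch "\u00e1"

# Encode the same character two ways and print the resulting bytes as hex.
foreach enc {iso8859-1 utf-8} {
    set bytes [encoding convertto $enc $ch]
    binary scan $bytes H* hex
    puts "$enc: [string toupper $hex]"
}
# prints:
#   iso8859-1: E1
#   utf-8: C3A1
```

A lone E1 byte is not valid UTF-8, which is why a connection declared as utf8mb4 rejects it with "Incorrect string value".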

As to "whether the data in the DB is..."...

  • Using utf8mb4 in the database allows all character sets of the world, including Emoji, to be represented.
  • With the correct configuration, MySQL is happy to convert to/from UTF-8 when INSERTing/SELECTing. The target charset (in the client) can be essentially any encoding. Latin1 is common; it has about 120 extra characters (accented letters and common symbols) in addition to ordinary ASCII letters, digits, and simple punctuation.

The column definitions control what is stored in the database.

The connection parameters specify what the client's charset is.


4 Comments

The real question is whether the data in the DB is UTF-8 or Latin-1. That matters because it says whether the problem is in the insertion or the extraction. (The Tcl side would probably default to UTF-8 unless it detects that it should do otherwise. Web code is more often Latin-1.)
@DonalFellows - I added more.
The problem seems to be with the JSON package. I read the data from the web (curl) and store it in a file. It is in JSON format, for example: { "guestName": "In\u00eas Ferreira" } I then convert this to a dict using "json::json2dict": set str [json::json2dict $str] The data in str is: guest_name In�s Ferreira I had a look at the json_tcl.tcl package file and it seems to be parsing the data correctly using: subst -nocommands -novariables $unquoted This will effectively convert "\u00ea" into "ê". But the final result of the dict has the garbled character displayed above.
Actually, I copied the json package and found out that it may be the "subst" command. The json/dict functions simply parse the JSON data token by token. The string is then converted using something like: set str [subst $str] This will replace \u00ea with "ê". However, it seems to be replacing it with "\xEA", which can be displayed correctly in Linux and eventually in the browser. But when inserted into the DB it gives the error: insert into table set name="IN\xEAS"..... Which should be interpreted as "INÊS"
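
A small Tcl sketch may help separate what "subst" does from what happens when the string is later turned into bytes. It assumes only core Tcl commands. The idea that the CGI process runs with a different system encoding than the interactive shell is an assumption to check, not something confirmed in this thread.

```tcl
# The JSON escape as it appears in the payload (braces keep the backslash literal).
set raw {In\u00eas Ferreira}

# subst turns the escape into the single character U+00EA.
# At this point it is a Tcl character string; no byte encoding has been chosen yet.
set name [subst -nocommands -novariables $raw]
puts "characters: [string length $name]"

# Bytes only appear when the string is encoded for output.
binary scan [encoding convertto iso8859-1 $name] H* latin1Hex
binary scan [encoding convertto utf-8 $name] H* utf8Hex
puts "as latin1: $latin1Hex"   ;# ê becomes the single byte ea
puts "as utf-8:  $utf8Hex"     ;# ê becomes the two bytes c3 aa

# Assumption to verify: run this both from the shell and from index.cgi.
# If the two environments report different values, that would account for
# identical code sending different bytes to the database.
puts "system encoding: [encoding system]"
```

If the CGI run reports something other than utf-8, the explicit "encoding convertto utf-8" workaround from the question is compensating for that difference.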