I am trying to set up a database to store string data in multiple languages, including Chinese characters among many others.

Steps I have taken so far:

  1. I have created a schema which uses utf8mb4 character set and utf8mb4_unicode_ci collation.

  2. I have created a table which includes CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; at the end of the CREATE statement.

  3. I am attempting to LOAD DATA INFILE from a CSV file with CHARACTER SET utf8mb4 specified in the LOAD statement.

However, I am receiving this error: Error Code: 1366. Incorrect string value: '\xCE\x09DIS' for column 'company_name' at row 43630.

  • What is the encoding of the string data? Commented Feb 26, 2019 at 19:15

1 Answer

Did it successfully parse 43629 rows? Then croak on that row? It may actually be garbage in the file.

Do you know what that company name should be? What does the rest of the line say?

Do you have another example? Remove that one line and run the LOAD again.
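
One way to check for garbage in the file before running the LOAD again is to scan it for lines that are not valid UTF-8. The following is a minimal Python sketch, not part of the original answer; the function name find_invalid_utf8_lines is hypothetical, and it assumes the CSV has one record per physical line, so the reported line number may differ from MySQL's row number if the file has a header or quoted fields with embedded newlines.

```python
def find_invalid_utf8_lines(path):
    """Return (line_number, raw_bytes, reason) for each line that is not valid UTF-8.

    Hypothetical helper: the file is read in binary mode so that nothing is
    decoded (or silently replaced) before we test it ourselves.
    """
    bad = []
    with open(path, "rb") as f:
        # Splitting on b"\n" is safe: 0x0A never occurs inside a
        # multi-byte UTF-8 sequence.
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                bad.append((lineno, raw, exc.reason))
    return bad
```

Calling find_invalid_utf8_lines on the CSV path and printing the raw bytes of each hit shows what the rest of the offending line says, which helps decide whether to fix the line or drop it.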

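CE can be interpreted by any 1-byte charset, but not necessarily in a meaningful way. In utf8mb4, CE can only be the first byte of a 2-byte sequence, and the byte after it must be in the range 80 to BF; 09 is not.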

09 is the "tab" character in virtually all charsets; is it reasonable to have a tab in a company name?
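
The point about CE and 09 can be checked directly. This is a small Python sketch under the assumption that Python's "utf-8" codec applies the same well-formedness rule as MySQL's utf8mb4; is_valid_utf8 is a hypothetical helper name. For comparison it uses CE B2, which is the UTF-8 encoding of the lowercase Greek beta.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# CE followed by a continuation byte (B2) is a well-formed 2-byte sequence.
beta_ok = is_valid_utf8(b"\xce\xb2")

# CE followed by 09 (tab) is not: 09 is outside the 80..BF range.
from_error_ok = is_valid_utf8(b"\xce\x09DIS")

# A 1-byte charset such as latin-1 accepts CE on its own, meaningful or not.
as_latin1 = b"\xce".decode("latin-1")
```

Here beta_ok is True and from_error_ok is False, which matches the 1366 error: the bytes from the message cannot be UTF-8 text, whatever the original encoding was.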

2 Comments

What do I do about a beta symbol as in beta-carotene? stackoverflow.com/questions/64687739/…
@webNoob13 CEB2 is the utf-8 encoding for the lowercase Greek "beta". CE09 does not make sense; it is not a correct utf-8 encoding and 09 is "tab" in most encodings. Check my Python tips: mysql.rjweb.org/doc.php/charcoll#python
