4

I have a large text file containing Arabic text data. When I try to load it into a MySQL table, I get error saying Error code 1300: invalid utf8 character string. This is what I have tried so far:

LOAD DATA INFILE '/var/lib/mysql-files/text_file.txt'
IGNORE INTO TABLE tblTest
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';

I tried to ignore this error, but it does not work. I have tried LOCAL INFILE but it did not work, too. My database was created using DEFAULT CHAR SET UTF8 and DEFAULT COLLATE utf8_general_ci. The text file is utf-8 encoded.

I do not want the records which contain invalid utf8 characters. So how I can load the data with ignoring the records containing such invalid chars?

Thank in advance!

2 Answers 2

3

It would help to have the HEX of the naughty character.

A possible approach to reading all the text, then dealing with any bad characters:

  1. Read into a column of type VARBINARY or BLOB.

  2. Loop through the rows, trying to copy to a VARCHAR or TEXT column.

Another plan is to use utf8mb4 instead of utf8. It could be that the bad character is an Emoji or Chinese character that will work in utf8mb4, but not utf8.

Ignore errors

This may let you ignore errors:

SET @save := @@sql_mode;
LOAD DATA ...;
SET @@sql_mode := @save;
Sign up to request clarification or add additional context in comments.

6 Comments

In Python, I catch any exception and decide the action to do when such exception occurs, e.g. except ValueError: continue and so on. But I am not sure if this technique is available in MySQL.
Update: I had to check all the characters that are available in the text file using Python chars = {c for c in textfile}. I found a few more strange chars which have been cleaned using regex. After that, the MySQL load statement worked fine with utf-8 and text as types of text columns. Thanks are extended to Rick James for your kind help. I am marking the answer as accepted.
@RickJames. You have an enormous about of respect from me but your answer is unhelpful. The question isn't about bad characters. It's about ignoring bad characters on import.
@AuntJemima - Point made. I added to my answer. In reviewing sql_mode, I wonder if he is hitting a bad date or something unrelated to Arabic.
@RickJames. Thank you. It doesn't work, but thanks. MySQL 5.7 doesn't have an sql_mode that skips bad characters and the special term IGNORE still throws an error instead of a warning, which doesn't do any good either. dev.mysql.com/doc/refman/5.7/en/sql-mode.html. Is this the same in future releases of MySQL?
|
3

I have this problem when try use MySQL 5.7.14, too.

I returned to MySQL 5.6 and this problem is disappeared

3 Comments

Can you provide the specifics?
What is the value of sql_mode?
sql_mode="STRICT_ALL_TABLES"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.