I have a large .csv file which I want to import into a MySQL database. I want to use the LOAD DATA INFILE statement because of its speed.
Fields are terminated by -|-. Lines are terminated by |--. Currently I am using the following statement:
LOAD DATA LOCAL INFILE 'C:\\test.csv' INTO TABLE mytable FIELDS TERMINATED BY '-|-' LINES TERMINATED BY '|--'
Most rows look something like this (note that the strings are not enclosed by any characters):
goodstring-|--|-goodstring-|-goodstring-|-goodstring|--
goodstring-|--|-goodstring-|-goodstring-|-|--
goodstring-|-goodstring-|-goodstring-|-goodstring-|-|--
goodstring is a string that does not contain - as a character. As you can see, the second or last column might be empty. Rows like the above do not cause any problems. However, the last column may contain - characters, so there might be a row that looks something like this:
goodstring-|--|-goodstring-|-goodstring-|---|--
The string -- in the last column causes problems: MySQL detects six columns instead of five. It inserts a single - character into the fifth column and truncates the sixth. The correct DB row should be ("goodstring", NULL, "goodstring", "goodstring", "--").
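To see why the -- gets split, here is a simplified model of a left-to-right scanner that, at every position, checks for the line terminator and then the field terminator before treating the character as data. This is a Python illustration under that assumption, not MySQL's actual source, and the function name naive_split is hypothetical. Because -|- and |-- start with different characters, at most one of them can match at any given position.

```python
def naive_split(data: str, field_term: str = "-|-",
                line_term: str = "|--") -> list[list[str]]:
    """Greedy left-to-right split, as a simple scanner would do it."""
    rows: list[list[str]] = []
    row: list[str] = []
    field: list[str] = []
    i = 0
    while i < len(data):
        if data.startswith(line_term, i):
            # End of record: close the current field and the row.
            row.append("".join(field))
            rows.append(row)
            row, field = [], []
            i += len(line_term)
        elif data.startswith(field_term, i):
            # End of field: the terminator is consumed, not stored.
            row.append("".join(field))
            field = []
            i += len(field_term)
        else:
            field.append(data[i])
            i += 1
    if row or field:
        # Input ended without a line terminator: flush what is left.
        row.append("".join(field))
        rows.append(row)
    return rows


good = "goodstring-|--|-goodstring-|-goodstring-|-goodstring|--"
bad = "goodstring-|--|-goodstring-|-goodstring-|---|--"
print(naive_split(good))  # five fields
print(naive_split(bad))   # six fields: the '--' was split apart
```

In the bad row, the second - of the fifth field plus the |- of the line terminator form a -|- field terminator, so the fifth field becomes "-", the leftover "-" becomes a sixth field, and the real line terminator is never seen.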
A solution would be to tell MySQL to treat everything after the fourth field terminator, up to the line terminator, as part of the fifth column. Is this possible with LOAD DATA INFILE? Are there other methods that yield the same result, do not require editing the source file, and perform about as fast as LOAD DATA INFILE?
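The splitting rule described above (the first four fields end at -|-, the fifth runs to the next |--) can be sketched outside MySQL as a pre-processing step. This is a hypothetical Python function, not a LOAD DATA INFILE feature, and it makes no claim about matching LOAD DATA INFILE's speed. It assumes that fields one to four never contain -, that the fifth field never contains |--, and that any line breaks between records are only there for readability.

```python
from typing import Optional


def split_records(data: str, n_fixed: int = 4, field_term: str = "-|-",
                  line_term: str = "|--") -> list[tuple[Optional[str], ...]]:
    """The first n_fixed fields end at field_term; the last field runs up to
    the next line_term, even if it contains '-' characters."""
    records = []
    pos = 0
    while True:
        # Assumption: line breaks between records are not part of the data.
        while pos < len(data) and data[pos] in "\r\n":
            pos += 1
        if pos >= len(data):
            break
        fields = []
        for _ in range(n_fixed):
            end = data.find(field_term, pos)
            if end == -1:
                raise ValueError(f"incomplete record at offset {pos}")
            fields.append(data[pos:end])
            pos = end + len(field_term)
        # Everything up to the next line terminator belongs to the last column.
        end = data.find(line_term, pos)
        if end == -1:
            raise ValueError(f"missing line terminator after offset {pos}")
        fields.append(data[pos:end])
        pos = end + len(line_term)
        # Map empty fields to None, mirroring the NULL expected above.
        records.append(tuple(f if f else None for f in fields))
    return records


rows = [
    "goodstring-|--|-goodstring-|-goodstring-|-goodstring|--",
    "goodstring-|--|-goodstring-|-goodstring-|-|--",
    "goodstring-|-goodstring-|-goodstring-|-goodstring-|-|--",
    "goodstring-|--|-goodstring-|-goodstring-|---|--",
]
for record in split_records("\n".join(rows)):
    print(record)
```

For the problematic row this yields ("goodstring", None, "goodstring", "goodstring", "--"), the row expected above. The parsed records would still have to be loaded separately, for example by writing a copy with ordinary terminators for LOAD DATA INFILE, which costs an extra pass over the data.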
SET fifthColumn=CONCAT(@fifthField,@sixthField), per this blog post.
FIELDS TERMINATED BY '¿^?fish╔&®)'