I have a large .csv file that I want to import into a MySQL database. I want to use the LOAD DATA INFILE statement because of its speed.

Fields are terminated by -|-. Lines are terminated by |--. Currently I am using the following statement:

LOAD DATA LOCAL INFILE 'C:\\test.csv' INTO TABLE mytable FIELDS TERMINATED BY '-|-' LINES TERMINATED BY '|--'

Most rows look something like this: (Note that the strings are not enclosed by any characters.)

goodstring-|--|-goodstring-|-goodstring-|-goodstring|--
goodstring-|--|-goodstring-|-goodstring-|-|--
goodstring-|-goodstring-|-goodstring-|-goodstring-|-|--

goodstring is a string that does not contain - as a character. As you can see, the second or last column might be empty. Rows like the above do not cause any problems. However, the last column may contain - characters, so there might be a row that looks like this:

goodstring-|--|-goodstring-|-goodstring-|---|--

The string -- in the last column causes problems. MySQL detects six instead of five columns. It inserts a single - character into the fifth column and truncates the sixth. The correct DB row should be ("goodstring", NULL, "goodstring", "goodstring", "--").
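To make the miscount concrete, here is a minimal Python sketch. It is not MySQL's parser: `naive_fields` and `five_fields` are hypothetical helper names, and the sketch only assumes that separators are matched left to right, which reproduces the six fields described above. The second helper shows the result I am after, where only the first four separators count.

```python
FIELD_SEP = "-|-"
LINE_END = "|--"


def naive_fields(row):
    """Split on every separator found left to right (illustration only)."""
    return row.split(FIELD_SEP)


def five_fields(row):
    """Split on the first four separators; the rest is the fifth field."""
    parts = row.split(FIELD_SEP, 4)
    last = parts[-1]
    # Drop the line terminator from the end of the last field, if present.
    if last.endswith(LINE_END):
        last = last[: -len(LINE_END)]
    parts[-1] = last
    return parts


bad_row = "goodstring-|--|-goodstring-|-goodstring-|---|--"
print(naive_fields(bad_row))  # six fields, the fifth is a single "-"
print(five_fields(bad_row))   # five fields, the fifth is "--"
```

Note that the sketch returns an empty string for an empty column, not NULL; mapping empty fields to NULL would be a separate step.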

A solution would be to tell MySQL to treat everything after the fourth field terminator as part of the fifth column (up until the line is terminated). Is this possible with LOAD DATA INFILE? Are there methods that yield the same result, do not require the source file to be edited, and perform about as fast as LOAD DATA INFILE?

  • "Is it possible to tell MySQL to regard everything after the fourth field as the fifth column?" Yes, along the lines of SET fifthColumn=CONCAT(@fifthField,@sixthField) per this blog post. Commented Sep 7, 2015 at 0:06
  • we usually use FIELDS TERMINATED BY '¿^?fish╔&®)' Commented Sep 7, 2015 at 0:21
  • Thank you @bishop ! That blog post was just what I needed. Commented Sep 7, 2015 at 3:22

1 Answer

This is my solution:

LOAD DATA
LOCAL INFILE 'C:\\test.csv'
INTO TABLE mytable
FIELDS TERMINATED BY '-|-'
LINES TERMINATED BY '-\r\n'
(col1, col2, col3, col4, @col5, col6)
SET col5 = (SELECT CASE WHEN col6 IS NOT NULL THEN CONCAT(@col5, '-') ELSE LEFT(@col5, LENGTH(@col5) - 2) END);

It will turn a row like this one:

goodstring-|--|-goodstring-|-goodstring-|-|--

Into this:

("goodstring", "", "goodstring", "goodstring", NULL)

And a bad row like this one:

goodstring-|--|-goodstring-|-goodstring-|---|--

Into this:

("goodstring", "", "goodstring", "goodstring", "")

I simply drop the last column after the import.
