MySQL - LOAD DATA from LOCAL INFILE - How to increase performance

Question

Greetings Support Community,

I have more than 10 million files that I am trying to load into a MySQL database using the following script:

WORKING_DIR=/tmp
FILE1="*test*"
timestamp_format="%Y-%m-%d %H:%i:%s.%x"

for i in ${WORKING_DIR}/${FILE1}
do
    if [ -f "$i" ]; then
    mysql -uroot -ptest my_database --local-infile=1<<-SQL
    SET sql_log_bin=0;
    LOAD DATA LOCAL INFILE '${i}' INTO TABLE my_table
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '\"'
    LINES  TERMINATED BY '\n'
    IGNORE 1 LINES
    (id, transaction_id, app_id, sub_id);
    SQL
    fi
done

It's an extremely slow process. After about 24 hours, I've only been able to load about 2 million records. Each file contains one record. At this rate, the full load will take about 5 days. Is there a faster way of doing this? For example, should I concatenate the files before processing?

Any suggestion to improve loading this data into MySQL would be greatly appreciated.

Thanks!

If the suggestion is to concatenate the files before processing, how would I efficiently concatenate the 10 million+ files? Thank you!
– user3567212, Commented Sep 20, 2016 at 18:46
what operating system? Are they all in one directory? As an aside, why would a file contain 1 row? Oh, /tmp, Linux.
– Drew, Commented Sep 20, 2016 at 19:46
I see you have IGNORE 1 LINES - does that mean that each file has a header row? If you concatenate the files you may need to remove the header row.
– Andrew Morton, Commented Sep 20, 2016 at 20:01

tripleee · Accepted Answer · 2016-09-21 13:46:09Z

You ask (in a comment) how to concatenate your files. That would be

cat /tmp/*test1*

though apparently you actually want to omit the first line from each:

awk 'FNR>1' /tmp/*test1*

How to make your SQL version read from standard input is beyond my competence. If you can't, maybe save the output to a temporary file, and process that.
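The temporary-file route could look roughly like the sketch below. It is not tested against a real server: the function names and the output path are made up for illustration, while the table, columns, and mysql options are copied from the question. The combined file has no header lines left, so the IGNORE 1 LINES clause is dropped.

```shell
#!/bin/sh
# Sketch only. concat_without_headers and build_load_sql are hypothetical
# helper names; table, columns and options mirror the question.

# Concatenate every matching file in one directory into a single file,
# skipping the first (header) line of each input file.
# Write the output somewhere the pattern cannot match it.
concat_without_headers() {
    dir=$1 pattern=$2 out=$3
    find "$dir" -maxdepth 1 -type f -name "$pattern" \
        -exec awk 'FNR>1' {} + > "$out"
}

# Print a single LOAD DATA statement for the combined file.
# No IGNORE 1 LINES here: the headers were already removed by awk.
build_load_sql() {
    cat <<SQL
SET sql_log_bin=0;
LOAD DATA LOCAL INFILE '$1' INTO TABLE my_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(id, transaction_id, app_id, sub_id);
SQL
}

# Usage (assumed paths and credentials, left commented out on purpose):
# concat_without_headers /tmp '*test1*' /var/tmp/combined.csv
# build_load_sql /var/tmp/combined.csv |
#     mysql -uroot -ptest my_database --local-infile=1
```

The point of the design is that mysql is started once and runs one LOAD DATA statement, instead of once per input file.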

If you get "argument list too long" maybe try

find /tmp -maxdepth 1 -type f -name '*test1*' -exec awk 'FNR>1' {} +

The -maxdepth 1 says not to descend into subdirectories; take it out if that's not what you want.

The -exec with a plus might not be available on really old systems; try with \; in its place if you get a syntax error (though there can be a rather unpleasant performance penalty).
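To see where that penalty comes from, here is a small sketch that counts how often find starts the command in each form. The function names are made up for illustration, and a throwaway `sh -c 'echo run'` stands in for awk so the invocations can be counted.

```shell
#!/bin/sh
# Sketch only: runs_with_plus and runs_with_semicolon are hypothetical names.

# With "{} +", find passes many file names to each invocation.
runs_with_plus() {
    find "$1" -maxdepth 1 -type f -name "$2" \
        -exec sh -c 'echo run' sh {} + | grep -c run
}

# With "{} \;", find starts the command once per file.
runs_with_semicolon() {
    find "$1" -maxdepth 1 -type f -name "$2" \
        -exec sh -c 'echo run' sh {} \; | grep -c run
}
```

For a handful of matching files, the first function reports a single run and the second reports one run per file. With 10 million files, the second form means 10 million process launches. Even when find splits the plus form into several batches, awk resets FNR for each file, so the header line is still skipped per file.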

I don't see that the variables made anything clearer, easier, more readable, or more maintainable, so I simply took them out.

edited Sep 21, 2016 at 13:46

answered Sep 21, 2016 at 3:04
