
I am trying to implement a simple program in Java that will be used to populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudo code):

execute("INSERT INTO table_1 VALUES(?, ?)");
String id1 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 values("some value", id1, id2)");
execute("INSERT INTO table_3 values("some value2", id1, id2)");
...

There are three basic problems:
1. The database is not on localhost, so every single INSERT/SELECT incurs network latency; this is the main problem.
2. The CSV file contains millions of rows (around 15,000,000), so it takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).

I was wondering how I can speed up the INSERT/SELECT process. Currently about 80% of the execution time is spent on communication.

I already tried grouping the above statements and executing them as a batch, but because of LAST_INSERT_ID that does not work. Any other approach takes too long (see point 1).
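For reference, a rough JDBC sketch of the per-row flow (connection handling, table layouts and column values are placeholders); retrieving the keys via Statement.RETURN_GENERATED_KEYS at least avoids the separate SELECT LAST_INSERT_ID() round trip, but each row still requires several round trips:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

class RowImporter {
    // Hypothetical sketch: inserts one CSV row; 'conn' and the column values are placeholders.
    static void importRow(Connection conn, String a, String b, String c, String d) throws Exception {
        try (PreparedStatement ps1 = conn.prepareStatement(
                 "INSERT INTO table_1 VALUES(?, ?)", Statement.RETURN_GENERATED_KEYS);
             PreparedStatement ps2 = conn.prepareStatement(
                 "INSERT INTO table_2 VALUES(?, ?)", Statement.RETURN_GENERATED_KEYS);
             PreparedStatement ps3 = conn.prepareStatement(
                 "INSERT INTO table_3 VALUES(?, ?, ?)")) {

            ps1.setString(1, a);
            ps1.setString(2, b);
            ps1.executeUpdate();
            long id1;
            try (ResultSet keys = ps1.getGeneratedKeys()) {
                keys.next();
                id1 = keys.getLong(1);   // same value LAST_INSERT_ID() would return
            }

            ps2.setString(1, c);
            ps2.setString(2, d);
            ps2.executeUpdate();
            long id2;
            try (ResultSet keys = ps2.getGeneratedKeys()) {
                keys.next();
                id2 = keys.getLong(1);
            }

            ps3.setString(1, "some value");
            ps3.setLong(2, id1);
            ps3.setLong(3, id2);
            ps3.executeUpdate();
        }
    }
}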

  • Note that LAST_INSERT_ID is linked to an auto-increment key, which is monotonically increasing. If you know that you're the only one inserting, then you can keep track of the key inside the Java code. Make sure you sample the first few inserts to see what the start value for the key is and what the offset is (it doesn't have to be 1). Commented May 3, 2011 at 21:06
  • I thought about it, but unfortunately I'm not the only user of the database at this time. Commented May 4, 2011 at 3:57

2 Answers


The fastest way is to let MySQL parse the CSV and load the records into the table. For that, you can use LOAD DATA INFILE:

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

It works even better if you can transfer the file to the server or keep it in a shared directory that is accessible to the server.
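If the file has to stay on the client machine, the LOCAL variant can be issued straight from the Java program and the driver streams the file to the server in one go. A rough sketch (host, credentials, file path and CSV format are assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

class CsvBulkLoader {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile lets Connector/J stream a client-side file to the server;
        // host, credentials, file path and CSV format below are placeholders.
        String url = "jdbc:mysql://dbhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute(
                "LOAD DATA LOCAL INFILE '/path/to/source.csv' "
                + "INTO TABLE table_1 "
                + "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
                + "LINES TERMINATED BY '\\n' "
                + "IGNORE 1 LINES");   // skip the header row, if the CSV has one
        }
    }
}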

Once that is done, you can have a column that indicates whether the record has been processed or not. Its value should be false by default.

Once data is loaded, you can pick up all records where processed=false.

For all such records you can populate tables 2 and 3.

Since all these operations would happen on the server, server <> client latency would not come into the picture.


2 Comments

LOAD DATA INFILE only works if the import CSV file is on the same machine as MySQL, not across the network. The OP said: "Database is not on localhost".
LOAD DATA works on the client too. The MySQL driver sends the file across the wire to the server to be loaded. From the manual: "The --local option causes mysqlimport to read data files from the client host. You can specify the --compress option to get better performance over slow networks if the client and server support the compressed protocol."

Feed the data into a blackhole

CREATE TABLE `test`.`blackhole` (
  `t1_f1` int(10) unsigned NOT NULL,
  `t1_f2` int(10) unsigned NOT NULL,
  `t2_f1` int(10) unsigned NOT NULL
  -- ... and so on for all the tables and all the fields
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;

Note that this is a BLACKHOLE table, so the data itself goes nowhere.
However, you can create a trigger on the blackhole table, something like this.

And pass it on using a trigger

delimiter $$

create trigger ai_blackhole_each after insert on blackhole for each row
begin
  declare lastid_t1 integer;
  declare lastid_t2 integer;

  insert into table1 values(new.t1_f1, new.t1_f2);
  select last_insert_id() into lastid_t1;
  insert into table2 values(new.t2_f1, new.t2_f2);
  select last_insert_id() into lastid_t2;
  insert into table3 values(new.t3_f1, lastid_t1, lastid_t2);
  -- etc. for the remaining tables
end$$

delimiter ;

Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.

insert into blackhole values(a,b,c,d,e,f,g,h),(....),(...)...
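From the Java side, a comparable effect can be had with JDBC batching; here is a minimal sketch (connection details, column count and the example rows are placeholders), assuming Connector/J's rewriteBatchedStatements option so each batch is sent as multi-row INSERTs:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import java.util.List;

class BlackholeFeeder {
    public static void main(String[] args) throws Exception {
        // Placeholder rows standing in for the parsed CSV lines.
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "b", "c"},
                new String[] {"d", "e", "f"});

        // rewriteBatchedStatements makes Connector/J send each batch as multi-row INSERTs;
        // URL, credentials and the column count are placeholders.
        String url = "jdbc:mysql://dbhost:3306/test?rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO blackhole VALUES(?, ?, ?)")) {
            conn.setAutoCommit(false);
            int batchSize = 1000;
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch();   // one round trip per batch instead of per row
                    conn.commit();
                }
            }
            ps.executeBatch();           // flush the remainder
            conn.commit();
        }
    }
}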

Disable index updates to speed things up

ALTER TABLE $tbl_name DISABLE KEYS;
....Lot of inserts
ALTER TABLE $tbl_name ENABLE KEYS;

This will disable all non-unique key updates and speed up the inserts. (An auto-increment key is unique, so it's not affected.)

If you have any unique keys and you don't want MySQL to check them during the mass insert, do an ALTER TABLE to drop the unique key and add it back afterwards.
Note that the ALTER TABLE to put the unique key back will take a long time.

2 Comments

I forgot about one very essential problem: I cannot create any extra tables or disable keys. In other words, I cannot modify the database at all. But I am sure that your solution would work otherwise.
You can also create the blackhole table and trigger in another database on the same MySQL server and change the insert statements inside the trigger to insert into database1.table1 ...
