2

How long should it take to insert about 500,000 records from a CSV file into a MySQL database using Java? The database is hosted on localhost.

Table structure: auto-increment id | varchar(8) | datetime | int | varchar(2). My code needs about 40 minutes to insert 70,000 records. Is there any way to do it faster? Here is the main part of my code:

CsvReader products = new CsvReader(path);
products.readHeaders();

String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
PreparedStatement preparedStatement = con.prepareStatement(updateString);

while (products.readRecord()) {
    v1 = products.get("V1");
    date = format.parse(products.get("Date") + " " + products.get("Hour"));
    // Timestamp, not java.sql.Date, so the time-of-day is not dropped
    java.sql.Timestamp dateDB = new java.sql.Timestamp(date.getTime());
    v2 = products.get("V2");
    v3 = products.get("V3");

    preparedStatement.setString(1, v1);
    preparedStatement.setTimestamp(2, dateDB);
    preparedStatement.setInt(3, Integer.parseInt(v2));
    preparedStatement.setString(4, v3);
    preparedStatement.executeUpdate();
}

Following the advice in the comments, I moved the creation of the statement out of the loop. I now get 33 records per second, up from 29.

6
  • Move the creation of the statement out of the while loop. Commented Jan 16, 2016 at 12:29
  • You should not use Java to import data; use the mysqlimport utility. Commented Jan 16, 2016 at 12:32
  • Once I had to insert millions of records based on pcap files into a DB. Partitioning the data into chunks and running them in threads was much faster. I don't know how this works in MySQL; I used an Oracle DB and had producer threads creating table types (each representing a set of DB records) and consumer threads creating DB transactions and calling a stored procedure with those table-type parameters to insert the data. But generally, if you already have CSV files, the fastest way is to use an SQL-loader alternative: dev.mysql.com/doc/refman/5.1/en/load-data.html Commented Jan 16, 2016 at 12:33
  • As suggested, create the PreparedStatement only once, outside the loop, and additionally consider using batch inserts (preparedStatement.addBatch()) as suggested in this post. Commented Jan 16, 2016 at 12:40
  • I moved the creation of the statement out of the loop. Unfortunately, I have to do this with Java. Commented Jan 16, 2016 at 12:46

5 Answers

2

I might opt for using the LOAD DATA statement from MySQL instead of using Java:

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' INTO TABLE table;

This would avoid a lot of the overhead you currently have, assuming you are processing each line before inserting it into MySQL.

You can execute a LOAD DATA statement from Java using raw JDBC.
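A minimal sketch of that, reusing the placeholder table and column names from the question (the connection URL and credentials are assumptions; with MySQL Connector/J, the `allowLoadLocalInfile=true` connection property must be set for `LOCAL` to be permitted, and the server must also allow `local_infile`):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadDataExample {
    public static void main(String[] args) throws Exception {
        // allowLoadLocalInfile=true lets Connector/J send the local file
        String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement stmt = con.createStatement()) {
            stmt.execute(
                "LOAD DATA LOCAL INFILE '/path/to/your/file.csv' " +
                "INTO TABLE table " +
                "FIELDS TERMINATED BY ',' " +
                "IGNORE 1 LINES " +              // skip the CSV header row
                "(T_V1, date, T_V2, T_V3)");
        }
    }
}
```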


7 Comments

But the OP asked explicitly to do it via Java.
That means little if the OP's hunch about using Java for this is wrong. LOAD DATA is the tool of choice for loading massive amounts of data, if it is possible to use it.
The OP confirmed in the comments that Java needs to be used for the solution.
Of course! I didn't think it was possible to do it this way. But it is only useful when I don't have to change anything in the structure of my file, right?
Yes. If you need to massage the raw data, then using Java might be the best way to go.
0

Instead of creating a PreparedStatement inside the while loop, create it outside and simply set the values inside the loop.

Something like

    String updateString = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
    PreparedStatement preparedStatement = con.prepareStatement(updateString);
    while (products.readRecord()) {
        v1 = products.get("V1");
        date = format.parse(products.get("Date") + " " + products.get("Hour"));
        java.sql.Timestamp dateDB = new java.sql.Timestamp(date.getTime());
        v2 = products.get("V2");
        v3 = products.get("V3");

        preparedStatement.setString(1, v1);
        preparedStatement.setTimestamp(2, dateDB);
        preparedStatement.setInt(3, Integer.parseInt(v2));
        preparedStatement.setString(4, v3);
        preparedStatement.executeUpdate();
    }

Additionally, you should commit every N rows, where N is a chunk size your database engine can comfortably handle; otherwise, after a certain number of inserts, the system slows down dramatically.

Note that it should generally be possible to insert far more than 70,000 records in 40 minutes, so you may have a bottleneck somewhere. Is the database local to the Java application, or on a remote server? If it is remote, check the connection speed.
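A sketch of the chunked-commit idea, reusing the names from the question (`CsvReader`, `format`, the column names); the chunk size of 1,000 is an assumption you should tune for your setup:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.text.DateFormat;

public class ChunkedCommitInsert {
    // CsvReader is the CSV library used in the question.
    static void insertWithChunkedCommits(Connection con, CsvReader products,
                                         DateFormat format) throws Exception {
        final int COMMIT_EVERY = 1000;  // rows per transaction (assumed; tune it)
        con.setAutoCommit(false);       // stop committing after every single row

        String sql = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int count = 0;
            while (products.readRecord()) {
                java.util.Date date =
                    format.parse(products.get("Date") + " " + products.get("Hour"));
                ps.setString(1, products.get("V1"));
                ps.setTimestamp(2, new java.sql.Timestamp(date.getTime()));
                ps.setInt(3, Integer.parseInt(products.get("V2")));
                ps.setString(4, products.get("V3"));
                ps.executeUpdate();
                if (++count % COMMIT_EVERY == 0) {
                    con.commit();  // close this transaction and start a new one
                }
            }
            con.commit();  // commit whatever is left in the final partial chunk
        }
    }
}
```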

1 Comment

It is a local database
0

You should use batch inserts:

    PreparedStatement prepStmt = con.prepareStatement("insert query");

    prepStmt.setString(1, parameter1);
    prepStmt.addBatch();

    // for the next set of parameters
    prepStmt.setString(1, parameter2);
    prepStmt.addBatch();

    int[] numUpdates = prepStmt.executeBatch();

See Which is faster: multiple single INSERTs or one multiple-row INSERT?

How to do a batch insert in MySQL
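Applied to the loop from the question, a sketch might look like this (`CsvReader` and the column names come from the question; `BATCH_SIZE` is a tunable assumption). With Connector/J, also adding `rewriteBatchedStatements=true` to the JDBC URL lets the driver rewrite the batch into multi-row INSERTs, which usually gives the largest speedup:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.text.DateFormat;

public class BatchedInsert {
    // CsvReader is the CSV library used in the question.
    static void insertWithBatches(Connection con, CsvReader products,
                                  DateFormat format) throws Exception {
        final int BATCH_SIZE = 1000;  // statements per executeBatch() call (assumed)
        String sql = "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int count = 0;
            while (products.readRecord()) {
                java.util.Date date =
                    format.parse(products.get("Date") + " " + products.get("Hour"));
                ps.setString(1, products.get("V1"));
                ps.setTimestamp(2, new java.sql.Timestamp(date.getTime()));
                ps.setInt(3, Integer.parseInt(products.get("V2")));
                ps.setString(4, products.get("V3"));
                ps.addBatch();              // queue instead of executing immediately
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();      // send the queued statements together
                }
            }
            ps.executeBatch();              // flush the final partial batch
        }
    }
}
```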

Comments

0

If it's not necessary to insert the data using Java, you can use SQL directly.

Use the code below in your GUI tool (SQLyog etc):

LOAD DATA LOCAL INFILE 'D:\\Book1.csv' INTO TABLE table_name FIELDS TERMINATED BY ','    
ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (column_name1, column_name2);

Comments

-1

First, you can create the PreparedStatement outside your loop. You can also refactor your code to use multithreading, because your insert statements do not seem to depend on each other, so you can split the data into chunks and process them in parallel.

But there is no absolute answer to your question of "how long": it depends on the machine where MySQL is hosted and the machine where the Java code is executed: number of cores, available memory, and so on.
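A rough sketch of the multithreading idea (a hypothetical helper: it assumes the CSV has already been parsed into chunks of string-array rows, with the datetime pre-formatted as `yyyy-MM-dd HH:mm:ss`). Note that a JDBC `Connection` must not be shared across threads, so each worker opens its own:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelInsert {
    // chunks: pre-parsed CSV rows, split into one list per worker task
    static void insertInParallel(List<List<String[]>> chunks, String url,
                                 String user, String pass) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);  // assumed pool size
        List<Future<?>> results = new ArrayList<>();
        for (List<String[]> chunk : chunks) {
            results.add(pool.submit(() -> {
                // one Connection per worker; Connections are not thread-safe
                try (Connection con = DriverManager.getConnection(url, user, pass);
                     PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO table (T_V1, date, T_V2, T_V3) VALUES (?,?,?,?)")) {
                    for (String[] row : chunk) {
                        ps.setString(1, row[0]);
                        ps.setTimestamp(2, java.sql.Timestamp.valueOf(row[1]));
                        ps.setInt(3, Integer.parseInt(row[2]));
                        ps.setString(4, row[3]);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
                return null;
            }));
        }
        for (Future<?> f : results) f.get();  // propagate any worker exception
        pool.shutdown();
    }
}
```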

1 Comment

I moved this out of the loop. I know there is no exact answer, but I think it is possible to do it faster, so I'm probably doing something wrong. Multithreading is probably too advanced for me :) Thank you for your reply anyway.
