
I am playing with different buffer sizes to be inserted into the local SQLite DB and have found that it takes nearly 8 minutes to insert 10,000,000 rows of data when the buffer size is 10,000. In other words, it takes 1,000 writes to store everything.

8 minutes to store 10,000,000 rows seems a bit too long (or is it?)

Can any of the below be optimized to increase the speed? Please note that the data being inserted is a random collection of characters.

public int flush() throws SQLException {
    String sql = "insert into datastore values(?,?,?,?);";

    // try-with-resources closes the PreparedStatement even on failure
    try (PreparedStatement prep = con.prepareStatement(sql)) {
        for (DatastoreElement e : content) { // content is 10,000 elements long
            _KVPair kvp = e.getKvp();

            prep.setInt(1, e.getMetaHash());
            prep.setInt(2, kvp.hashCode());
            prep.setString(3, kvp.getKey());
            prep.setString(4, kvp.getValue());

            prep.addBatch();
        }

        int[] updateCounts = prep.executeBatch();

        con.commit();

        return errorsWhileInserting(updateCounts);
    }
}

When table is created it is done via

    statement.executeUpdate("create table datastore " +
               "(meta_hash INTEGER," +
               "kv_hash   INTEGER," +
               "key TEXT," +
               "value TEXT);");

Can any of the above be further optimized please?

  • I would try smaller batch sizes, e.g. 2,500 or 1,000, to see if it makes any difference; it could be slower, but it might be faster. Commented Aug 23, 2012 at 15:36
  • I agree with Peter. I have found that when using SQL Server, using any more than 1,000 for a batch size simply slowed down the process. I was able to get my program to insert a couple million records in under a minute when using 1,000, but 10,000 made it take a few minutes. Commented Aug 23, 2012 at 15:42
  • Is there a way to find a magic number for a batch size? In my tests, 10,000 performs slightly better than 1,000. Commented Aug 23, 2012 at 15:52
  • No magic; all performance optimisation is about testing your hunches. BTW, you are getting 20,833 rows per second. I get around the same when bulk loading data into MS SQL Server via bcp.exe, which is the fastest way to load data. So your numbers are not too bad. But in my case I'm inserting into a view which runs a trigger which manipulates the data before the final insert to tables, so ... it could be faster. Commented Aug 23, 2012 at 16:10

2 Answers


I'm a bit hazy on the Java API, but I think you should start a transaction first; otherwise calling commit() is pointless. Do it with conn.setAutoCommit(false). Otherwise SQLite will be journaling for each individual insert/update, which requires syncing the file and will contribute to the slowness.
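
For illustration, a minimal sketch of doing that around the question's flush() method (assuming the same con field as in the question; this is my sketch, not tested code):

    con.setAutoCommit(false);   // stop SQLite journaling per individual insert
    try {
        int errors = flush();   // the question's method: addBatch, executeBatch, commit
    } catch (SQLException ex) {
        con.rollback();         // discard the partial batch on failure
        throw ex;
    }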

EDIT: The questioner updated to say that this is already done. In that case:

That is a lot of data. That length of time doesn't sound out of this world. The best you can do is to run tests with different buffer sizes, as in the sketch below. There is a balance to strike: buffers that are too small cause jitter, while very large ones push you into virtual memory. For this reason, you shouldn't try to put it all into one buffer at once. Split the inserts up into your own batches.
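
For example, a hypothetical timing harness for comparing buffer sizes (insertRows is an assumed helper that runs the full insert with the given buffer size; it is not part of the question's code):

    // Hypothetical harness: time the same 10,000,000-row load at several buffer sizes
    for (int size : new int[] {1000, 2500, 10000, 50000}) {
        long start = System.nanoTime();
        insertRows(size); // assumed helper: performs the whole insert with this buffer size
        System.out.printf("buffer %,d -> %.1f s%n",
                size, (System.nanoTime() - start) / 1e9);
    }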


4 Comments

Where should begin() be called? On what object please?
On the connection. It looks like the method is called con.setAutoCommit(false); in JDBC. Call it before you start inserting items.
Right, I forgot to mention this .. it is of course set. Without it, it takes forever.
So other than that, does 8 minutes look reasonable to you? We're talking about 6 GB worth of data here.

You are only executing executeBatch once, which means that all 10 million statements are sent to the database in that single executeBatch call. This is far too much for a database to handle at once. You should additionally call int[] updateCounts = prep.executeBatch(); inside your loop, say every 1,000 rows: just add an if statement that tests counter % 1000 == 0, as in the sketch below. Then the database can already work asynchronously on the data you have sent.
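
A rough sketch of that change inside the question's loop (the counter variable is introduced here for illustration):

    int counter = 0;
    for (DatastoreElement e : content) {
        _KVPair kvp = e.getKvp();

        prep.setInt(1, e.getMetaHash());
        prep.setInt(2, kvp.hashCode());
        prep.setString(3, kvp.getKey());
        prep.setString(4, kvp.getValue());

        prep.addBatch();

        if (++counter % 1000 == 0) {
            prep.executeBatch(); // ship this chunk to the database now
        }
    }
    prep.executeBatch(); // flush the remaining rows (fewer than 1,000)
    con.commit();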

1 Comment

I used this concept two years ago in my IoT project, which has millions of rows, and came across this today. I upvoted this to help people understand the right way!
