Most efficient multithreading Database Insert in Java

Question

We have to read a lot of data (~50 GB) from an HDD into our database, but our multithreaded procedure is pretty slow (~2 h for ~10 GB) because of a thread lock inside org.sqlite.core.NativeDB.reset[native] (see thread sampler).

We read our data relatively fast and use our insert method to execute a prepared statement. Only after we have collected about 500,000 datasets do we commit all these statements to our database. Currently we use JDBC as the interface to our SQLite database.

Everything works fine so far if you use a single thread. But with multiple threads you do not see much of a speed increase, because only one thread can run at a time rather than several in parallel. We already reuse our PreparedStatement, and all threads use one instance of our Database class to prevent file locks (there is one connection to the database).

Unfortunately, we have no clue how to improve our insert method any further. Can anyone give us some tips or solutions, or a way to avoid this NativeDB.reset method? We do not have to use SQLite, but we would like to use Java.

(Threads are named 1,2,...,15)

private String INSERT = "INSERT INTO urls (url) VALUES (?);";

public void insert(String urlFromFile) {
  try {
    preparedStatement.setString(1, urlFromFile);
    preparedStatement.executeUpdate();
  } catch (SQLException e) {
    e.printStackTrace();
  }
}

Updated insert method as suggested by @Andreas, but it still throws exceptions:

public void insert(String urlFromFile) {
  try {
    preparedStatement.setString(1, urlFromFile);
    preparedStatement.addBatch();
    ++callCounter;
    if (callCounter % 500000 == 0 && callCounter > 0) {
      preparedStatement.executeBatch();
      commit();
      System.out.println("Exec");
    }
  } catch (SQLException e) {
    e.printStackTrace();
  }
}

java.lang.ArrayIndexOutOfBoundsException: 9
at org.sqlite.core.CorePreparedStatement.batch(CorePreparedStatement.java:121)
at org.sqlite.jdbc3.JDBC3PreparedStatement.setString(JDBC3PreparedStatement.java:421)
at UrlDatabase.insert(UrlDatabase.java:85)

If I remember correctly, SQLite itself only allows one operation at a time anyway.
– litelite, Commented May 4, 2017 at 20:09
@Andreas we store all our inserts and then commit about 500k at once. Batching brings no major improvement :(
– tomtom2770, Commented May 4, 2017 at 20:12
@YCF_L We will give batching another shot, with our example provided. Thanks!
– tomtom2770, Commented May 4, 2017 at 20:28
You're saying that batching, i.e. replacing executeUpdate() with addBatch(), followed by executeBatch() for every 1000+ calls to addBatch(), doesn't improve performance? I find that highly unlikely, unless your performance bottleneck is mainly in some other area, e.g. an excessive number of indexes on the table being inserted into. Commit interval has very little to do with it.
– Andreas, Commented May 4, 2017 at 20:33

GreyBeardedGeek · Accepted Answer · 2017-05-04 20:22:08Z

1

Most databases have some sort of bulk insert functionality, though there's no standard for it, AFAIK.

PostgreSQL has COPY, and MySQL has LOAD DATA, for instance. I don't think that SQLite has this facility, though - it might be worth switching to a database that does.



2 Comments

tomtom2770 Over a year ago

Any recommendations? Can you use this kind of functionality from JDBC, or from a programming language in general, or only directly on a database server?

GreyBeardedGeek Over a year ago

Looks like you can use PostgreSQL's COPY from JDBC - see jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/…

CL. · Accepted Answer · 2017-05-05 08:51:53Z

0

SQLite has no write concurrency.

The fastest way to load a large amount of data is to use a single thread (and a single transaction) to insert everything into the DB (and not to use WAL).
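A minimal sketch of the single-writer idea, assuming the file-reading threads stay parallel and hand their rows to one inserting thread through a bounded queue. SingleWriterLoader, BatchSink, jdbcSink, the queue capacity, and the batch size are all hypothetical names and values chosen for illustration, not part of SQLite or JDBC; the only JDBC calls used are PreparedStatement.setString, addBatch, and executeBatch.

```java
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical helper: many reader threads feed one writer thread. */
final class SingleWriterLoader {

    /** Hypothetical callback that writes one batch; not a JDBC type. */
    interface BatchSink {
        void write(List<String> urls) throws Exception;
    }

    // Sentinel compared by identity, so it cannot collide with a real URL.
    private static final String END = new String("END");

    // Bounded queue (assumed capacity): readers block instead of exhausting memory.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
    private final BatchSink sink;
    private final int batchSize;
    private final Thread writer;
    private volatile Exception failure;

    SingleWriterLoader(BatchSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.writer = new Thread(this::drain, "db-writer");
        this.writer.start();
    }

    /** Called from any reader thread; blocks while the queue is full. */
    void submit(String url) throws InterruptedException {
        queue.put(url);
    }

    /** Call once, after all reader threads are done. Rethrows a writer failure. */
    void finish() throws Exception {
        queue.put(END);
        writer.join();
        if (failure != null) throw failure;
    }

    // Runs on the single writer thread: the only code that touches the sink.
    private void drain() {
        List<String> batch = new ArrayList<>(batchSize);
        try {
            for (String url = queue.take(); url != END; url = queue.take()) {
                if (failure != null) continue; // keep draining so readers never block
                batch.add(url);
                if (batch.size() >= batchSize) flush(batch);
            }
            if (failure == null && !batch.isEmpty()) flush(batch);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            failure = e;
        }
    }

    private void flush(List<String> batch) {
        try {
            sink.write(batch);
        } catch (Exception e) {
            failure = e; // remembered and rethrown by finish()
        } finally {
            batch.clear();
        }
    }

    /** Sketch of a JDBC sink; the caller owns the statement and commits once at the end. */
    static BatchSink jdbcSink(PreparedStatement insert) {
        return urls -> {
            for (String url : urls) {
                insert.setString(1, url);
                insert.addBatch();
            }
            insert.executeBatch();
        };
    }
}
```

To use it with a database, one would presumably call setAutoCommit(false) on the connection, prepare the INSERT statement, pass jdbcSink(statement) to the constructor, let every reader thread call submit(url), and then call finish() followed by a single commit(). Because only the writer thread ever touches the PreparedStatement, the statement is never shared between threads.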

