
We have to read a lot of data (~50 GB) from an HDD into our database, but our multithreaded procedure is pretty slow (~2 h for ~10 GB) because of a thread lock inside org.sqlite.core.NativeDB.reset[native] (see thread sampler).

We read our data relatively fast and use our insert method to execute a prepared statement, but we only commit to the database once we have collected around 500,000 rows. Currently we use JDBC as the interface to our SQLite database.

Everything works fine with a single thread. But with multiple threads we see hardly any performance/speed increase, because only one thread can run at a time instead of in parallel. We already reuse our PreparedStatement, and all threads share one instance of our Database class to prevent file locks (there is one connection to the database).

Unfortunately we have no clue how to improve our insert method any further. Can anyone give us some tips/solutions, or a way to avoid this NativeDB.reset method? We do not have to use SQLite, but we would like to use Java.

Screenshots: ThreadMonitor (threads named 1,2,...,15), Thread Sampler, Resource Usage

private String INSERT = "INSERT INTO urls (url) VALUES (?);";

public void insert(String urlFromFile) {
    try {
        preparedStatement.setString(1, urlFromFile);
        preparedStatement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

Updated insert method as suggested by @Andreas, but it is still throwing some exceptions:

public void insert(String urlFromFile) {
    try {
        preparedStatement.setString(1, urlFromFile);
        preparedStatement.addBatch();
        ++callCounter;
        if (callCounter % 500000 == 0 && callCounter > 0) {
            preparedStatement.executeBatch();
            commit();
            System.out.println("Exec");
        }
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

java.lang.ArrayIndexOutOfBoundsException: 9
at org.sqlite.core.CorePreparedStatement.batch(CorePreparedStatement.java:121)
at org.sqlite.jdbc3.JDBC3PreparedStatement.setString(JDBC3PreparedStatement.java:421)
at UrlDatabase.insert(UrlDatabase.java:85)
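An ArrayIndexOutOfBoundsException inside CorePreparedStatement.batch is the typical symptom of several threads calling setString/addBatch on one shared PreparedStatement at once. A minimal sketch of serializing access (the BatchingInserter class and all names here are hypothetical, not the asker's actual code; the JDBC calls are stubbed with a List so the batching logic runs standalone):

```java
import java.util.ArrayList;
import java.util.List;

class BatchingInserter {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int flushes = 0;

    BatchingInserter(int batchSize) {
        this.batchSize = batchSize;
    }

    // synchronized: only one thread at a time may touch the shared batch,
    // mirroring how a shared PreparedStatement must be used.
    synchronized void insert(String url) {
        pending.add(url); // stands in for setString(1, url) + addBatch()
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // stands in for executeBatch() + commit()
    private void flush() {
        pending.clear();
        flushes++;
    }

    synchronized int flushCount() {
        return flushes;
    }
}
```

With a real statement, the caller would also need to flush any remainder at end-of-file before the final commit.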
  • If I remember well, SQLite itself only allows one operation at a time anyway. Commented May 4, 2017 at 20:09
  • @Andreas we store all our inserts and then commit like 500k at once. Batching brings no major improvement :( Commented May 4, 2017 at 20:12
  • Did you drop your indices before the inserts? Commented May 4, 2017 at 20:15
  • @YCF_L We will give batching another shot, with the example provided. Thanks! Commented May 4, 2017 at 20:28
  • You're saying that batching, i.e. replacing executeUpdate() with addBatch(), followed by executeBatch() for every 1000+ calls to addBatch(), doesn't improve performance? I find that highly unlikely, unless your performance bottleneck is mainly somewhere else, e.g. an excessive number of indexes on the table being inserted into. The commit interval has very little to do with it. Commented May 4, 2017 at 20:33

2 Answers


Most databases have some sort of bulk insert functionality, though there's no standard for it, AFAIK.

PostgreSQL has COPY and MySQL has LOAD DATA, for instance. I don't think SQLite has this facility, though - it might be worth switching to a database that does.
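As a sketch of the PostgreSQL route (the table urls(url text) and all helper names here are assumptions): the pgjdbc driver exposes COPY through org.postgresql.copy.CopyManager. COPY's text format expects one row per line with backslash, tab, and newline escaped; the helper below only builds that payload, so it compiles without the driver, and the actual copyIn call is indicated in a comment.

```java
// Builds a payload for PostgreSQL's COPY ... FROM STDIN in text format.
// With the pgjdbc driver on the classpath it would be streamed like:
//   new org.postgresql.copy.CopyManager((BaseConnection) conn)
//       .copyIn("COPY urls (url) FROM STDIN", new java.io.StringReader(payload));
// The database call itself is omitted so this runs standalone.
class CopyPayload {
    // Escape the characters that are special in COPY's text format.
    static String escape(String field) {
        return field.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n");
    }

    // One column per row here, so each URL becomes one line.
    static String build(Iterable<String> urls) {
        StringBuilder sb = new StringBuilder();
        for (String url : urls) {
            sb.append(escape(url)).append('\n');
        }
        return sb.toString();
    }
}
```

One COPY of the whole file replaces hundreds of thousands of individual INSERT round-trips, which is where the bulk-load speedup comes from.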


2 Comments

Any recommendations? Can you use this kind of functionality with JDBC, or in general from a programming language, or only directly on a database server?
Looks like you can use PostgreSQL's COPY from JDBC - see jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/…

SQLite has no write concurrency.

The fastest way to load a large amount of data is to use a single thread (and a single transaction) to insert everything into the DB (and not to use WAL).
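That single-writer layout can be sketched as follows (a minimal example under stated assumptions, not the asker's code): reader threads push parsed rows into a BlockingQueue, and exactly one thread drains it - in real code that thread would hold the lone SQLite Connection and do addBatch/executeBatch inside a single transaction. The database is stubbed with a List here so the structure runs standalone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Single-writer pattern: many producers, one consumer touching the database.
class SingleWriter {
    // Sentinel each producer enqueues when its input is exhausted.
    static final String POISON = "\u0000EOF";

    // Drains the queue until every producer has signalled completion.
    // The returned list stands in for rows the real writer thread would
    // addBatch() and commit in one transaction.
    static List<String> drain(BlockingQueue<String> queue, int producers)
            throws InterruptedException {
        List<String> written = new ArrayList<>();
        int done = 0;
        while (done < producers) {
            String row = queue.take();
            if (row.equals(POISON)) {
                done++;
            } else {
                written.add(row); // real code: setString + addBatch here
            }
        }
        return written; // real code: executeBatch(); commit();
    }
}
```

This keeps the parallelism where it helps (reading and parsing the 50 GB of files) while respecting SQLite's one-writer-at-a-time constraint.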
