We have to read a lot of data from a HDD (~50GB) into our database, but our multithreading procedure is pretty slow (~2h for ~10GB), because of a Thread lock inside of org.sqlite.core.NativeDB.reset[native] (see thread sampler).
We read our data relatively fast and use our insert method to execute a prepared statement. But only if we collected like 500.000 datasets we commit all these statements to our database. Currently we use JDBC as Interface for our sqlite database.
Everything works fine so far, if you use one thread total. But if you want to use multiple threads you do not see much of a performance/speed increase, because only one thread can run at time, and not in parallel.
We already reuse our preparedStatement and all threads use one instance of our Database class to prevent file locks (there is one connection to the database).
Unfortunately we have no clue how to improve our insert method any further. Is anyone able to give us some tips/solutions or a way how to not use this NativeDB.reset method? We do not have to use SQLite, but we would like to use Java.
(Threads are named 1,2,...,15)
private String INSERT = "INSERT INTO urls (url) VALUES (?);";
public void insert(String urlFromFile) {
try {
preparedStatement.setString(1, urlFromFile);
preparedStatement.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}
}
Updated insert method as suggested by @Andreas , but it is still throwing some Exceptions
public void insert(String urlFromFile) {
try {
preparedStatement.setString(1, urlFromFile);
preparedStatement.addBatch();
++callCounter;
if (callCounter%500000 == 0 && callCounter>0){
preparedStatement.executeBatch();
commit();
System.out.println("Exec");
}
} catch (SQLException e) {
e.printStackTrace();
}
}
java.lang.ArrayIndexOutOfBoundsException: 9
at org.sqlite.core.CorePreparedStatement.batch(CorePreparedStatement.java:121)
at org.sqlite.jdbc3.JDBC3PreparedStatement.setString(JDBC3PreparedStatement.java:421)
at UrlDatabase.insert(UrlDatabase.java:85)


executeUpdate()withaddBatch(), followed byexecuteBatch()for every 1000+ calls toaddBatch(), doesn't improve performance? I find that highly unlikely, unless your performance bottleneck is majorly in some other area, e.g. excessive number of indexes on the table being inserted into. Commit interval has very little to with it.