4

One JDBC "select" statement takes 5 seconds to complete, so running 5 statements one after another takes 25 seconds.

Now I am trying to do the job in parallel. The database is MySQL with InnoDB. I start 5 threads and give each thread its own database connection, but it still takes 25 seconds for all of them to complete.

Note that I give Java enough heap and have 8 cores, but only one hard disk (maybe having only one disk is the bottleneck here?).

Is this the expected behaviour with MySQL out of the box? Here is example code:

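public void doWork(int n) {
    // Parameterized range query; each call borrows its own connection from the pool.
    String sql = "select id from big_table where id between ? and ?";
    try (Connection conn = pool.getConnection();
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, n * 1000000L);
        stmt.setLong(2, n * 1000000L + 1000000L);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                long itemId = rs.getLong("id"); // read the value so the row is actually fetched
            }
        }
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
}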

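public void doWorkBatch() {
    // Five range queries, one after another.
    for (int i = 1; i <= 5; i++)
        doWork(i);
}

public void doWorkParrallel() {
    for (int i = 1; i <= 5; i++) {
        final int n = i; // a lambda can only capture an effectively final variable
        new Thread(() -> doWork(n)).start();
    }
    System.console().readLine(); // block here until Enter is pressed
}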

(I don't recall where, but I read that a standard MySQL installation can easily handle 1000 connections in parallel.)

1
  • In your process, is it possible to run one query and perform in memory processing to separate the records? Commented Jan 30, 2015 at 16:50

4 Answers

4

Looking at your problem, multi-threading will definitely improve your performance: I once turned a 4-5 hour batch job into a 7-10 minute job by doing exactly what you are planning. But you need to know the following things beforehand while designing it:

1) You need to think about inter-task dependencies, i.e. dependencies between tasks that execute on different threads.

2) Using a connection pool is a good sign, since creating database connections is a slow process in Java.

3) Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection also carries its own transaction.

4) Cut the work into several units, where each unit does one job.

5) Particularly for your case (MySQL), the storage engine you use also affects performance. InnoDB uses row-level locking, so it can handle much higher concurrent traffic. The usual alternative, MyISAM, does not support row-level locking; it uses table-level locking. This matters when another thread comes in and wants to update the same row before the first thread commits.

6) Another way to improve the performance of a Java database application is to run queries with setAutoCommit(false). By default, a new JDBC connection has auto-commit mode on, which means every individual SQL statement is executed in its own transaction. Without auto-commit you can group SQL statements into a logical transaction, which can then be either committed or rolled back by calling commit() or rollback().
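As a rough illustration of point 4, here is a minimal sketch that cuts an id range into work units and runs one task per unit on a fixed thread pool. The class name RangeUnits and its methods are made up for this example, and the `work` function is only a placeholder: in the JDBC case it would borrow its own connection from the pool (point 3), run the range query, and return something like a row count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongBinaryOperator;

// Hypothetical helper, not part of JDBC or any library.
public class RangeUnits {

    /** Splits [start, end) into at most `parts` contiguous half-open ranges {from, to}. */
    public static List<long[]> split(long start, long end, int parts) {
        List<long[]> units = new ArrayList<>();
        long size = (end - start + parts - 1) / parts; // ceiling division
        for (long from = start; from < end; from += size) {
            units.add(new long[] {from, Math.min(from + size, end)});
        }
        return units;
    }

    /**
     * Runs one task per unit on a fixed pool and returns the results in unit order.
     * `work` receives (from, to) and stands in for the real per-unit job.
     */
    public static List<Long> runAll(List<long[]> units, int threads, LongBinaryOperator work)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] u : units) {
                futures.add(pool.submit(() -> work.applyAsLong(u[0], u[1])));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // blocks until that unit has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<long[]> units = split(1_000_000L, 6_000_000L, 5);
        // Stand-in work: report the size of each range instead of querying a database.
        List<Long> sizes = runAll(units, 5, (from, to) -> to - from);
        System.out.println(sizes);
    }
}
```

Collecting the futures in order also gives a natural place to time the whole batch, which is easier to measure than waiting on the console.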

You can also check out Spring Batch, which is designed for batch processing.

Hope this helps.


3

It depends where the bottleneck in your system is... If your queries spend a few seconds each establishing the connection to the database, and only a fraction of that actually running the query, you'd see a nice improvement. However if the time is spent in mysql, running the actual query, you wouldn't see as much of a difference.

The first thing I'd do, rather than trying concurrent execution, is to optimize the query, maybe add indices to your tables, and so forth.
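To find out where the time actually goes, one option is to time each phase separately. The following is only a sketch: PhaseTimer is a made-up helper, and the sleeps in main stand in for the real connect, query, and fetch steps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper for measuring how long each phase of a job takes.
public class PhaseTimer {
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs `step`, adds its elapsed time to the named phase, and returns its result. */
    public <T> T time(String phase, Callable<T> step) throws Exception {
        long t0 = System.nanoTime();
        try {
            return step.call();
        } finally {
            nanos.merge(phase, System.nanoTime() - t0, Long::sum);
        }
    }

    /** Total milliseconds recorded for a phase, or 0 if it never ran. */
    public long millis(String phase) {
        return nanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        PhaseTimer timer = new PhaseTimer();
        // Simulated phases; replace the bodies with the real JDBC calls.
        timer.time("connect", () -> { Thread.sleep(30); return "conn"; });
        timer.time("query",   () -> { Thread.sleep(60); return "rs"; });
        timer.time("fetch",   () -> { Thread.sleep(10); return 0; });
        for (String phase : new String[] {"connect", "query", "fetch"}) {
            System.out.println(phase + ": " + timer.millis(phase) + " ms");
        }
    }
}
```

With the code from the question, the wrapped steps would be pool.getConnection(), stmt.executeQuery(), and the rs.next() loop. If "connect" dominates, concurrency and pooling should help; if "query" or "fetch" dominates, the time is being spent in MySQL or in the data transfer.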

2 Comments

each query does a range select on a 20 million row table
I doubt you'd get much improvement using multiple threads then. But it's worth trying it out!
1

Concurrent execution may be faster. You should also consider batch execution.

5 Comments

What does batch mean in this context? I am just doing 5 selects.
Depending on the implementation, it is often possible to execute the selects in only one transaction, which gives better performance.
but I am doing the selects on different connections
Do you mean they concern 5 different schemas?
batch execution will never work with a series of select statements.
1

Concurrent execution will help if there is any room for parallelization. In your case, there seems to be no room for parallelization, because you have a very simple query which performs a sequential read of a huge amount of data, so your bottleneck is probably the disk transfer and then the data transfer from the server to the client.

When we say that RDBMS servers can handle thousands of requests per second, we are usually talking about the kind of requests we see in web applications, where each SQL query is slightly more complicated than yours but results in much smaller disk reads (so the data is likely to be found in a cache) and much smaller data transfers (stuff that fits within a web page).
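The effect of a shared bottleneck can be sketched with a deliberately crude model: one lock stands in for the single disk, and a sleep stands in for a sequential read. The class and method names are invented for this illustration, and a real disk is not strictly exclusive like this, so take it as an analogy rather than a measurement.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: every "read" must hold the one DISK lock, so threads cannot overlap.
public class SharedBottleneckDemo {
    private static final Object DISK = new Object();

    static void readRange(long millis) {
        synchronized (DISK) {
            try {
                Thread.sleep(millis); // simulated sequential read; the lock stays held
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs `tasks` reads one after another and returns wall-clock milliseconds. */
    public static long runSequential(int tasks, long millisPerTask) {
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            readRange(millisPerTask);
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    /** Runs `tasks` reads on separate threads and returns wall-clock milliseconds. */
    public static long runParallel(int tasks, long millisPerTask) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            Thread t = new Thread(() -> readRange(millisPerTask));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every reader to finish
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sequential: " + runSequential(5, 100) + " ms");
        System.out.println("parallel:   " + runParallel(5, 100) + " ms");
    }
}
```

Both runs should take roughly the same time, because the five threads spend most of their life waiting for the lock. That matches what the question reports: five connections, but still about 25 seconds in total.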
