Java ExecutorService - sometimes slower than sequential processing?

Question

I'm writing a simple utility that accepts a collection of Callable tasks and runs them in parallel. The hope is that the total time taken is only a little over the time taken by the longest task. The utility also adds some error handling logic: if any task fails, and the failure is something that can be treated as "retry-able" (e.g. a timeout, or a user-specified exception), then we run the task directly.

I've implemented this utility around an ExecutorService. There are two parts:

1. submit() all the Callable tasks to the ExecutorService, storing the Future objects.
2. In a for-loop, get() the result of each Future. In case of exceptions, do the "retry-able" logic.
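A minimal sketch of the two-part design described above, assuming a per-task timeout on get() and a caller-supplied predicate that decides which failures are retry-able. ParallelRunner, runAll and its parameters are hypothetical names for illustration, not the asker's actual code. Note that a retried task runs on the calling thread, so a slow retry delays the collection of every later result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

// Hypothetical utility: submit everything, then get() each Future in submission order.
class ParallelRunner {

    static <T> List<T> runAll(ExecutorService pool, List<Callable<T>> tasks,
                              long timeoutMillis, Predicate<Throwable> isRetryable) throws Exception {
        // Part 1: submit every task up front so they can run concurrently.
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> task : tasks) {
            futures.add(pool.submit(task));
        }
        // Part 2: collect results in submission order.
        List<T> results = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            Future<T> future = futures.get(i);
            try {
                // Note: the timeout starts when get() is called, not when the task was submitted.
                results.add(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                future.cancel(true);               // abandon the slow attempt
                results.add(tasks.get(i).call());  // retry directly on the calling thread
            } catch (ExecutionException e) {
                if (!isRetryable.test(e.getCause())) {
                    throw e;                       // not retry-able: propagate
                }
                results.add(tasks.get(i).call());  // retry directly on the calling thread
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                final int n = i;
                tasks.add(() -> n * n);
            }
            System.out.println(runAll(pool, tasks, 1000, cause -> true));
        } finally {
            pool.shutdown();
        }
    }
}
```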

I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callables, each essentially performing a Thread.sleep() for a random amount of time within a bound. I experimented with different timeouts, different numbers of tasks, etc., and the utility seemed to outperform sequential execution.

But when I added it to the actual system which needs this kind of utility, I saw results that were very variable: sometimes the parallel execution was faster, sometimes it was slower, and sometimes it was faster but still took a lot more time than the longest individual task.

Am I just doing it all wrong? I know ExecutorService has invokeAll(), but that swallows the underlying exceptions. I also tried using a CompletionService to fetch task results in the order in which they completed, but it exhibited more or less the same behavior. I'm now reading up on latches and barriers. Is this the right direction for solving this problem?

You might consider using Akka (akka.io) to solve this problem. – Brian Kent, commented Sep 19, 2013 at 19:48

Gray · Accepted Answer · 2013-09-19 18:37:42Z

3

I wrote some unit tests to ensure that using this utility is faster than running the tasks in sequence. For each test, I'd generate a certain number of Callable's, each essentially performing a Thread.sleep() for a random amount of time within a bound

Yeah, this is certainly not a fair test, since it uses neither CPU nor IO. I would certainly hope that parallel sleeps run faster than serial ones. :-)
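A sketch of a fairer benchmark, assuming CPU-bound work (counting primes by trial division) in place of Thread.sleep(). CpuBoundComparison and its methods are hypothetical, and a single System.nanoTime() measurement is only a rough indication: JIT warm-up and other load on the machine can skew it, so a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical benchmark whose tasks burn CPU instead of sleeping.
class CpuBoundComparison {

    // Deliberately CPU-heavy task: count primes below limit by trial division.
    static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) { prime = false; break; }
            }
            if (prime) count++;
        }
        return count;
    }

    // Runs the same task `tasks` times on the calling thread.
    static long sequential(int tasks, int limit) {
        long total = 0;
        for (int i = 0; i < tasks; i++) total += countPrimes(limit);
        return total;
    }

    // Runs the same tasks on a fixed pool and sums the results.
    static long parallel(int tasks, int limit, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) futures.add(pool.submit(() -> countPrimes(limit)));
            long total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int tasks = 16, limit = 200_000;
        int threads = Runtime.getRuntime().availableProcessors();
        long t0 = System.nanoTime();
        long a = sequential(tasks, limit);
        long t1 = System.nanoTime();
        long b = parallel(tasks, limit, threads);
        long t2 = System.nanoTime();
        System.out.printf("sequential: %d ms, parallel (%d threads): %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, threads, (t2 - t1) / 1_000_000, a == b);
    }
}
```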

But when I added it to the actual system which needs this kind of utility, I saw results that were very variable

Right. Whether a threaded application runs faster than a serial one depends on a number of factors. In particular, IO-bound applications often will not improve much in performance, since they are limited by the IO channel, which usually cannot serve many operations concurrently. The more processing the application needs, the larger the win from making it multi-threaded.

Am I just doing it all wrong?

Hard to know without more details. You might consider experimenting with the number of threads running concurrently. If you have a ton of jobs to process, you should not be using Executors.newCachedThreadPool(); instead, size an Executors.newFixedThreadPool(...) according to the number of CPUs your architecture has.
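A sketch of sizing a fixed pool, assuming the common rule of thumb from Java Concurrency in Practice: roughly one thread per core for CPU-bound work, scaled by (1 + wait time / compute time) for work that blocks. PoolSizing and its methods are hypothetical helpers, and the wait-to-compute ratio is something you would have to measure for your own tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helpers applying a thread-count rule of thumb.
class PoolSizing {

    // CPU-bound work: about one thread per available core.
    static int cpuBoundPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Blocking work: cores * (1 + wait/compute), never fewer than one thread.
    static int blockingPoolSize(double waitToComputeRatio) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, (int) Math.round(cores * (1 + waitToComputeRatio)));
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuBoundPoolSize());
        try {
            cpuPool.submit(() -> System.out.println("running on " + Thread.currentThread().getName()));
        } finally {
            cpuPool.shutdown();
            cpuPool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }
}
```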

You may also want to see if you can isolate the IO operations in a few threads and the processing in other threads: for example, one input thread reading from a file and one (or a couple of) output threads writing to the database. Multiple, separately sized pools for different types of tasks may do better than a single thread pool.
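A sketch of the split-pool idea using CompletableFuture, assuming readRecord stands in for the blocking IO step and process for the CPU-heavy step (both hypothetical placeholders). Each stage runs on its own, separately sized pool, so a slow read does not tie up a thread meant for processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical two-stage pipeline: IO stage on one pool, CPU stage on another.
class SplitPools {

    // Placeholder for a blocking read (file, index, network).
    static String readRecord(int id) {
        return "record-" + id;
    }

    // Placeholder for CPU-heavy processing of one record.
    static int process(String record) {
        return record.length();
    }

    static List<Integer> run(List<Integer> ids, int ioThreads, int cpuThreads) {
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads);
        ExecutorService cpuPool = Executors.newFixedThreadPool(cpuThreads);
        try {
            List<CompletableFuture<Integer>> pipeline = new ArrayList<>();
            for (int id : ids) {
                pipeline.add(CompletableFuture
                        .supplyAsync(() -> readRecord(id), ioPool)      // IO stage
                        .thenApplyAsync(SplitPools::process, cpuPool)); // CPU stage
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> f : pipeline) results.add(f.join());
            return results;
        } finally {
            ioPool.shutdown();
            cpuPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 22, 333), 2, Runtime.getRuntime().availableProcessors()));
    }
}
```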

tried using a CompletionService to fetch task results in the order in which they completed

If you are retrying operations, using a CompletionService is exactly the way to go. As jobs finish and throw exceptions (or return failures), they can be resubmitted to the thread pool immediately. I don't see any reason why your performance problems would be caused by this.
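A sketch of retrying through a CompletionService, assuming a maximum attempt count and a caller-supplied predicate for retry-able failures (RetryingCompletion, runAll, maxAttempts and isRetryable are hypothetical names). Each failed task is resubmitted as soon as its failure comes off the completion queue, while the other tasks keep running.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical runner that handles results in completion order and retries failures.
class RetryingCompletion {

    static <T> List<T> runAll(ExecutorService pool, List<Callable<T>> tasks, int maxAttempts,
                              Predicate<Throwable> isRetryable)
            throws InterruptedException, ExecutionException {
        CompletionService<T> completion = new ExecutorCompletionService<>(pool);
        // Relies on take() returning the same Future instance that submit() returned,
        // which is how ExecutorCompletionService behaves.
        Map<Future<T>, Integer> pending = new HashMap<>();
        int[] attempts = new int[tasks.size()];
        for (int i = 0; i < tasks.size(); i++) {
            pending.put(completion.submit(tasks.get(i)), i);
            attempts[i] = 1;
        }
        List<T> results = new ArrayList<>(Collections.<T>nCopies(tasks.size(), null));
        while (!pending.isEmpty()) {
            Future<T> done = completion.take();        // next task to finish, whichever it is
            int i = pending.remove(done);
            try {
                results.set(i, done.get());
            } catch (ExecutionException e) {
                if (attempts[i] >= maxAttempts || !isRetryable.test(e.getCause())) {
                    throw e;                           // not retry-able, or out of attempts
                }
                attempts[i]++;
                pending.put(completion.submit(tasks.get(i)), i);  // straight back into the pool
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = List.of(() -> "first", () -> "second");
            System.out.println(runAll(pool, tasks, 3, cause -> cause instanceof IOException));
        } finally {
            pool.shutdown();
        }
    }
}
```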

answered Sep 19, 2013 at 18:32 (edited Sep 19, 2013 at 18:37)

4 Comments

RuslanD Over a year ago

Hi Gray, thanks for the detailed response. It is true that my actual tasks perform I/O operations: searching in a Lucene index. I guess I'll have to dig into the documentation to see if there's an efficient way to search in parallel. My thread pool is a fixed thread pool with a capacity of 32 threads.

Gray Over a year ago

The biggest win you might get, @RuslanD, is to switch that box to SSDs. Improving the IO chain may be your best way to improve its speed, as opposed to spending time on the threading. Only once the IO chain stops being the bottleneck will the threading pay off.

RuslanD Over a year ago

I wish I could make such hardware decisions in my organization :) The reality is that the system is running on a bunch of hosts, and they don't have/won't have SSDs. So I need to figure some way of getting better perf than simply running the I/O tasks sequentially.

Gray Over a year ago

Yeah, understood. In-memory file systems will also give you a similar performance boost, @RuslanD.

Peter Lawrey · Accepted Answer · 2013-09-19 18:38:12Z

3

Multi-threaded programming doesn't come for free; it has an overhead. The overhead can easily exceed any performance gain, and it usually makes your code more complex.
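A sketch of one common way to keep that overhead small, assuming the work splits into independent pieces: chunking (batching), where each task gets a large slice of the work instead of one tiny item, so the cost of scheduling a task is amortized over many items. ChunkedWork, sumOfSquares and the chunk count are illustrative assumptions, not a recommendation for a specific chunk size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: one task per chunk of indices rather than one per index.
class ChunkedWork {

    // Sums i*i for i in [0, n) using `chunks` tasks (chunks must be >= 1).
    static long sumOfSquares(int n, int chunks, ExecutorService pool) throws Exception {
        List<Future<Long>> parts = new ArrayList<>();
        int chunkSize = (n + chunks - 1) / chunks;   // ceiling division
        for (int start = 0; start < n; start += chunkSize) {
            final int lo = start;
            final int hi = Math.min(n, start + chunkSize);
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int i = lo; i < hi; i++) s += (long) i * i;
                return s;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        return total;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            System.out.println(sumOfSquares(1_000_000, cores, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```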

Additional threads give you access to more CPU power (assuming you have spare CPUs), but in general they won't make your HDD spin faster, give you more network bandwidth, or speed up anything that is not CPU-bound.

Multiple threads can help give you a greater share of an external resource.

answered Sep 19, 2013 at 18:38

