1

I am currently working on a system where performance is an important consideration. It will be used to process large quantities of data (some object types number in the millions) with non-trivial algorithms (think Integer Programming problems, etc.). At the moment I have a working solution which creates all these data points as Objects.

Is there any performance increase to be gained by treating them as arrays, for example? Are there any best practices for working with large numbers of objects in Java (or should it be avoided)?

4 Comments
  • To be honest, objects do have more of a performance hit than primitives (less so if they're short-lived), but before you make any decision, profile to make sure this is a major bottleneck, because the primitives way is likely to be a lot less programmer-friendly. Commented Jul 31, 2013 at 16:36
  • To be honest, I am more worried about working with large numbers of objects; the creation time is not a big issue. I am wondering how this can be profiled without rewriting most of the code. Commented Jul 31, 2013 at 16:38
  • I don't understand; the profiler will group them all together if they are all created in the same place (in a loop, for example). Commented Jul 31, 2013 at 16:41
  • Profile using VisualVM or JProfiler. Commented Aug 1, 2013 at 10:13

4 Answers

4

I suggest you start by using a commercial CPU and memory profiler. This will give you a good idea of where your bottlenecks are.

Reducing garbage and making your memory more compact helps more once you have optimised the code to the point where your profilers cannot suggest anything further.

You might like to consider which structures fit in your CPU caches better, as this can improve performance by up to 2-5x. E.g. your L3 cache might be 8 MB and more than 5x faster than main memory; the more you can condense your working set to fit into it, the better.

BTW, your L1 cache is 32 KB and ~10x faster again.
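To make the cache point concrete, here is a minimal sketch (the Point/Points classes are hypothetical, not from the question's code) contrasting an array of separately-allocated objects with a structure-of-arrays layout whose linear scans stream through contiguous memory:

```java
// Array-of-objects layout: every Point is its own heap allocation,
// so iterating over a Point[] can chase scattered references.
final class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// Structure-of-arrays layout: coordinates live in flat primitive
// arrays, so a linear scan reads contiguous, cache-friendly memory.
final class Points {
    final double[] xs;
    final double[] ys;

    Points(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    double sumX() {
        double sum = 0;
        for (double x : xs) sum += x;   // sequential access pattern
        return sum;
    }
}
```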

This all assumes that the time to perform a GC doesn't bother you. If you create enough objects you can see multi-second, even multi-minute GC stop-the-world pauses.


1 Comment

Assuming your cache input and output speed is fast of course. Good suggestions, +1
2

Arrays and ArrayLists have similar performance, although arrays are faster (by up to 25%, depending on what you do with them). Where you can find a significant performance gain is in avoiding boxed primitives for calculations, in which case the only solution is to use an array.
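As a rough illustration of the boxing point (the class and variable names here are made up for the sketch), compare summing a List<Integer> with summing an int[]:

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingSketch {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Boxed: each element is a separate Integer object on the heap,
        // and every read in the loop unboxes it.
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxed.add(i);
        long boxedSum = 0;
        for (int v : boxed) boxedSum += v;

        // Primitive: one contiguous allocation, no per-element objects.
        int[] primitives = new int[n];
        for (int i = 0; i < n; i++) primitives[i] = i;
        long primitiveSum = 0;
        for (int v : primitives) primitiveSum += v;

        System.out.println(boxedSum == primitiveSum); // true
    }
}
```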

Apart from that, creating many short-lived objects incurs little performance cost, beyond the fact that GC will run more often (but the cost of a minor GC depends on the number of reachable objects, not on unreachable ones).

Comments

2

Premature optimization is evil. As Richard says in the comments, write your code, see if it's slow, then improve it. If you have suspicions, write an example to simulate high load. The time spent up front to determine this is worth it.
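For example, a crude load simulation could look like the sketch below; the run method is a hypothetical stand-in for your real workload, and for trustworthy numbers you would want a proper benchmark harness (e.g. JMH), since naive timing loops are easily skewed by JIT warmup:

```java
public class LoadSimulationSketch {
    public static void main(String[] args) {
        // Warm up so the JIT can compile the hot path before we measure.
        for (int i = 0; i < 5; i++) run(1_000_000);

        long start = System.nanoTime();
        long result = run(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + ", took " + elapsedMs + " ms");
    }

    // Hypothetical stand-in for the real workload under suspicion.
    static long run(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i * 31L;
        return acc;
    }
}
```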

But as for your question...

Yes, creating objects is more expensive than creating primitives. They also occupy more heap space (memory). And if you are using objects for only a short time, the garbage collector will have to run more often, which will eat some CPU.

Again, only worry about this if you really need a speed improvement.

4 Comments

Some do tend to mask a lack of subject expertise as a desire to "not do premature optimization"; the desire to make sane design decisions early is a good thing, not premature optimization.
That's precisely why I suggest simulating the system in the second sentence, @bobah. I also argue that the time spent doing this is well worth it. What are you trying to say?
Your answer implies that the author is trying to optimize something prematurely.
Yes it does, and yes, he might. But again, since I've suggested simulation to determine the necessary system design before wasting time on optimization, I feel your comment is illogical.
0

Prototype key parts of your algorithms, test them in isolation, find the slowest, improve, repeat. Stay single-threaded for as long as possible, but always make a note of what can be done in parallel.

In the end, your bottleneck may be one of the following:

  • CPU, because of the algorithm's computational complexity => try finding a better algorithm (or run on multiple CPUs in parallel if you are just slightly below the target; if you are far below it, parallel processing won't help)
  • CPU, because of excessive GC => profile memory; use low/zero-GC collections (trove4j etc.) or even arrays of primitive types, or even direct memory buffers from NIO (see the sketch after this list), and experiment
  • Memory => optimize data proximity (use chunked arrays matching cache sizes, etc.)
  • Contention on concurrent objects => revert to a single-threaded design, try lock-free synchronization primitives, etc.
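
As a minimal sketch of the NIO direct-buffer option mentioned above (the OffHeapPoints class and its 16-byte record layout are assumptions for illustration, not a library API), data stored this way lives outside the Java heap, so the GC never scans or moves it:

```java
import java.nio.ByteBuffer;

public class OffHeapPoints {
    // Each record: two doubles (x, y) = 16 bytes.
    private static final int RECORD_BYTES = Double.BYTES * 2;
    private final ByteBuffer buf;

    OffHeapPoints(int capacity) {
        // Direct buffer: allocated outside the GC-managed heap.
        buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void set(int i, double x, double y) {
        buf.putDouble(i * RECORD_BYTES, x);
        buf.putDouble(i * RECORD_BYTES + Double.BYTES, y);
    }

    double x(int i) { return buf.getDouble(i * RECORD_BYTES); }
    double y(int i) { return buf.getDouble(i * RECORD_BYTES + Double.BYTES); }

    public static void main(String[] args) {
        OffHeapPoints pts = new OffHeapPoints(3);
        pts.set(0, 1.5, 2.5);
        System.out.println(pts.x(0) + ", " + pts.y(0)); // 1.5, 2.5
    }
}
```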

Comments
