1

I am currently working on a system where performance is an important consideration. It will be used to process large quantities of data (some object types number in the millions) with non-trivial algorithms (think Integer Programming problems, etc.). At the moment I have a working solution which creates all these data points as Objects.

Is there any performance increase to be gained by treating them as arrays, for example? Are there any best practices for working with large numbers of objects in Java (or should it be avoided)?

4 Comments
  • To be honest, objects do have more of a performance hit than primitives (less so if they're short-lived), but before you make any decision, profile to make sure this is a major bottleneck, because the primitives way is likely to be a lot less programmer-friendly. Commented Jul 31, 2013 at 16:36
  • To be honest, I am more worried about working with large numbers of objects; the creation time is not a big issue. I am wondering how this can be profiled without rewriting most of the code. Commented Jul 31, 2013 at 16:38
  • I don't understand; the profiler will group them all together if they are all created in the same place (in a loop, for example). Commented Jul 31, 2013 at 16:41
  • Profile using VisualVM or JProfiler. Commented Aug 1, 2013 at 10:13

4 Answers

4

I suggest you start by using a commercial CPU and memory profiler. This will give you a good idea of where your bottlenecks are.

Reducing garbage and making your memory more compact helps more once you have optimised the code to the point where your profilers cannot suggest anything further.

You might like to consider which structures fit in your CPU caches better, as this can improve performance by up to 2-5x. E.g. your L3 cache might be 8 MB and more than 5x faster than main memory; the more you can condense your working set to fit into it, the better.

BTW, your L1 cache is 32 KB and ~10x faster again.
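To make the cache point concrete, here is a minimal sketch (the Point/Points classes are hypothetical, not from the question's code) contrasting an array of separately-allocated objects with a structure-of-arrays layout whose linear scans stream through contiguous memory:

```java
// Array-of-objects layout: every Point is its own heap allocation,
// so iterating over a Point[] can chase scattered references.
final class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

// Structure-of-arrays layout: coordinates live in flat primitive
// arrays, so a linear scan reads contiguous, cache-friendly memory.
final class Points {
    final double[] xs;
    final double[] ys;

    Points(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    double sumX() {
        double sum = 0;
        for (double x : xs) sum += x;   // sequential access pattern
        return sum;
    }
}
```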

This all assumes that the time to perform a GC doesn't bother you. If you create enough objects you can see multi-second, even multi-minute GC stop-the-world pauses.


1 Comment

Assuming your cache input and output speed is fast of course. Good suggestions, +1
2

Arrays and ArrayLists have similar performance, although arrays are faster (by up to 25%, depending on what you do with them). Where you can find a significant performance gain is in avoiding boxed primitives for calculations, in which case the only solution is to use an array.
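As a rough illustration of the boxing point (the class and variable names here are made up for the sketch), compare summing a List<Integer> with summing an int[]:

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingSketch {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Boxed: each element is a separate Integer object on the heap,
        // and every read in the loop unboxes it.
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxed.add(i);
        long boxedSum = 0;
        for (int v : boxed) boxedSum += v;

        // Primitive: one contiguous allocation, no per-element objects.
        int[] primitives = new int[n];
        for (int i = 0; i < n; i++) primitives[i] = i;
        long primitiveSum = 0;
        for (int v : primitives) primitiveSum += v;

        System.out.println(boxedSum == primitiveSum); // true
    }
}
```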

Apart from that, creating many short-lived objects incurs little performance cost, beyond the fact that GC will run more often (but the cost of a minor GC depends on the number of reachable objects, not on unreachable ones).

Comments

2

Premature optimization is evil. As Richard says in the comments, write your code, see if it's slow, then improve it. If you have suspicions, write an example to simulate high load. The time spent up front to determine this is worth it.
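For example, a crude load simulation could look like the sketch below; the run method is a hypothetical stand-in for your real workload, and for trustworthy numbers you would want a proper benchmark harness (e.g. JMH), since naive timing loops are easily skewed by JIT warmup:

```java
public class LoadSimulationSketch {
    public static void main(String[] args) {
        // Warm up so the JIT can compile the hot path before we measure.
        for (int i = 0; i < 5; i++) run(1_000_000);

        long start = System.nanoTime();
        long result = run(10_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + ", took " + elapsedMs + " ms");
    }

    // Hypothetical stand-in for the real workload under suspicion.
    static long run(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i * 31L;
        return acc;
    }
}
```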

But as for your question...

Yes, creating objects is more expensive than creating primitives. They also occupy more heap space (memory). And if you are using objects for only a short time, the garbage collector will have to run more often, which will eat some CPU.

Again, only worry about this if you really need a speed improvement.

4 Comments

Some do tend to mask a lack of subject expertise as a desire to "not do premature optimization"; the desire to make sane design decisions early is a good thing, not premature optimization.
That's precisely why I suggest simulating the system in the second sentence, @bobah. I also argue that the time spent doing this is well worth it. What are you trying to say?
Your answer implies that the author is trying to optimize something prematurely.
Yes it does, and yes, he might. But again, since I've suggested simulation to determine the necessary system design before wasting time on optimization, I feel your comment is illogical.
0

Prototype key parts of your algorithms, test them in isolation, find the slowest, improve, repeat. Stay single-threaded for as long as possible, but always make a note of what can be done in parallel.

In the end, your bottleneck may be one of the following:

  • CPU, because of the algorithm's computational complexity => try finding a better algorithm (or run on multiple CPUs in parallel if you are just slightly below the target; if you are far below it, parallel processing won't help)
  • CPU, because of excessive GC => profile memory; use low/zero-GC collections (trove4j etc.) or even arrays of primitive types, or even direct memory buffers from NIO (see the sketch after this list), and experiment
  • Memory => optimize data proximity (use chunked arrays matching cache sizes, etc.)
  • Contention on concurrent objects => revert to a single-threaded design, try lock-free synchronization primitives, etc.
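
As a minimal sketch of the NIO direct-buffer option mentioned above (the OffHeapPoints class and its 16-byte record layout are assumptions for illustration, not a library API), data stored this way lives outside the Java heap, so the GC never scans or moves it:

```java
import java.nio.ByteBuffer;

public class OffHeapPoints {
    // Each record: two doubles (x, y) = 16 bytes.
    private static final int RECORD_BYTES = Double.BYTES * 2;
    private final ByteBuffer buf;

    OffHeapPoints(int capacity) {
        // Direct buffer: allocated outside the GC-managed heap.
        buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void set(int i, double x, double y) {
        buf.putDouble(i * RECORD_BYTES, x);
        buf.putDouble(i * RECORD_BYTES + Double.BYTES, y);
    }

    double x(int i) { return buf.getDouble(i * RECORD_BYTES); }
    double y(int i) { return buf.getDouble(i * RECORD_BYTES + Double.BYTES); }

    public static void main(String[] args) {
        OffHeapPoints pts = new OffHeapPoints(3);
        pts.set(0, 1.5, 2.5);
        System.out.println(pts.x(0) + ", " + pts.y(0)); // 1.5, 2.5
    }
}
```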

Comments
