2

If I understand correctly, Java adds some overhead per object. If I want to build typical data structures such as linked lists, trees, tries, etc., the individual (list) items will be classes and therefore create a significant overhead as opposed to similar data structures in C. This becomes especially problematic for very large data sets. Is there a better way to implement these kinds of data structures in Java so that I won't have the overhead associated with storing objects in memory?

Here the memory consumption of Java objects is described. If I have millions of objects, the per-object overhead might become too expensive. So I was wondering whether there are better ways to handle such a situation.

  • In Java collections, items are objects of the same type. Commented May 23, 2017 at 5:28
  • Java has a very well developed group of classes and interfaces for that... see java.util.Collection. As you are aware, nothing will happen at a low level like in C, since the virtual machine is ALWAYS in the middle. Commented May 23, 2017 at 5:28
  • Define "significant overhead". Commented May 23, 2017 at 5:30
  • Are you asking if there is a collection in Java that is implemented in C/C++/another low-level language by standard (i.e. that all VM implementations have)? Or if there is a specific implementation you can use? Commented May 23, 2017 at 5:33
  • "the individual (list) items will be classes and therefore create a significant overhead as opposed to similar data structures in C" ... the class loading and object instantiation mechanisms in Java have been continuously developed for decades at this point, and you can count on them to be fast and efficient. These facilities and the garbage collector will outperform most self-rolled object pools, for example (except for very expensive objects like database connections). You choose Java precisely because you want the benefits of object abstraction; otherwise you develop at the level of C. Commented May 23, 2017 at 7:23

4 Answers

1

You can implement these collections over chunks of bytes (obtained as new byte[...], ByteBuffer.allocate[Direct](...), or unsafe.allocateMemory(...)). You then manage this memory manually: pack/unpack your objects to and from byte chunks along with additional data (like the indices of the left and right children for a binary tree, the index of the next node for a linked list, etc.). This way you do not spend memory on object headers, extra references, or alignment (although you might decide to introduce your own alignment); you can keep your objects off-heap; you can have them mapped to a file system for persistence; and so on.

However, it's not simple and it comes with subtleties: you might start depending on the malloc implementation and lose JVM heap optimizations; you lose memory model guarantees; your objects might be split across cache lines; you lose the benefits of GC compaction; etc. I'm not saying any of these is a show-stopper, just that it's not all roses and you should understand what exactly you are gaining.

If you have millions of objects, the overhead is likely in the hundreds of megabytes. Make sure it's worth trying to save them (compared to how much the necessary data itself takes, and compared to how large your heap is).
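For illustration, here is a minimal sketch of that idea for a singly linked list of int values packed into a ByteBuffer, so no per-node objects (and thus no object headers) are allocated. The class name and the 8-bytes-per-node layout (value plus index of the next node) are my own assumptions, not code from this answer:

    import java.nio.ByteBuffer;

    // Sketch: a singly linked list of ints packed into one ByteBuffer.
    // Each node occupies 8 bytes: [int value][int nextIndex], with -1 marking the end.
    public class PackedIntList {
        private static final int NODE_SIZE = 8;
        private final ByteBuffer buf;
        private int head = -1;   // index of the first node, -1 = empty list
        private int count = 0;   // number of nodes allocated so far

        public PackedIntList(int capacity) {
            // allocateDirect keeps the data off the Java heap; allocate() would keep it on-heap.
            this.buf = ByteBuffer.allocateDirect(capacity * NODE_SIZE);
        }

        // Prepends a value and returns the index of the new node.
        public int addFirst(int value) {
            int index = count++;
            int offset = index * NODE_SIZE;
            buf.putInt(offset, value);
            buf.putInt(offset + 4, head);   // link the new node to the previous head
            head = index;
            return index;
        }

        // Walks the packed "next" indices and sums all values.
        public long sum() {
            long total = 0;
            for (int i = head; i != -1; i = buf.getInt(i * NODE_SIZE + 4)) {
                total += buf.getInt(i * NODE_SIZE);
            }
            return total;
        }
    }

Compared to a list of boxed node objects, this layout stores only the 8 payload bytes per element, at the cost of doing all bookkeeping (allocation, linking, traversal) by hand.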


0

You can always call native C++ code from Java via JNI to increase performance and the level of control (I don't think you really need this, and I'm not sure you can surpass standard Java code this way).
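As a minimal sketch of what the Java side of such a binding might look like (the library name "nativelist" and the methods are hypothetical placeholders; the actual implementation would live in C/C++ compiled against the JNI headers):

    // Hypothetical Java-side declaration of a JNI-backed list.
    public class NativeList {
        static {
            // Loads libnativelist.so / nativelist.dll from java.library.path.
            System.loadLibrary("nativelist");
        }

        // Implemented in C/C++; the native side manages its own memory layout.
        public native void add(long value);
        public native long get(long index);
        public native long size();
    }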


0

A quick Google search on "c++ library jni" turned up an article entitled "Wrapping a C++ library with JNI – introduction" that might prove interesting. I didn't read it, so I make no recommendation or guarantee as to its contents.


0

If you have data sets where Java's per-object size overhead is a practical issue, I'd suggest considering a database. You can start with an in-memory, embedded database, like SQLite, H2, or Redis.

As your data gets larger, you'll need more complex management: manually updating cross-references, indexes, and the like so that your data can be queried efficiently is a huge effort that a database can take care of for you.

Using a proper database also lets you grow the data further without major rewrites: to the hundreds-of-gigabytes level, where it no longer fits in memory and you have to start actually using the disk, or even to the multi-terabyte level, where you need multiple machines to hold the data.

A proper database can grow together with your application; a bunch of objects in memory can't.
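As a minimal sketch of the embedded, in-memory starting point mentioned above (using H2 over plain JDBC; the JDBC URL, table layout, and class name are illustrative assumptions, and the H2 driver must be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EmbeddedDbSketch {
        public static void main(String[] args) throws Exception {
            // "jdbc:h2:mem:nodes" opens a private in-memory H2 database.
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:nodes")) {
                try (Statement st = conn.createStatement()) {
                    // Store "linked list" nodes as rows instead of millions of Java objects.
                    st.execute("CREATE TABLE node(id BIGINT PRIMARY KEY, value BIGINT, next_id BIGINT)");
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO node(id, value, next_id) VALUES (?, ?, ?)")) {
                    ps.setLong(1, 1);
                    ps.setLong(2, 42);
                    ps.setLong(3, -1);   // -1 marks the end of the list
                    ps.executeUpdate();
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT value FROM node WHERE id = ?")) {
                    ps.setLong(1, 1);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getLong("value"));
                        }
                    }
                }
            }
        }
    }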

1 Comment

OP claims "data structures that are not practical for database storage" -- it's almost certainly not true, but should probably be addressed in this answer.
