2

If I understand correctly, Java adds some overhead per object. If I want to build typical data structures such as linked lists, trees, tries, etc., the individual (list) items will be classes and therefore create a significant overhead as opposed to similar data structures in C. This becomes especially problematic for very large data sets. Is there a better way to implement these kinds of data structures in Java so that I won't have the overhead associated with storing objects in memory?

Here the memory consumption of Java objects is described. If I have millions of objects, the per-object overhead might become too expensive. So I was wondering whether there are better ways to handle such a situation.

  • In Java collections, items are objects of the same type. Commented May 23, 2017 at 5:28
  • Java has a very well developed group of classes and interfaces for that... see java.util.Collection. As you are aware, nothing will happen at a low level like in C, since the virtual machine is ALWAYS in the middle. Commented May 23, 2017 at 5:28
  • Define "significant overhead". Commented May 23, 2017 at 5:30
  • Are you asking if there is a collection in Java that is implemented in C/C++/another low-level language by standard (i.e. that all VM implementations have)? Or if there is a specific implementation you can use? Commented May 23, 2017 at 5:33
  • "the individual (list) items will be classes and therefore create a significant overhead as opposed to similar data structures in C" ... the class loading and object instantiation mechanisms in Java have been continuously developed for decades at this point, and you can count on them to be fast and efficient. These facilities and the garbage collector will outperform most self-rolled object pools, for example (except for very expensive objects like database connections). You choose Java precisely because you want the benefits of object abstraction; otherwise you develop at the level of C. Commented May 23, 2017 at 7:23

4 Answers

1

You can implement these collections over chunks of bytes (obtained as new byte[...], ByteBuffer.allocate[Direct](...), or unsafe.allocateMemory(...)). You then manage this memory manually: pack/unpack your objects to and from byte chunks along with additional data (like the indices of the left and right children for a binary tree, the index of the next node for a linked list, etc.). This way you do not spend memory on object headers, extra references, or alignment (although you might decide to introduce your own alignment); you can keep your objects off-heap; you can have them mapped to a file system for persistence; and so on.

However, it's not simple and it comes with subtleties: you might start depending on the malloc implementation and lose JVM heap optimizations; you lose memory model guarantees; your objects might be split across cache lines; you lose the benefits of GC compaction; etc. I'm not saying any of these is a show-stopper, just that it's not all roses and you should understand what exactly you are gaining.

If you have millions of objects, the overhead is likely in the hundreds of megabytes. Make sure it's worth trying to save them (compared to how much the necessary data itself takes, and compared to how large your heap is).
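For illustration, here is a minimal sketch of that idea for a singly linked list of int values packed into a ByteBuffer, so no per-node objects (and thus no object headers) are allocated. The class name and the 8-bytes-per-node layout (value plus index of the next node) are my own assumptions, not code from this answer:

    import java.nio.ByteBuffer;

    // Sketch: a singly linked list of ints packed into one ByteBuffer.
    // Each node occupies 8 bytes: [int value][int nextIndex], with -1 marking the end.
    public class PackedIntList {
        private static final int NODE_SIZE = 8;
        private final ByteBuffer buf;
        private int head = -1;   // index of the first node, -1 = empty list
        private int count = 0;   // number of nodes allocated so far

        public PackedIntList(int capacity) {
            // allocateDirect keeps the data off the Java heap; allocate() would keep it on-heap.
            this.buf = ByteBuffer.allocateDirect(capacity * NODE_SIZE);
        }

        // Prepends a value and returns the index of the new node.
        public int addFirst(int value) {
            int index = count++;
            int offset = index * NODE_SIZE;
            buf.putInt(offset, value);
            buf.putInt(offset + 4, head);   // link the new node to the previous head
            head = index;
            return index;
        }

        // Walks the packed "next" indices and sums all values.
        public long sum() {
            long total = 0;
            for (int i = head; i != -1; i = buf.getInt(i * NODE_SIZE + 4)) {
                total += buf.getInt(i * NODE_SIZE);
            }
            return total;
        }
    }

Compared to a list of boxed node objects, this layout stores only the 8 payload bytes per element, at the cost of doing all bookkeeping (allocation, linking, traversal) by hand.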


0

You can always call native C++ code from Java via JNI to increase performance and the level of control (I don't think you really need this, and I'm not sure you can surpass standard Java code this way).
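As a minimal sketch of what the Java side of such a binding might look like (the library name "nativelist" and the methods are hypothetical placeholders; the actual implementation would live in C/C++ compiled against the JNI headers):

    // Hypothetical Java-side declaration of a JNI-backed list.
    public class NativeList {
        static {
            // Loads libnativelist.so / nativelist.dll from java.library.path.
            System.loadLibrary("nativelist");
        }

        // Implemented in C/C++; the native side manages its own memory layout.
        public native void add(long value);
        public native long get(long index);
        public native long size();
    }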


0

A quick Google search on "c++ library jni" turned up an article entitled "Wrapping a C++ library with JNI – introduction" that might prove interesting. I didn't read it, so I make no recommendation or guarantee as to its contents.


0

If you have data sets where Java's per-object size overhead is a practical issue, I'd suggest considering a database. You can start with an in-memory, embedded database, like SQLite, H2, or Redis.

As your data gets larger, you'll need more complex management: manually updating cross-references, indexes, and the like so that your data can be queried efficiently is a huge effort that a database can take care of for you.

Using a proper database also lets you grow the data further without major rewrites: to the hundreds-of-gigabytes level, where it no longer fits in memory and you have to start actually using the disk, or even to the multi-terabyte level, where you need multiple machines to hold the data.

A proper database can grow together with your application; a bunch of objects in memory can't.
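As a minimal sketch of the embedded, in-memory starting point mentioned above (using H2 over plain JDBC; the JDBC URL, table layout, and class name are illustrative assumptions, and the H2 driver must be on the classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EmbeddedDbSketch {
        public static void main(String[] args) throws Exception {
            // "jdbc:h2:mem:nodes" opens a private in-memory H2 database.
            try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:nodes")) {
                try (Statement st = conn.createStatement()) {
                    // Store "linked list" nodes as rows instead of millions of Java objects.
                    st.execute("CREATE TABLE node(id BIGINT PRIMARY KEY, value BIGINT, next_id BIGINT)");
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO node(id, value, next_id) VALUES (?, ?, ?)")) {
                    ps.setLong(1, 1);
                    ps.setLong(2, 42);
                    ps.setLong(3, -1);   // -1 marks the end of the list
                    ps.executeUpdate();
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT value FROM node WHERE id = ?")) {
                    ps.setLong(1, 1);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getLong("value"));
                        }
                    }
                }
            }
        }
    }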

1 Comment

OP claims "data structures that are not practical for database storage" -- it's almost certainly not true, but should probably be addressed in this answer.
