Fastest way to serialize relatively simple Java POJOs?

Question

I need to write millions of Java POJOs to disk, and read them from disk, and I need to do it fast.

I would prefer to avoid defining a separate template file, as I believe is required with Thrift and Google Protocol Buffers. Rather, it would be preferable if the Java class itself were the authoritative specification for the object (as with Java Serialization, Gson, and other serialization protocols). I realize that there may be a bit of a performance hit here, but it's OK provided it's not an order of magnitude slower.

The classes to be serialized consist of several simple long and String fields, and a single Map (where the values in this map are all either Numbers or Strings).

Can anyone suggest some libraries that I should look at for this?

Have you measured native Java serialization and found that it wasn't fast enough? What time did you get, and what time do you want?
– JB Nizet, Commented Sep 14, 2011 at 18:38
There isn't really a threshold above which it's good and below which it's bad. Faster is better. Native serialization may be fine; I'm just wondering whether there are some commonly understood faster approaches.
– sanity, Commented Sep 14, 2011 at 20:02
Re your "it would be preferable...": I have a .NET version of protobuf that works that way (code-first), but not a Java one; I mention it in case it applies to some later reader (see: protobuf-net).
– Marc Gravell, Commented Sep 14, 2011 at 20:42

KarlP · Accepted Answer · 2011-09-16 09:16:30Z

4

Test first with Java serialization, and see if it's fast enough. It's built in, and is competent enough to handle graphs and multiple versions.

There is no reason to look for alternatives until you know you need it.

Edit: You will need to call reset() on the ObjectOutputStream so that its lookup table does not fill up with references to already-written objects. If you are writing relatively independent objects, resetting after every "top" object is probably not a problem, but if you have complex relations in your data, I suggest that you try JPA or something else.
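A minimal sketch of the reset() advice above, assuming a hypothetical Item class shaped like the POJO in the question (a couple of simple fields plus a Map). The class and method names here are illustrative only, and the timing printed by main is a rough check, not a proper benchmark.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SerializationSketch {

    // Hypothetical POJO (an assumption): simple fields plus one Map.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        String name;
        Map<String, Object> attributes = new HashMap<>();

        Item(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Writes a count followed by each item, resetting after every top-level object.
    static void writeAll(Path file, List<Item> items) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(items.size());
            for (Item item : items) {
                out.writeObject(item);
                // Discard the stream's record of already-written objects so
                // the handle table does not grow with every item.
                out.reset();
            }
        }
    }

    // Reads back the count and then that many items.
    static List<Item> readAll(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            List<Item> items = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                items.add((Item) in.readObject());
            }
            return items;
        }
    }

    public static void main(String[] args) throws Exception {
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            Item item = new Item(i, "item-" + i);
            item.attributes.put("score", i * 0.5);
            item.attributes.put("label", "label-" + i);
            items.add(item);
        }

        Path file = Files.createTempFile("pojos", ".bin");
        long start = System.nanoTime();
        writeAll(file, items);
        long written = System.nanoTime();
        List<Item> back = readAll(file);
        long read = System.nanoTime();

        System.out.printf("wrote %d items in %d ms, read %d items in %d ms%n",
                items.size(), (written - start) / 1_000_000,
                back.size(), (read - written) / 1_000_000);
        Files.delete(file);
    }
}
```

Because reset() discards all back-references, an object shared between two top-level items is written twice and comes back as two separate copies. That is the trade-off behind the caveat about complex relations.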

edited Sep 16, 2011 at 9:16

answered Sep 14, 2011 at 18:42


4 Comments

billygoat Over a year ago

For a simple object, native serialization is good enough. +1 for a simple, direct answer.

Peter Lawrey Over a year ago

There are lots of faster approaches but the faster you go the more complex it gets for the developer. Your time is important too. ;)

KarlP Over a year ago

It's not blazingly fast: my laptop wrote 100,000 data objects in 29.853 seconds. Each object contained a map with 10 strings, plus 5 additional strings, for a total of 1,500,000 objects or so. Reading is faster: it took 5 seconds to read everything back.

KarlP Over a year ago

Sorry, that would be 3,000,000 objects... The map contains 10 keys and 10 values... The file is about 230 MB, and that's about 73 bytes per String, which isn't much overhead, actually.
