
Running in a Spring-based application server context, an application sometimes has to handle high rates of database inserts into one big table. The current implementation uses Spring Data with OpenJPA under the hood and connects to an Amazon RDS (PostgreSQL 9.6) database. Whenever it needs to persist something, it simply calls the Spring-provided save method. After measuring the performance, we found that it can write about 4000 records per second.

We built a dummy application to test several approaches and found that doing 1000 inserts per batch over 4 parallel connections yields the best performance, at approximately 13500 records per second.
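A rough sketch of that benchmark shape, with hypothetical names: records are partitioned into batches of 1000 and submitted to a fixed pool of 4 workers (one per connection); `insertBatch` stands in for whatever actually performs the insert, e.g. a JDBC `PreparedStatement.executeBatch()` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelBatchInsert {

    // Split the full record list into fixed-size batches (1000 in our tests).
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Run one batch per task on a fixed pool of workers (one per connection).
    static <T> void insertAll(List<T> records, int batchSize, int workers,
                              Consumer<List<T>> insertBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> futures = new ArrayList<>();
        for (List<T> batch : partition(records, batchSize)) {
            futures.add(pool.submit(() -> insertBatch.accept(batch)));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // propagate any insert failure
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        pool.shutdown();
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) records.add(i);
        AtomicInteger total = new AtomicInteger();
        // Here insertBatch just counts; a real run would write to PostgreSQL.
        insertAll(records, 1000, 4, batch -> total.addAndGet(batch.size()));
        System.out.println(total.get()); // 2500
    }
}
```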

Now, we need to change the application's code to buffer the objects it persists, up to 1000 (or another configured limit), and run the batch insert procedure on a buffer when it is full, after some timeout period, or upon server shutdown.
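A minimal sketch of such a buffer, with hypothetical names, using only the JDK: the flush action is whatever runs the batch insert, and the three triggers (size, timeout, shutdown) each end up calling the same `flush()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchBufferDemo {

    /** Buffers items and hands them to a flush action in batches. */
    static class BatchBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> flushAction;
        private final List<T> buffer = new ArrayList<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        BatchBuffer(int batchSize, long maxDelayMillis, Consumer<List<T>> flushAction) {
            this.batchSize = batchSize;
            this.flushAction = flushAction;
            // Timeout trigger: a partially filled buffer is flushed
            // periodically even when no new items arrive.
            scheduler.scheduleAtFixedRate(this::flush,
                    maxDelayMillis, maxDelayMillis, TimeUnit.MILLISECONDS);
        }

        synchronized void add(T item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                flush(); // size trigger
            }
        }

        synchronized void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            List<T> batch = new ArrayList<>(buffer);
            buffer.clear();
            flushAction.accept(batch); // e.g. a JDBC batch insert
        }

        // Shutdown trigger: call from a Spring @PreDestroy method so
        // buffered records are not lost when the server stops.
        void shutdown() {
            scheduler.shutdown();
            flush();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> flushed = new ArrayList<>();
        BatchBuffer<Integer> buf = new BatchBuffer<>(3, 60_000, flushed::add);
        buf.add(1);
        buf.add(2);
        buf.add(3); // size trigger fires here
        buf.add(4);
        buf.shutdown(); // drains the remainder
        System.out.println(flushed); // [[1, 2, 3], [4]]
    }
}
```

A production version would also need to consider error handling in the flush action and back-pressure when the store falls behind; the sketch only shows the trigger mechanics.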

Has anyone encountered such a problem before? Any suggestions about threading issues, synchronization, or data structures?

Thanks in advance, Adrian.

  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. Commented Jul 30, 2017 at 10:01
  • @JoeC Sometimes we want opinionated answers. After getting ten such answers we can do our own comparison and reach a decision. As it stands, we are stuck with a generic answer and a hint to use some mega-monster library. That's not so useful. Marked the question as answered anyway... Commented Aug 1, 2017 at 15:06

1 Answer


It sounds like you need write-behind: queue each data change, and make this queue subject to a configurable duration (the “write-behind delay”) and a maximum size. When data changes, it is added to the write-behind queue (if it is not already there) and written to the underlying store whenever one of the following conditions is met:

  • The write behind delay expires
  • The queue exceeds a configurable size
  • The system enters shutdown mode and you want to ensure that no data is lost

If so, then there is plenty of prior art in this space. For example, Spring’s Cache Abstraction lets you add a caching layer, and it supports JSR-107 compliant caches such as Ehcache 3.x, which provides a write-behind cache writer. Spring’s caching service is an abstraction, not an implementation; the idea is that it looks after the caching logic for you while you continue to provide the store and the code that interacts with it.

Re this specific part of your question:

Any suggestions about threading issues, synchronization, data-structures?

The caching abstraction and the chosen caching implementation (Ehcache, for example) will look after threading and synchronization, and will provide you with levers such as queue size, concurrency level, batch size and maximum write delay to configure its behaviour. And since it just wraps your existing code, you won't need to change your existing data structures: write your existing types to the cache and let the cache delegate to your existing store/repository implementation as and when it decides that a write-behind is necessary.
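For illustration, a configuration sketch showing those levers, assuming Ehcache 3.x on the classpath; `Record` and `RecordWriter` (a `CacheLoaderWriter` delegating to the existing Spring Data repository) are hypothetical names:

```java
import java.util.concurrent.TimeUnit;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.builders.WriteBehindConfigurationBuilder;

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .withCache("records", CacheConfigurationBuilder
        .newCacheConfigurationBuilder(Long.class, Record.class,
            ResourcePoolsBuilder.heap(10_000))
        // RecordWriter writes each batch through to the existing repository.
        .withLoaderWriter(new RecordWriter())
        .add(WriteBehindConfigurationBuilder
            // flush at most every 5 seconds, in batches of up to 1000
            .newBatchedWriteBehindConfiguration(5, TimeUnit.SECONDS, 1000)
            .queueSize(10_000)      // bound the write-behind queue
            .concurrencyLevel(4)))  // four parallel writer threads
    .build(true);
```

With this in place, writes go to the cache and Ehcache decides when to call `RecordWriter.writeAll(...)` with a batch.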
