
Running in a Spring-based application server context, an application sometimes has to handle high rates of database inserts into one big table. The current implementation uses Spring Data with OpenJPA under the hood and connects to an Amazon RDS (PostgreSQL 9.6) database. Whenever it needs to persist something, it simply calls the Spring-provided save method. After measuring the performance, we found that it can write about 4000 records per second.

We built a dummy application to test several approaches and found that doing 1000 inserts per batch over 4 parallel connections yields the best performance, at approximately 13500 records per second.
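A rough sketch of that benchmark shape, with hypothetical names: records are partitioned into batches of 1000 and submitted to a fixed pool of 4 workers (one per connection); `insertBatch` stands in for whatever actually performs the insert, e.g. a JDBC `PreparedStatement.executeBatch()` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelBatchInsert {

    // Split the full record list into fixed-size batches (1000 in our tests).
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    // Run one batch per task on a fixed pool of workers (one per connection).
    static <T> void insertAll(List<T> records, int batchSize, int workers,
                              Consumer<List<T>> insertBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> futures = new ArrayList<>();
        for (List<T> batch : partition(records, batchSize)) {
            futures.add(pool.submit(() -> insertBatch.accept(batch)));
        }
        for (Future<?> f : futures) {
            try {
                f.get(); // propagate any insert failure
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        pool.shutdown();
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) records.add(i);
        AtomicInteger total = new AtomicInteger();
        // Here insertBatch just counts; a real run would write to PostgreSQL.
        insertAll(records, 1000, 4, batch -> total.addAndGet(batch.size()));
        System.out.println(total.get()); // 2500
    }
}
```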

Now, we need to change the application's code to buffer the objects it persists, up to 1000 (or another configured limit), and run the batch insert procedure on a buffer when it is full, after some timeout period, or upon server shutdown.
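A minimal sketch of such a buffer, with hypothetical names, using only the JDK: the flush action is whatever runs the batch insert, and the three triggers (size, timeout, shutdown) each end up calling the same `flush()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchBufferDemo {

    /** Buffers items and hands them to a flush action in batches. */
    static class BatchBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> flushAction;
        private final List<T> buffer = new ArrayList<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        BatchBuffer(int batchSize, long maxDelayMillis, Consumer<List<T>> flushAction) {
            this.batchSize = batchSize;
            this.flushAction = flushAction;
            // Timeout trigger: a partially filled buffer is flushed
            // periodically even when no new items arrive.
            scheduler.scheduleAtFixedRate(this::flush,
                    maxDelayMillis, maxDelayMillis, TimeUnit.MILLISECONDS);
        }

        synchronized void add(T item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                flush(); // size trigger
            }
        }

        synchronized void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            List<T> batch = new ArrayList<>(buffer);
            buffer.clear();
            flushAction.accept(batch); // e.g. a JDBC batch insert
        }

        // Shutdown trigger: call from a Spring @PreDestroy method so
        // buffered records are not lost when the server stops.
        void shutdown() {
            scheduler.shutdown();
            flush();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> flushed = new ArrayList<>();
        BatchBuffer<Integer> buf = new BatchBuffer<>(3, 60_000, flushed::add);
        buf.add(1);
        buf.add(2);
        buf.add(3); // size trigger fires here
        buf.add(4);
        buf.shutdown(); // drains the remainder
        System.out.println(flushed); // [[1, 2, 3], [4]]
    }
}
```

A production version would also need to consider error handling in the flush action and back-pressure when the store falls behind; the sketch only shows the trigger mechanics.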

Has anyone encountered such a problem before? Any suggestions about threading issues, synchronization, or data structures?

Thanks in advance, Adrian.

  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. Commented Jul 30, 2017 at 10:01
  • @JoeC Sometimes we want opinionated answers. After getting ten such answers we can do our own comparison and reach a decision. As it stands, we are stuck with a generic answer and a hint to use some mega-monster library. That's not so useful. Marked the question as answered anyway... Commented Aug 1, 2017 at 15:06

1 Answer


It sounds like you need write-behind: queue each data change, and make this queue subject to a configurable duration (the “write-behind delay”) and a maximum size. When data changes, it is added to the write-behind queue (if it is not already there) and written to the underlying store whenever one of the following conditions is met:

  • The write behind delay expires
  • The queue exceeds a configurable size
  • The system enters shutdown mode and you want to ensure that no data is lost

If so, then there is plenty of prior art in this space. For example, Spring’s Cache Abstraction lets you add a caching layer, and it supports JSR-107 compliant caches such as Ehcache 3.x, which provides a write-behind cache writer. Spring’s caching service is an abstraction, not an implementation; the idea is that it looks after the caching logic for you while you continue to provide the store and the code that interacts with it.

Re this specific part of your question:

Any suggestions about threading issues, synchronization, data-structures?

The caching abstraction and the chosen caching implementation (Ehcache, for example) will look after threading and synchronization, and will provide you with levers such as queue size, concurrency level, batch size and maximum write delay to configure its behaviour. And since it just wraps your existing code, you won't need to change your existing data structures: write your existing types to the cache and let the cache delegate to your existing store/repository implementation as and when it decides that a write-behind is necessary.
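For illustration, a configuration sketch showing those levers, assuming Ehcache 3.x on the classpath; `Record` and `RecordWriter` (a `CacheLoaderWriter` delegating to the existing Spring Data repository) are hypothetical names:

```java
import java.util.concurrent.TimeUnit;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.builders.WriteBehindConfigurationBuilder;

CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
    .withCache("records", CacheConfigurationBuilder
        .newCacheConfigurationBuilder(Long.class, Record.class,
            ResourcePoolsBuilder.heap(10_000))
        // RecordWriter writes each batch through to the existing repository.
        .withLoaderWriter(new RecordWriter())
        .add(WriteBehindConfigurationBuilder
            // flush at most every 5 seconds, in batches of up to 1000
            .newBatchedWriteBehindConfiguration(5, TimeUnit.SECONDS, 1000)
            .queueSize(10_000)      // bound the write-behind queue
            .concurrencyLevel(4)))  // four parallel writer threads
    .build(true);
```

With this in place, writes go to the cache and Ehcache decides when to call `RecordWriter.writeAll(...)` with a batch.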
