I have the following problem. I have 10 threads that create objects to be inserted into the database. Each thread has its own session, held in a ThreadLocal. The objects are inserted together, after they have all been created. One column of these objects is marked as unique. However, two different threads can create the same object. This behaviour is intended, but I don't know how to insert the objects into my database.

Currently, each thread queries all objects that are already in the database, checks its new objects against the result, and inserts only those that do not exist yet. However, an object that was missing when the query ran may have been added by another thread in the meantime, so I get a ConstraintViolationException when I insert it. Querying the database (or a cache) for each individual object performs too badly, as we are trying to add 1000 objects per thread per minute. If I flush after each single insert, I get the following error: Deadlock found when trying to get lock; try restarting transaction

So my question is: how can I insert objects that have a unique constraint from different threads simultaneously?

//Edit: Currently I'm using Hibernate with MySQL InnoDB.

//Edit2: Finally, the code which I use to write a single item.

import java.util.concurrent.Callable;

import org.hibernate.Session;
import org.hibernate.exception.ConstraintViolationException;

// Item and HibernateUtils are application classes (not shown).
public class ItemWriterRunnable implements Callable<Object> {

    private final ThreadLocal<Session> session = new ThreadLocal<Session>();

    private Item item;

    public ItemWriterRunnable(Item item) {
        super();
        this.item = item;
    }

    protected Session currentSession() {
        Session s = this.session.get();
        // Open a new Session, if this thread has none yet
        if (s == null || !s.isOpen()) {
            s = HibernateUtils.getSessionFactory().openSession();
            // Store it in the ThreadLocal variable
            this.session.set(s);
        }
        return s;
    }

    @Override
    public Object call() throws Exception {
        Session currentSession = currentSession();
        try {
            // One transaction per item
            currentSession.beginTransaction();
            currentSession.save(this.item);
            currentSession.getTransaction().commit();
        } catch (ConstraintViolationException e) {
            // Duplicate: another thread has already inserted this item
            currentSession.getTransaction().rollback();
        } catch (RuntimeException e) {
            // Any other failure: roll back as well
            currentSession.getTransaction().rollback();
        } finally {
            currentSession.close();
            currentSession = null;
            this.session.remove();
        }
        return null;
    }
}

Best regards, André

  • You could spawn a thread per object, and start a transaction within that thread to write the object, and simply ignore any errors caused by constraint violations. Commented Jul 15, 2014 at 20:07
  • Would this also allow multi-row inserts (e.g., inserting 10 objects as a bulk)? One transaction per object sounds like a big overhead, or am I wrong? Commented Jul 15, 2014 at 20:28
  • No, one object per insert. You have no choice -- you cannot do a check/write without a semaphore to lock your database and block all threads on that lock from doing any work at all, and that will only work in a single jvm -- if you distribute your transactions to a cluster, for example, you're screwed. If you have multiple objects per write, they're all going to fail if one fails. One object per write means that only one fails. The thread overhead isn't as large as your locking overhead when using a check-then-write pattern. Commented Jul 15, 2014 at 20:31
  • OK thank you. I will give it a try tomorrow ;) Commented Jul 15, 2014 at 20:38
  • Finally, I implemented it the way you described and the concurrency is really handled correctly. Thank you. However, I'm still struggling with the performance. I use an ExecutorService with 25 threads. Each thread gets a Callable, opens a session, starts a transaction, saves the item, commits the transaction and closes the session (I added the code above). Is there a way to increase performance? Commented Jul 16, 2014 at 15:35

1 Answer

If you write multiple objects in a thread, and one of them fails because it's a duplicate, then you'll have to work out which one was the duplicate, remove it from the set, and retry writing the set to the DB (with a chance of another failure). This takes a lot of time. Alternatively, you could read the DB to see if there are any duplicates before writing the set, and remove the duplicates before writing. This read/check/write pattern is flawed if it is not contained within a synchronised block, because other threads could write duplicates between the steps. The synchronisation needed to fix this will stall your server on every write, pausing all existing threads, potentially harming performance.
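To make the flaw in the unsynchronised read/check/write pattern concrete, here is a minimal sketch. It does not use Hibernate or a database: `table` is a hypothetical in-memory stand-in for a table without a unique constraint (so the duplicate becomes visible), and a `CyclicBarrier` forces the unlucky interleaving in which both threads finish their check before either one writes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

public class CheckThenWriteRace {

    // Hypothetical stand-in for a table WITHOUT a unique constraint.
    static final List<String> table = new CopyOnWriteArrayList<>();

    /** Runs check-then-write for the same key on two threads; returns the row count. */
    static int insertFromTwoThreads(String key) {
        table.clear();
        CyclicBarrier bothHaveChecked = new CyclicBarrier(2);
        Runnable writer = () -> {
            boolean exists = table.contains(key);   // step 1: read/check
            try {
                bothHaveChecked.await();            // both threads have now checked
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (!exists) {
                table.add(key);                     // step 2: write, based on a stale check
            }
        };
        Thread a = new Thread(writer);
        Thread b = new Thread(writer);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return table.size();
    }

    public static void main(String[] args) {
        System.out.println(insertFromTwoThreads("item-42")); // prints 2
    }
}
```

Both threads see "not present" and both write. With a real unique constraint, the second write would fail with a constraint violation instead of producing a second row.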

Instead, spawn a thread per object, and write the object within this thread (without the read/check). Most objects will write without issue, because most objects are not duplicated (an assumption, but it's probably right). Objects that are duplicates will fail with an exception, at which point you can terminate that thread because the relevant work is already done.
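As a rough illustration of this write-without-checking approach, the sketch below starts one thread per object and treats a duplicate error as "work already done". Everything here is a stand-in and not Hibernate API: `uniqueColumn` is a hypothetical in-memory set whose atomic `add` plays the role of the unique index, and `DuplicateKeyException` is a made-up substitute for the ConstraintViolationException you would catch in real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class InsertAndIgnoreDuplicates {

    /** Hypothetical stand-in for the database's constraint-violation error. */
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String key) {
            super("duplicate key: " + key);
        }
    }

    /** Hypothetical in-memory table; add() is atomic, like a unique index. */
    static final Set<String> uniqueColumn = ConcurrentHashMap.newKeySet();

    /** Inserts without any prior check; fails if the key already exists. */
    static void insert(String key) {
        if (!uniqueColumn.add(key)) {
            throw new DuplicateKeyException(key);
        }
    }

    /** Starts one thread per key; returns how many inserts were rejected as duplicates. */
    static int writeAll(List<String> keys) {
        AtomicInteger duplicates = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (String key : keys) {
            Thread t = new Thread(() -> {
                try {
                    insert(key);
                } catch (DuplicateKeyException e) {
                    // Already stored by another thread: nothing left to do.
                    duplicates.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return duplicates.get();
    }

    public static void main(String[] args) {
        int rejected = writeAll(Arrays.asList("a", "b", "a", "c", "b"));
        // prints "2 duplicates, 3 rows"
        System.out.println(rejected + " duplicates, " + uniqueColumn.size() + " rows");
    }
}
```

The point of the design is that the uniqueness decision is made in exactly one place (the store itself), so no thread ever acts on a stale check.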

4 Comments

This is a classic case for using an executor instead of spawning new threads, dispatching each write as a job and using as many threads as connections are available in the pool.
I agree (and have mentioned as much in a comment to the OP's question), but that would be an optimisation and would overly complicate this answer. My approach is to provide a simple solution, that can be refactored based on profiler feedback.
Manually spawning a thread-per-request is never a good solution, and using an executor to dispatch single jobs is a design issue rather than an optimization issue.
You're simply wrong @chrylis. Executor is a better design, but that doesn't mean that doing it the simple and easy way isn't a good solution -- have you ever heard of evolutionary design? Refactoring perhaps? You start with a simple solution, then refactor up to a good solution, while keeping your tests working as you go along. Another good use for a simple solution is to demonstrate a basic principle without having the overhead of explaining a more complicated way of achieving the same thing.
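For readers following this thread, the executor variant suggested in the first comment might look roughly like the sketch below. It is not taken from the answer. `POOL_SIZE` is an assumed connection-pool size, and `uniqueColumn` is again a hypothetical in-memory stand-in for a table with a unique column, so a job simply reports `false` when its key was a duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorWriter {

    // Assumption: the connection pool offers 4 connections; use the real size here.
    static final int POOL_SIZE = 4;

    // Hypothetical stand-in for a table with a unique column.
    static final Set<String> uniqueColumn = ConcurrentHashMap.newKeySet();

    /** Dispatches one job per key; returns the number of keys actually inserted. */
    static int writeAll(List<String> keys) {
        ExecutorService executor = Executors.newFixedThreadPool(POOL_SIZE);
        try {
            List<Callable<Boolean>> jobs = new ArrayList<>();
            for (String key : keys) {
                jobs.add(() -> uniqueColumn.add(key)); // false means "duplicate, skipped"
            }
            int inserted = 0;
            // invokeAll blocks until every job has finished.
            for (Future<Boolean> result : executor.invokeAll(jobs)) {
                if (result.get()) {
                    inserted++;
                }
            }
            return inserted;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAll(Arrays.asList("a", "b", "a", "c"))); // prints 3
    }
}
```

Compared with one thread per object, the number of concurrent writers is capped, so jobs queue up instead of competing for connections.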
