
I'm writing a program to run a batch process over hundreds of thousands of entities of a few related types. I was originally doing this with a single transaction per persist. This seemed very slow, so I tried doing somewhat naive batch updates in the way described in http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html, with longer transactions and occasional flush+clears. I'm now running into a ConstraintViolationException for some of my entity types, because they have unique field constraints. However, I'm unsure how to check for existing instances: I currently have a Criteria query that lists collisions, but it doesn't seem to return entities that I have passed to saveOrUpdate within the same transaction.

A made-up example may help:
Entities: Family, Person, Name
A Family has many Persons (one-to-many)
A Person has many Names, and different Persons can have the same Name (many-to-many)

My updates include persisting a Family along with its Persons and Names, but I'm not sure how to dedupe Names (may collide with existing Name in db or another Name in the same update batch). I could just keep track of new entities' unique constraint fields outside of hibernate, but I thought this is probably not necessary. Is there any built-in way of checking for duplicates both in the db and uncommitted changes? I saw Hibernate batch updates with constraintviolationexception, but I do not savor using exceptions in the normal codepath. Thanks, I appreciate any guidance.

1 Answer


Short answer: no. For batch operations, Hibernate doesn't keep track of the generated IDs, so you'd have to go to the database for each Name, because you'd be querying by the name rather than by the ID, unless you are using some query cache (which would be tricky in your case, I suppose).

What I would suggest is to do this in a two-step (three?) process: first, batch-insert all Name objects. Then, load them all using Hibernate itself, storing them in a Map. Then, just persist the other data, linking each Name to the not-yet-persisted Person. Of course, you'd need as much memory as you have names :-) But why are you keeping Name as a separate entity, anyway?
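The flow above could be sketched roughly like this. It is only an illustration in plain Java with no Hibernate calls: `Name`, `NameResolver`, and `dbLookup` are hypothetical names, and `dbLookup` stands in for whatever query you use to find an existing row by its unique field. The instances collected in `created()` are the ones you would batch-insert first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class NameDedupSketch {

    /** Hypothetical stand-in for the mapped Name entity. */
    static final class Name {
        final String value; // the field carrying the unique constraint
        Name(String value) { this.value = value; }
    }

    /** Hands out exactly one Name instance per unique value within a batch. */
    static final class NameResolver {
        private final Map<String, Name> seen = new HashMap<>();
        private final List<Name> created = new ArrayList<>();
        // Assumption: wraps a query by unique field against the database.
        private final Function<String, Optional<Name>> dbLookup;

        NameResolver(Function<String, Optional<Name>> dbLookup) {
            this.dbLookup = dbLookup;
        }

        Name resolve(String value) {
            Name cached = seen.get(value);
            if (cached != null) {
                return cached; // already met in this batch, no query needed
            }
            Optional<Name> existing = dbLookup.apply(value);
            Name name;
            if (existing.isPresent()) {
                name = existing.get(); // row already in the database
            } else {
                name = new Name(value); // new in this batch, must be inserted
                created.add(name);
            }
            seen.put(value, name);
            return name;
        }

        /** Names that did not exist yet and still have to be inserted. */
        List<Name> created() {
            return created;
        }
    }

    public static void main(String[] args) {
        Map<String, Name> fakeDb = new HashMap<>(); // pretend database contents
        fakeDb.put("Smith", new Name("Smith"));
        NameResolver resolver =
                new NameResolver(v -> Optional.ofNullable(fakeDb.get(v)));

        Name a = resolver.resolve("Jones");
        Name b = resolver.resolve("Jones");
        Name c = resolver.resolve("Smith");
        System.out.println(a == b);                      // true
        System.out.println(c == fakeDb.get("Smith"));    // true
        System.out.println(resolver.created().size());   // 1
    }
}
```

Because the map holds one instance per unique value, every Person in the batch ends up linked to the same Name object, whether it came from the database or was created in this batch. One caveat: `session.clear()` detaches whatever the map is holding, so you would probably need to reload or reattach those instances after each clear.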


1 Comment

Thanks for the reply. Yeah, I guess one way or another I'll need to have a map outside of Hibernate keyed on a tuple of the unique constraints per entity, at least per batch. I just meant Name to be an illustration; in my actual application/schema/entities the many-to-many relation is more meaningful.
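For what it's worth, such a map could look something like the sketch below. This is an assumption-laden illustration, not anything Hibernate provides: `BatchDedupIndex`, `intern`, and the `Person` class with a unique (firstName, lastName) pair are all made up for the example. The key is simply the list of unique-constraint values, since `List` already compares element by element.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchDedupSketch {

    /** Hypothetical entity with a made-up unique constraint on (firstName, lastName). */
    static final class Person {
        final String firstName;
        final String lastName;
        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    /** Per-batch index: entity type -> (unique-constraint values -> instance). */
    static final class BatchDedupIndex {
        private final Map<Class<?>, Map<List<Object>, Object>> byType = new HashMap<>();

        /**
         * Returns the instance already registered under the same unique fields,
         * or registers the candidate and returns it.
         */
        <T> T intern(Class<T> type, T candidate, Object... uniqueFields) {
            // Copy the varargs array so later changes by the caller cannot alter the key.
            List<Object> key = Arrays.asList(uniqueFields.clone());
            Map<List<Object>, Object> index =
                    byType.computeIfAbsent(type, t -> new HashMap<>());
            Object existing = index.putIfAbsent(key, candidate);
            return type.cast(existing != null ? existing : candidate);
        }

        /** Forget everything, e.g. at the end of a batch. */
        void clear() {
            byType.clear();
        }
    }

    public static void main(String[] args) {
        BatchDedupIndex index = new BatchDedupIndex();
        Person first = new Person("Ada", "Lovelace");
        Person duplicate = new Person("Ada", "Lovelace");

        Person kept = index.intern(Person.class, first, first.firstName, first.lastName);
        Person reused = index.intern(Person.class, duplicate,
                duplicate.firstName, duplicate.lastName);
        System.out.println(kept == first);   // true
        System.out.println(reused == first); // true, the duplicate is dropped
    }
}
```

This only covers collisions inside the batch; collisions with rows already in the database would still need a query (or a preloaded map) before calling `intern`.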
