
I had a problem with a Spring Batch job that reads a large CSV file (a few million records) and saves its records to a database. The job uses a FlatFileItemReader for reading the CSV and a JpaItemWriter for writing the read and processed records to the database. The problem is that the JpaItemWriter doesn't clear the persistence context after flushing each chunk of items to the database, and the job ends with an OutOfMemoryError.

I have solved the problem by extending JpaItemWriter and overriding the write method so that it calls EntityManager.clear() after writing a chunk, but I was wondering whether Spring Batch already addresses this issue and whether the root of the problem is in the job config. What is the right way to address this?

My solution:

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.item.database.JpaItemWriter;
import org.springframework.dao.DataAccessResourceFailureException;
import org.springframework.orm.jpa.EntityManagerFactoryUtils;

class ClearingJpaItemWriter<T> extends JpaItemWriter<T> {

    private EntityManagerFactory entityManagerFactory;

    @Override
    public void write(List<? extends T> items) {
        // JpaItemWriter merges the items and flushes them to the database.
        super.write(items);

        EntityManager entityManager =
                EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);

        if (entityManager == null) {
            throw new DataAccessResourceFailureException("Unable to obtain a transactional EntityManager");
        }

        // Detach the flushed entities so they don't accumulate in the
        // persistence context across chunks.
        entityManager.clear();
    }

    @Override
    public void setEntityManagerFactory(EntityManagerFactory entityManagerFactory) {
        super.setEntityManagerFactory(entityManagerFactory);
        // Keep a local reference; the field in JpaItemWriter is private.
        this.entityManagerFactory = entityManagerFactory;
    }
}

You can see the added entityManager.clear() call in the write method.

Job config:

@Bean
public JpaItemWriter<Appointment> postgresWriter() {
    JpaItemWriter<Appointment> writer = new ClearingJpaItemWriter<>();
    writer.setEntityManagerFactory(pgEntityManagerFactory);
    return writer;
}

@Bean
public Step appointmentInitStep(JpaItemWriter<Appointment> writer, FlatFileItemReader<Appointment> reader) {
    return stepBuilderFactory.get("initEclinicAppointments")
            .transactionManager(platformTransactionManager)
            .<Appointment, Appointment>chunk(5000)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skipLimit(1000)
            .skip(FlatFileParseException.class)
            .build();
}

@Bean
public Job appointmentInitJob(@Qualifier("initEclinicAppointments") Step step) {
    return jobBuilderFactory.get(JOB_NAME)
            .incrementer(new RunIdIncrementer())
            .preventRestart()
            .start(step)
            .build();
}
  • If you are sure it's an EntityManager issue, an approach using ChunkListener#afterChunk or ItemWriteListener#afterWrite may be less intrusive than your solution. Checking the JpaItemWriter code, an EntityManager.flush is performed after every write, so the issue should not happen. Did you try with a different (smaller) chunk size/skip limit? Commented Feb 18, 2019 at 16:06
  • @LucaBassoRicci I might be wrong, but flush doesn't clear the context. The listeners do look better than my solution; I just didn't know the API well. The skip limit of 1000 is an acceptable percentage of "bad" records in the CSV before the job fails, and the chunk size of 5000 is already half the original 10k chunk. The answer at stackoverflow.com/questions/13886608/… says that EntityManager.clear must be called when doing batch processing, so maybe the listeners are the place to call EntityManager.clear when dealing with large files (see the sketch after these comments). Commented Feb 19, 2019 at 9:25
  • I created jira.spring.io/browse/BATCH-2797 for this. Thanks for reporting it. Commented Feb 26, 2019 at 11:32
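
A minimal sketch of the listener-based alternative suggested in the comments, assuming Spring Batch 4's List-based ItemWriteListener API and the same pgEntityManagerFactory bean as in the job config above; the class name is hypothetical:

import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

import org.springframework.batch.core.ItemWriteListener;
import org.springframework.orm.jpa.EntityManagerFactoryUtils;

class ClearingItemWriteListener implements ItemWriteListener<Appointment> {

    private final EntityManagerFactory entityManagerFactory;

    ClearingItemWriteListener(EntityManagerFactory entityManagerFactory) {
        this.entityManagerFactory = entityManagerFactory;
    }

    @Override
    public void beforeWrite(List<? extends Appointment> items) {
        // nothing to do before the chunk is written
    }

    @Override
    public void afterWrite(List<? extends Appointment> items) {
        // JpaItemWriter has already flushed the chunk at this point;
        // clearing detaches the written entities so they can be garbage collected.
        EntityManager entityManager =
                EntityManagerFactoryUtils.getTransactionalEntityManager(entityManagerFactory);
        if (entityManager != null) {
            entityManager.clear();
        }
    }

    @Override
    public void onWriteError(Exception exception, List<? extends Appointment> items) {
        // nothing to clean up here; the chunk transaction rolls back on error
    }
}

It could then be registered on the step with .listener(new ClearingItemWriteListener(pgEntityManagerFactory)) before .build(), leaving the stock JpaItemWriter untouched.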

1 Answer


That's a valid point. The JpaItemWriter (and HibernateItemWriter) used to clear the persistence context, but this was removed in BATCH-1635 (here is the commit that removed it). However, the behaviour was re-added and made configurable in the HibernateItemWriter in BATCH-1759 through the clearSession parameter (see this commit), but not in the JpaItemWriter.
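
For reference, a minimal sketch of how that option is exposed on the HibernateItemWriter side (the sessionFactory bean is assumed):

@Bean
public HibernateItemWriter<Appointment> hibernateWriter(SessionFactory sessionFactory) {
    HibernateItemWriter<Appointment> writer = new HibernateItemWriter<>();
    writer.setSessionFactory(sessionFactory);
    // clearSession is the flag re-added in BATCH-1759: when true (the default),
    // the writer clears the Hibernate session after writing and flushing a chunk.
    writer.setClearSession(true);
    return writer;
}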

So I suggest opening an issue against Spring Batch to add the same option to the JpaItemWriter, so that it can clear the persistence context after writing items (this would be consistent with the HibernateItemWriter).

That said, to answer your question: you can indeed use a custom writer to clear the persistence context, as you did.

Hope this helps.


1 Comment

I created jira.spring.io/browse/BATCH-2797 to add a new parameter in the JpaItemWriter.
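
If the new parameter mirrors HibernateItemWriter's clearSession, configuring it might look like the following sketch (the setClearPersistenceContext name is an assumption based on the proposal, not a released API at the time of writing):

@Bean
public JpaItemWriter<Appointment> postgresWriter() {
    JpaItemWriter<Appointment> writer = new JpaItemWriter<>();
    writer.setEntityManagerFactory(pgEntityManagerFactory);
    // Hypothetical flag from BATCH-2797: clear the persistence context
    // after each chunk has been written and flushed.
    writer.setClearPersistenceContext(true);
    return writer;
}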
