Is Spring Batch a good fit for processing a large number of individual files?

Spring Batch seems to be geared towards data-centric jobs. I've got a requirement to pull down several million files from an S3 bucket, unzip them, perform some logic based on the contents, then call a web service.

Implementing this by hand is trivial, but I don't much fancy re-inventing the wheel when it comes to tracking job executions, and how far a job got along before it failed. Spring Batch seems to be an ideal fit for this job-monitoring, but I'm not sure whether subverting it to do file processing is a step too far.

  • I think, based on your description, you should look at Spring Integration (static.springsource.org/spring-integration/reference/html/…). It has adapters for handling files, and with its web service gateways it would be a good fit for your use case. Commented May 15, 2012 at 18:03
  • Thanks for your reply. I have already implemented parts of the system in Spring Integration, but SI is best suited for events and doesn't offer the concept of tracking a run of a job, and retrying it if it failed. SI is great for monitoring file repositories and reacting to file events, but wouldn't be much use if I needed to process the entire contents of an S3 bucket as an ad-hoc job. Commented May 15, 2012 at 20:53

1 Answer

The short answer is yes, you can use Spring Batch for this. I did a small proof of concept in which we had to migrate millions of images from a source system to a target system as a batch process, and it worked well, IMHO.

Adding to the comment by @Prasanna Talakanti, I would suggest using a combination of Spring Integration and Spring Batch. Spring Batch provides the infrastructure for batch processing (committing at intervals, restarting a failed job, and so on), while Spring Integration provides the web service gateways.

In Spring Batch, you can define a reader for reading data from S3 and a writer for writing to your destination (here, the web service call), with a processor in between if needed. You can also fine-tune the commit interval: if the job fails partway through, only the current chunk is rolled back, and a restart can resume from the last committed chunk.
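
To make the commit-interval idea concrete, here is a minimal plain-Java sketch of the reader/processor/writer loop. It deliberately does not use the Spring Batch API. `ChunkSketch`, `Checkpoint`, and `runStep` are hypothetical names made up for illustration, and the in-memory `Checkpoint` only stands in for the persistent job metadata that Spring Batch would keep for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ChunkSketch {

    /** Hypothetical stand-in for persisted job state: how many items are committed. */
    static final class Checkpoint {
        int committed = 0;
    }

    /**
     * Processes keys in chunks of commitInterval, starting at the checkpoint.
     * The checkpoint advances only after a whole chunk has been written.
     */
    static <I, O> void runStep(List<I> keys, int commitInterval, Checkpoint checkpoint,
                               Function<I, O> processor, Consumer<List<O>> writer) {
        while (checkpoint.committed < keys.size()) {
            int end = Math.min(checkpoint.committed + commitInterval, keys.size());
            List<O> chunk = new ArrayList<>();
            for (I key : keys.subList(checkpoint.committed, end)) {
                // An exception here abandons the whole chunk; nothing is committed.
                chunk.add(processor.apply(key));
            }
            writer.accept(chunk);       // e.g. the web service call
            checkpoint.committed = end; // "commit": this progress survives a failure
        }
    }

    public static void main(String[] args) {
        // Made-up object keys standing in for the S3 listing.
        List<String> keys = List.of("a.zip", "b.zip", "c.zip", "d.zip", "e.zip");
        Checkpoint checkpoint = new Checkpoint();
        List<String> delivered = new ArrayList<>();
        boolean[] failOnce = {true};

        // Simulated processor that fails the first time it sees d.zip.
        Function<String, String> processor = key -> {
            if (key.equals("d.zip") && failOnce[0]) {
                failOnce[0] = false;
                throw new IllegalStateException("simulated failure on " + key);
            }
            return key.toUpperCase();
        };

        try {
            runStep(keys, 2, checkpoint, processor, delivered::addAll);
        } catch (IllegalStateException e) {
            System.out.println("failed with " + checkpoint.committed + " items committed");
        }

        // Restart: resumes at the first uncommitted item instead of at the beginning.
        runStep(keys, 2, checkpoint, processor, delivered::addAll);
        System.out.println(delivered);
    }
}
```

Note that in this sketch `c.zip` is processed twice, because it belonged to the chunk that failed. The per-file logic and the web service call should therefore tolerate being repeated for items in a rolled-back chunk.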
