
I am exploring Spring Batch and I have a problem statement that requires me to read from a database, transform the data into comma-separated form, and write it to a file. I have around 50 different queries and as many files to create. A few of these queries return huge result sets, which could make the corresponding files large. I am solving this with Spring Batch and have a few general questions about it.

  1. Can a field extractor be used when I need to transform a particular field value?

BeanWrapperFieldExtractor<StudentDTO> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[] {"name", "emailAddress", "purchasedPackage"});
lineAggregator.setFieldExtractor(extractor);

For example, if I need to do something like studentDto.getName().replace("a", ""). Should I go for a custom processor in such cases?

  2. Is 1 job with 50 steps and parallel processing an apt way to go about in this scenario?
  3. Writing the header at the top of the file without using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
   // Compare against the BatchStatus enum, not a String
   if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
      Path path = Paths.get("encryptedTextFileThreaded.txt");
      try (BufferedWriter fileWriter = Files.newBufferedWriter(path)) {
         fileWriter.write("headerString");
         fileWriter.newLine();
         for (Line line : studentDtoLines) {
            fileWriter.write(line.getLine());
            fileWriter.newLine();
         }
         fileWriter.write("footerString");
      }
      catch (Exception e) {
         log.error("Fatal error: error occurred while writing {} file", path.getFileName());
      }
   }
   return stepExecution.getExitStatus();
}
    
   

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

1 Answer

  1. Can a field extractor be used when I need to transform a particular field value? Should I go for a custom processor in such cases?

I would use a processor for data transformation. That's a typical use case for an item processor. It is a good practice to make each component do one thing (and do it well): the field extractor to extract fields and an item processor to do the transformation. This is better for testing and reusability.
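A minimal sketch of such a processor, assuming the StudentDTO from the question has the usual getters and setters (the replace("a", "") rule is the one mentioned above):

import org.springframework.batch.item.ItemProcessor;

public class StudentNameProcessor implements ItemProcessor<StudentDTO, StudentDTO> {

    @Override
    public StudentDTO process(StudentDTO studentDto) {
        // Do the transformation here, so the field extractor only extracts
        studentDto.setName(studentDto.getName().replace("a", ""));
        return studentDto;
    }
}

The BeanWrapperFieldExtractor then stays exactly as you configured it.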

  2. Is 1 job with 50 steps and parallel processing an apt way to go about in this scenario?

IMO a job per file is a better choice for restartability reasons. When the processing of one file fails, it is better (and cleaner) to restart the failed job for that specific file rather than restart the single big job and have it skip the other 49 steps. You can always run multiple jobs in parallel by setting an appropriate task executor on the JobLauncher, as sketched below.
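As a sketch (assuming Spring Batch 4; in Spring Batch 5 the equivalent launcher class is TaskExecutorJobLauncher), an asynchronous task executor makes JobLauncher#run return immediately, so several per-file jobs can run concurrently:

import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // Without a task executor, each job would run synchronously on the calling thread
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}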

  3. Writing the header at the top of the file without using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

No, that's a wrong usage of a listener. I would use a header/footer callback for header/footer writing and a chunk-oriented step to write the content of the file.
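A minimal sketch of that setup, reusing the lineAggregator from the question; the file name and header/footer strings are placeholders:

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.LineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public FlatFileItemWriter<StudentDTO> studentWriter(LineAggregator<StudentDTO> lineAggregator) {
    FlatFileItemWriter<StudentDTO> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("students.csv"));
    writer.setLineAggregator(lineAggregator);
    writer.setHeaderCallback(w -> w.write("headerString")); // written once, before any item
    writer.setFooterCallback(w -> w.write("footerString")); // written once, after the last item
    return writer;
}

Unlike the listener approach, the writer also takes care of restart state and transactional writing for you.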

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

That's correct. The degree of parallelism is configurable through the TaskExecutor you set on the parallel flow. See the Parallel Steps section of the reference documentation for more details.
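A minimal sketch of such a split, assuming three independent steps defined elsewhere; the pool size, not the number of steps, caps how many run at the same time:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public Job parallelJob(JobBuilderFactory jobs, Step step1, Step step2, Step step3) {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); // degree of parallelism, not necessarily 50 threads
    taskExecutor.afterPropertiesSet();

    Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
            .split(taskExecutor)
            .add(new FlowBuilder<SimpleFlow>("flow1").start(step1).build(),
                 new FlowBuilder<SimpleFlow>("flow2").start(step2).build(),
                 new FlowBuilder<SimpleFlow>("flow3").start(step3).build())
            .build();

    return jobs.get("parallelJob").start(splitFlow).end().build();
}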


2 Comments

If I make independent jobs for all the files, how can I get data from each job and consolidate it at the end? For example, if I need to create a 51st file with details like the names of all the 50 files I created, the number of records in them, or the hash value of the corresponding file? Btw, delighted to get the answer from you. I saw the session on "High performance batch processing"!
Oh great, hope you enjoyed the session! If you want to aggregate results, you can do it in a separate job (that reads the generated files). Another interesting concept you can try is a job of jobs, thanks to JobStep, doing the aggregation in a regular step at the end of the "master" job. Hope this helps.
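To illustrate that second suggestion, here is a minimal sketch of a "job of jobs", assuming two per-file jobs and an aggregationStep (all placeholder names) that builds the 51st file at the end:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Bean;

@Bean
public Job masterJob(JobBuilderFactory jobs, StepBuilderFactory steps, JobLauncher jobLauncher,
                     Job fileJob1, Job fileJob2, Step aggregationStep) {
    Step jobStep1 = steps.get("jobStep1").job(fileJob1).launcher(jobLauncher).build();
    Step jobStep2 = steps.get("jobStep2").job(fileJob2).launcher(jobLauncher).build();
    return jobs.get("masterJob")
            .start(jobStep1)
            .next(jobStep2)
            .next(aggregationStep) // consolidates file names, record counts, hashes
            .build();
}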
