
I am exploring Spring Batch and I have a problem statement that requires me to read from a database, transform the data into comma-separated form, and write it to a file. I have around 50 different queries and as many files to create. A few of these queries return huge result sets, which could make the corresponding files large. I am solving this with Spring Batch and have a few general questions about it.

  1. Can a field extractor be used when I need to transform a particular field value?

BeanWrapperFieldExtractor<StudentDTO> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[] {"name", "emailAddress", "purchasedPackage"});
lineAggregator.setFieldExtractor(extractor);

For example, if I need to do something like studentDto.getName().replace("a", ""). Should I go for a custom processor in such cases?

  2. Is 1 job with 50 steps and parallel processing an apt way to go about in this scenario?
  3. Writing the header at the top of the file without using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
   // Compare against the BatchStatus enum, not a String
   if (stepExecution.getStatus() == BatchStatus.COMPLETED) {
      Path path = Paths.get("encryptedTextFileThreaded.txt");
      try (BufferedWriter fileWriter = Files.newBufferedWriter(path)) {
         fileWriter.write("headerString");
         fileWriter.newLine();
         for (Line line : studentDtoLines) {
            fileWriter.write(line.getLine());
            fileWriter.newLine();
         }
         fileWriter.write("footerString");
      }
      catch (Exception e) {
         log.error("Fatal error: error occurred while writing {} file", path.getFileName());
      }
   }
   return stepExecution.getExitStatus();
}
    
   

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

1 Answer

  1. Can a field extractor be used when I need to transform a particular field value? Should I go for a custom processor in such cases?

I would use a processor for data transformation. That's a typical use case for an item processor. It is a good practice to make each component do one thing (and do it well): the field extractor to extract fields and an item processor to do the transformation. This is better for testing and reusability.
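A minimal sketch of such a processor, assuming the StudentDTO from the question has the usual getters and setters (the replace("a", "") rule is the one mentioned above):

import org.springframework.batch.item.ItemProcessor;

public class StudentNameProcessor implements ItemProcessor<StudentDTO, StudentDTO> {

    @Override
    public StudentDTO process(StudentDTO studentDto) {
        // Do the transformation here, so the field extractor only extracts
        studentDto.setName(studentDto.getName().replace("a", ""));
        return studentDto;
    }
}

The BeanWrapperFieldExtractor then stays exactly as you configured it.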

  2. Is 1 job with 50 steps and parallel processing an apt way to go about in this scenario?

IMO a job per file is a better choice for restartability reasons. When the processing of one file fails, it is better (and cleaner) to restart the failed job for that specific file rather than restart the single big job and have it skip the other 49 steps. You can always run multiple jobs in parallel by setting an appropriate task executor on the JobLauncher, as sketched below.
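As a sketch (assuming Spring Batch 4; in Spring Batch 5 the equivalent launcher class is TaskExecutorJobLauncher), an asynchronous task executor makes JobLauncher#run return immediately, so several per-file jobs can run concurrently:

import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public JobLauncher asyncJobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // Without a task executor, each job would run synchronously on the calling thread
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}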

  3. Writing the header at the top of the file without using FlatFileHeaderCallback: is the below way of writing to the file acceptable?

No, that's a wrong usage of a listener. I would use a header/footer callback for header/footer writing and a chunk-oriented step to write the content of the file.
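A minimal sketch of that setup, reusing the lineAggregator from the question; the file name and header/footer strings are placeholders:

import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.transform.LineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public FlatFileItemWriter<StudentDTO> studentWriter(LineAggregator<StudentDTO> lineAggregator) {
    FlatFileItemWriter<StudentDTO> writer = new FlatFileItemWriter<>();
    writer.setResource(new FileSystemResource("students.csv"));
    writer.setLineAggregator(lineAggregator);
    writer.setHeaderCallback(w -> w.write("headerString")); // written once, before any item
    writer.setFooterCallback(w -> w.write("footerString")); // written once, after the last item
    return writer;
}

Unlike the listener approach, the writer also takes care of restart state and transactional writing for you.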

  4. Multi-threaded steps are for speeding up a single step. If I have a job with 50 steps and none of the steps depends on another, then parallel processing can be employed to speed up the execution of the job. True? Does this mean Spring Batch can create 50 threads and run all of them in parallel?

That's correct. The degree of parallelism is configurable through the TaskExecutor you set on the parallel flow. See the Parallel Steps section of the reference documentation for more details.
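A minimal sketch of such a split, assuming three independent steps defined elsewhere; the pool size, not the number of steps, caps how many run at the same time:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.job.builder.FlowBuilder;
import org.springframework.batch.core.job.flow.Flow;
import org.springframework.batch.core.job.flow.support.SimpleFlow;
import org.springframework.context.annotation.Bean;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Bean
public Job parallelJob(JobBuilderFactory jobs, Step step1, Step step2, Step step3) {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setCorePoolSize(4); // degree of parallelism, not necessarily 50 threads
    taskExecutor.afterPropertiesSet();

    Flow splitFlow = new FlowBuilder<SimpleFlow>("splitFlow")
            .split(taskExecutor)
            .add(new FlowBuilder<SimpleFlow>("flow1").start(step1).build(),
                 new FlowBuilder<SimpleFlow>("flow2").start(step2).build(),
                 new FlowBuilder<SimpleFlow>("flow3").start(step3).build())
            .build();

    return jobs.get("parallelJob").start(splitFlow).end().build();
}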


2 Comments

If I make independent jobs for all the files, how can I get data from each job and consolidate it at the end? For example, if I need to create a 51st file with details like the names of all the 50 files I created, the number of records in them, or the hash value of the corresponding file? Btw, delighted to get the answer from you. I saw the session on "High performance batch processing"!
Oh great, hope you enjoyed the session! If you want to aggregate results, you can do it in a separate job (that reads the generated files). Another interesting concept you can try is a job of jobs, thanks to JobStep, doing the aggregation in a regular step at the end of the "master" job. Hope this helps.
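To illustrate that second suggestion, here is a minimal sketch of a "job of jobs", assuming two per-file jobs and an aggregationStep (all placeholder names) that builds the 51st file at the end:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Bean;

@Bean
public Job masterJob(JobBuilderFactory jobs, StepBuilderFactory steps, JobLauncher jobLauncher,
                     Job fileJob1, Job fileJob2, Step aggregationStep) {
    Step jobStep1 = steps.get("jobStep1").job(fileJob1).launcher(jobLauncher).build();
    Step jobStep2 = steps.get("jobStep2").job(fileJob2).launcher(jobLauncher).build();
    return jobs.get("masterJob")
            .start(jobStep1)
            .next(jobStep2)
            .next(aggregationStep) // consolidates file names, record counts, hashes
            .build();
}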
