I’m new to Spring Batch and am looking for guidance on the requirement below.

Overall Requirement:

I have to get data from different systems, apply some business logic, and save the result in a DB.

Below is an example.

I need to read data from 3 CSV files.
First file – person.csv – contains name and id.
Second file – address.csv – contains address info for each person. One person can have zero or multiple addresses.
Third file – employment.csv – contains employment info for each person. One person can have zero or multiple employers.

Here is some sample data.

Person.csv (total size is 8 million records)

"personID", "personName"

1, Joey

2, Chandler

3, Ross

4, Monica

Address.csv

"personID", "addressType", "state"

1, residence, NY

1, mailing, NC

2, residence, NY

4, residence, NY

4, mailing, DC

Employment.csv

"personID", "employerName"

1, emp1

2, emp2

2, emp3

3, emp4

Note: each file is sorted by person id.

To apply the business logic, I need to merge the person, address, and employment data for each person. Can you suggest an approach for this?

1 Answer

It sounds like a four-step job. You'll have to decide where the intermediate results of steps 1 to 3 should reside.

If the data from all the CSV files will fit in memory, then the intermediate results of steps 1 to 3 could just be a Map, with personID as the key. If not, then the intermediate results of steps 1 to 3 should probably be written to a temp table in the database.

Assuming all data will fit in memory, create a bean which can be injected into the ItemWriters of steps 1 to 3, for example:

// In a @Configuration class.
// Assumes personID is of type Long and that the Person class
// has the appropriate attributes.
@Bean
public Map<Long, Person> people() {
    return new HashMap<>();
}

Step 1:

  • ItemReader - reads the next Person.CSV row and creates a Person instance
  • ItemProcessor - nothing to do - pass the Person instance to the ItemWriter
  • ItemWriter - adds the Person instance to the people Map (or intermediate table).

Step 2:

  • ItemReader - reads the next Address.CSV row and creates an Address instance
  • ItemProcessor - nothing to do - pass the Address instance to the ItemWriter
  • ItemWriter - adds the Address to the related Person from the people Map (or intermediate table). TODO: what should happen if there is an Address for a person that does not exist?

Step 3:

  • ItemReader - reads the next Employment.CSV row and creates an Employment instance
  • ItemProcessor - nothing to do - pass the Employment instance to the ItemWriter
  • ItemWriter - adds the Employment to the related Person from the people Map (or intermediate table). TODO: what should happen if there is an Employment for a person that does not exist?

Since there is nothing for ItemProcessor to do in steps 1 to 3, it might be better to use a Tasklet.

Also, steps 1 to 3 could be done in parallel. It would probably increase performance, but there would be added complexity to ensure people is correctly populated.
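As a rough illustration of what the writers in steps 1 to 3 do with the shared map, here is a minimal plain-Java sketch. It deliberately uses no Spring Batch classes. PeopleStore, its method names, the string form used for addresses, and the choice to count and skip rows whose person is missing are all assumptions made for illustration; the answer leaves that last decision open as a TODO.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the intermediate results of steps 1 to 3. */
public class PeopleStore {

    /** Hypothetical Person with the child collections that steps 2 and 3 fill in. */
    public static final class Person {
        public final long id;
        public final String name;
        public final List<String> addresses = new ArrayList<>();
        public final List<String> employers = new ArrayList<>();

        public Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // HashMap is not thread-safe: fine for sequential steps, not for parallel ones.
    private final Map<Long, Person> people = new HashMap<>();
    private int orphanRows = 0;

    /** Step 1 writer: register the person under its id. */
    public void addPerson(long id, String name) {
        people.put(id, new Person(id, name));
    }

    /** Step 2 writer: attach an address to an already registered person. */
    public void addAddress(long personId, String addressType, String state) {
        Person person = people.get(personId);
        if (person == null) {
            orphanRows++; // assumption: count and skip rows without a person
            return;
        }
        person.addresses.add(addressType + "/" + state);
    }

    /** Step 3 writer: attach an employer to an already registered person. */
    public void addEmployment(long personId, String employerName) {
        Person person = people.get(personId);
        if (person == null) {
            orphanRows++; // same assumption as for addresses
            return;
        }
        person.employers.add(employerName);
    }

    public Person get(long personId) {
        return people.get(personId);
    }

    public int size() {
        return people.size();
    }

    public int orphanRows() {
        return orphanRows;
    }
}
```

In a real job each ItemWriter would call the matching method once per item in its chunk. If steps 1 to 3 run in parallel, address or employment rows can arrive before their person, so the skip-on-missing rule above would no longer be safe as written.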

Step 4:

  • ItemReader - reads the next element of people (or composite object from intermediate tables)
  • ItemProcessor - apply business logic
  • ItemWriter - write result to database

3 Comments

Thanks Andrew for your input. The data volume for my batch is very high: I'll have about 8 million persons and no staging table, so I won't be able to hold everything in memory. Since there is no staging table, I believe the only option I have is to go with an intermediate CSV/XML/JSON file, which will increase the processing time.
Use the database to store the intermediate results of steps 1 to 3. See the readers/writers which make it painless to handle interactions with the common database platforms.
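For the case raised in the comments, where the data does not fit in memory, the question's note that each file is sorted by person id also allows a single-pass merge. This is a different technique from the map or staging-table approach in the answer, sketched here in plain Java without Spring Batch classes. SortedMerge, Row, Merged, and the rule that child rows with no matching person are dropped are hypothetical, and the sketch assumes the ids are sorted in ascending numeric order.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical single-pass merge of three inputs sorted by personId. */
public class SortedMerge {

    /** One parsed CSV row: the key plus the rest of the row as a string. */
    public static final class Row {
        public final long personId;
        public final String value;

        public Row(long personId, String value) {
            this.personId = personId;
            this.value = value;
        }
    }

    /** All data for one person, ready for the business logic. */
    public static final class Merged {
        public final long personId;
        public final String name;
        public final List<String> addresses;
        public final List<String> employers;

        public Merged(long personId, String name,
                      List<String> addresses, List<String> employers) {
            this.personId = personId;
            this.name = name;
            this.addresses = addresses;
            this.employers = employers;
        }
    }

    /** Wraps an iterator so the next row can be inspected before it is consumed. */
    private static final class Peekable {
        private final Iterator<Row> source;
        private Row head;

        Peekable(Iterator<Row> source) {
            this.source = source;
            advance();
        }

        private void advance() {
            head = source.hasNext() ? source.next() : null;
        }

        Row peek() {
            return head;
        }

        Row take() {
            Row current = head;
            advance();
            return current;
        }
    }

    /** Collects the child rows for one person, relying on ascending order. */
    private static List<String> takeFor(Peekable child, long personId) {
        List<String> values = new ArrayList<>();
        while (child.peek() != null && child.peek().personId < personId) {
            child.take(); // assumption: rows for an unknown person are dropped
        }
        while (child.peek() != null && child.peek().personId == personId) {
            values.add(child.take().value);
        }
        return values;
    }

    /** Emits one Merged per person; only the current person is held in memory. */
    public static void merge(Iterator<Row> persons, Iterator<Row> addresses,
                             Iterator<Row> employments, Consumer<Merged> sink) {
        Peekable addr = new Peekable(addresses);
        Peekable emp = new Peekable(employments);
        while (persons.hasNext()) {
            Row person = persons.next();
            sink.accept(new Merged(person.personId, person.value,
                    takeFor(addr, person.personId), takeFor(emp, person.personId)));
        }
    }
}
```

In a batch job the three iterators would be backed by file readers and the sink would hand each Merged to the business logic, so memory use stays proportional to one person's rows rather than to the whole data set. The trade-off is that a single out-of-order row silently breaks the merge, so the sort order should be verified rather than assumed.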
