I have to synchronize data from a file (Excel) to a database (MySQL) using Spring Batch.

The file is processed record by record. Adding and updating database records works fine, but how can I detect and delete database entries that were removed from the file?

I am considering the following approach:

  • read the file record-by-record
  • create or update the record in the database and remember the primary key
  • delete all records whose primary keys were not remembered (final step after all records have been processed)
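
A rough sketch of the idea above, in plain Java without any Spring Batch classes. `ProcessedKeyTracker` and its methods are hypothetical names, and the sketch assumes numeric primary keys that all fit in memory. How the tracker would be shared between the writer and the final step is left open here.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Spring Batch class): remembers every primary key
// that the create/update phase has touched.
class ProcessedKeyTracker {
    // Thread-safe set, so several writers could record keys concurrently.
    private final Set<Long> processed = ConcurrentHashMap.newKeySet();

    // Call this after a record has been created or updated.
    void markProcessed(long primaryKey) {
        processed.add(primaryKey);
    }

    // Returns the keys that exist in the database but were never seen in the file.
    Set<Long> staleKeys(Collection<Long> keysInDatabase) {
        Set<Long> stale = new HashSet<>(keysInDatabase);
        stale.removeAll(processed);
        return stale;
    }
}
```

The final step would then load all primary keys from the table, call `staleKeys(...)`, and delete the returned keys.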

Do you know how to collect and pass all processed primary keys to a final step? Or do you recommend another implementation?

Thanks, Patrick

Update: I'm not allowed to alter the database tables.

1 Answer

Use a column to mark the records that were added or updated.
After the main step, add another step that deletes every record that is not marked.

If modifying the DB schema is not an option:

Step 1. Dump the primary keys from the DB to a CSV file (original.csv).
Step 2. Create/update the DB records and store the primary keys of the created/updated data in a CSV file (updated.csv).
After step 2, create a differential file: original minus updated (diff.csv).
Step 3. Read diff.csv and delete the records by PK.
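
The differential file from the steps above could be produced roughly like this. It is only a sketch: `KeyFileDiff` and `writeDiff` are made-up names, and it assumes each file holds one primary key per line with no header row. Note that it still loads the set of updated keys into memory, so it only avoids holding both key sets at once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: computes "original minus updated" on two key files.
class KeyFileDiff {
    // Writes every key found in originalCsv but not in updatedCsv to diffCsv.
    // Assumption: one primary key per line, no header row.
    static void writeDiff(Path originalCsv, Path updatedCsv, Path diffCsv) throws IOException {
        // Keys that were created or updated during the main step.
        Set<String> updated = Files.readAllLines(updatedCsv).stream()
                .map(String::trim)
                .collect(Collectors.toSet());
        // Keep only the original keys that were not touched; skip blank lines.
        List<String> diff = Files.readAllLines(originalCsv).stream()
                .map(String::trim)
                .filter(key -> !key.isEmpty() && !updated.contains(key))
                .collect(Collectors.toList());
        Files.write(diffCsv, diff);
    }
}
```

Step 3 would then read the resulting file line by line and delete each listed key.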

3 Comments

Hi Luca, thanks for your response. But your solution is not an option. I'm not allowed to alter the schema. I'm sure there is some interceptor or another possibility to collect all processed items but I'm unable to find it.
Storing the data in memory can lead to an OutOfMemoryError. Check my edit to see if it helps.
Storing the PKs in memory wouldn't be a problem. We have another, similar batch process (implemented without Spring Batch) that computes a diff between two collections of about 500,000 objects each, all in memory. The difference from the solution discussed here is that creating and updating DB records is also based on the diff we generate. With a temporary CSV file we would have to synchronize the file access. I'm sure there must be a better (in-memory) solution. Otherwise I will have to process all the data at once.
