
Background

I have a Spring Batch job where:

  1. FlatFileItemReader - Reads one row at a time from the file
  2. ItemProcessor - Transforms the row from the file into a List<MyObject> and returns the List. That is, each row in the file is broken down into a List<MyObject> (1 row in the file is transformed into many output rows).
  3. ItemWriter - Writes the List<MyObject> to a database table. (I used this implementation to unpack the list received from the processor and delegate to a JdbcBatchItemWriter; a sketch of the setup follows this list.)
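
For reference, here is a rough sketch of the current setup on Spring Batch 4 (FileRow is a placeholder for my input type, and the unpacking writer is simplified):

import java.util.ArrayList;
import java.util.List;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class LoadStepConfig {

    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         FlatFileItemReader<FileRow> reader,
                         ItemProcessor<FileRow, List<MyObject>> processor,
                         JdbcBatchItemWriter<MyObject> jdbcWriter) {
        // Writer that unpacks each List<MyObject> and delegates the flattened items
        // to the JdbcBatchItemWriter.
        ItemWriter<List<MyObject>> unpackingWriter = lists -> {
            List<MyObject> flattened = new ArrayList<>();
            lists.forEach(flattened::addAll);
            jdbcWriter.write(flattened);
        };
        return steps.get("loadStep")
                .<FileRow, List<MyObject>>chunk(10) // commit-interval: 10 file rows per chunk
                .reader(reader)
                .processor(processor)
                .writer(unpackingWriter)
                .build();
    }
}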

Question

  • At point 2), the processor can return a List of 100000 MyObject instances.
  • At point 3), the delegate JdbcBatchItemWriter will end up writing the entire List of 100000 objects to the database.

My question is: the JdbcBatchItemWriter does not allow a custom batch size. For all practical purposes, the batch size equals the commit-interval for the step. With this in mind, is there another ItemWriter implementation available in Spring Batch that writes to the database and allows a configurable batch size? If not, how do I go about writing a custom writer myself to achieve this?

3 Answers


I see no obvious way to set the batch size on the JdbcBatchItemWriter. However, you can extend the writer and use a custom BatchPreparedStatementSetter to specify the batch size. Here is a quick example:

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;

public class MyCustomWriter<T> extends JdbcBatchItemWriter<T> {

    @Override
    public void write(List<? extends T> items) throws Exception {
        // namedParameterJdbcTemplate is the protected field inherited from JdbcBatchItemWriter
        namedParameterJdbcTemplate.getJdbcOperations().batchUpdate("your sql", new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                // set values on your sql for the i-th item
            }

            @Override
            public int getBatchSize() {
                return items.size(); // or any other value you want
            }
        });
    }

}

The StagingItemWriter in the samples is an example of how to use a custom BatchPreparedStatementSetter as well.


7 Comments

@MahmoudBenHassine This looks promising. Regarding the StagingItemWriter example, instead of using a BatchPreparedStatementSetter, I could simply call getJdbcTemplate().batchUpdate in the write method, passing it the required batch size?
Yes, that's another option.
I've got the necessary direction to go forward. Thanks.
glad it helped! However, this is tricky. There will be a single transaction for the whole chunk, no matter how many items will result from the processing logic. For example, If commit-size=10 and for each item the processor returns 100 items, then the writer will receive 1000 items per chunk. Those 1000 items will be committed in a single transaction, even if the jdbc batch_size = 500. The batch size governs how many queries will go to the db (2 batchUpdate queries in this case, 500 items in each batch) and not how many transactions (a single transaction driven by Spring Batch in this case).
There are no issues. The idea of chunk processing is to process the whole chunk as a unit (all or nothing semantics). Just wanted to clarify that the batch size you are trying to customize does not influence this model (it only influences how many queries will be sent to the db).

The answer from Mahmoud Ben Hassine and the comments cover pretty much all aspects of the solution, and that is the accepted answer.

Here is the implementation I used, if anyone is interested:

import java.util.Collections;
import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.ParameterizedPreparedStatementSetter;
import org.springframework.jdbc.core.support.JdbcDaoSupport;

public class JdbcCustomBatchSizeItemWriter<W> extends JdbcDaoSupport implements ItemWriter<W> {

    private int batchSize;
    private ParameterizedPreparedStatementSetter<W> preparedStatementSetter;
    private String sqlFileLocation;
    private String sql;

    // Configured as the bean's init method: loads the sql from the external file.
    public void initReader() {
        this.setSql(FileUtilities.getFileContent(sqlFileLocation));
    }

    @Override
    public void write(List<? extends W> items) throws Exception {
        getJdbcTemplate().batchUpdate(sql, Collections.unmodifiableList(items), batchSize, preparedStatementSetter);
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public void setPreparedStatementSetter(ParameterizedPreparedStatementSetter<W> preparedStatementSetter) {
        this.preparedStatementSetter = preparedStatementSetter;
    }

    public void setSqlFileLocation(String sqlFileLocation) {
        this.sqlFileLocation = sqlFileLocation;
    }

    public void setSql(String sql) {
        this.sql = sql;
    }
}

Note:

  1. The use of Collections.unmodifiableList prevents the need for any explicit casting.
  2. I use sqlFileLocation to specify an external file that contains the sql, and FileUtilities.getFileContent simply returns the contents of this sql file. This can be skipped, and the sql can be passed directly to the class while creating the bean (see the wiring sketch below).
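
For completeness, this is roughly how I wire the writer up as a bean when passing the sql directly (the table, columns and getters below are illustrative, not my actual schema):

import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfig {

    @Bean
    public JdbcCustomBatchSizeItemWriter<MyObject> myObjectWriter(DataSource dataSource) {
        JdbcCustomBatchSizeItemWriter<MyObject> writer = new JdbcCustomBatchSizeItemWriter<>();
        writer.setDataSource(dataSource); // inherited from JdbcDaoSupport
        writer.setBatchSize(500);         // rows per JDBC batch, independent of the commit-interval
        writer.setSql("INSERT INTO my_table (col_a, col_b) VALUES (?, ?)"); // illustrative sql
        writer.setPreparedStatementSetter((ps, item) -> {
            ps.setString(1, item.getColA()); // illustrative getters on MyObject
            ps.setString(2, item.getColB());
        });
        return writer;
    }
}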



I wouldn't do this. It presents issues for restartability. Instead, modify your reader to produce individual items rather than having your processor take in an object and return a list.
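
To illustrate, here is a minimal (non-restartable) sketch of such a reader, where FileRow and the expand() method stand in for your actual row type and transformation logic:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class ExpandingItemReader implements ItemReader<MyObject> {

    private final ItemReader<FileRow> delegate; // e.g. a FlatFileItemReader<FileRow>
    private final Deque<MyObject> buffer = new ArrayDeque<>();

    public ExpandingItemReader(ItemReader<FileRow> delegate) {
        this.delegate = delegate;
    }

    @Override
    public MyObject read() throws Exception {
        // Refill the buffer from the next file row whenever it runs dry.
        while (buffer.isEmpty()) {
            FileRow row = delegate.read();
            if (row == null) {
                return null; // end of input
            }
            buffer.addAll(expand(row)); // one file row -> many items (the former processor logic)
        }
        return buffer.poll();
    }

    private List<MyObject> expand(FileRow row) {
        // Placeholder for the row-to-items transformation that currently lives in the processor.
        throw new UnsupportedOperationException("transformation not shown");
    }
}

For restart support, you would build this on top of AbstractItemCountingItemStreamItemReader (or otherwise save the item count in the ExecutionContext) rather than a plain ItemReader, so that a restart can skip the items already written.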

2 Comments

The processing required to convert 1 row in the file into multiple rows is relatively complex to put in an ItemReader. Considering my jobs never need to be restarted, is there any other issue with this approach that you see?
Much time has passed, but I did want to follow up on this. It would not be that complicated to create a custom ItemReader to return individual items (multiple per row). You would just need to appropriately track your itemCount to preserve restartability. This is much cleaner than the alternative you proposed.
