Reading CSV data in Spring Batch (creating a custom LineMapper)

Question

I've been doing a bit of work writing some batch processing code on CSV data. I found a tutorial online and so far have been using it without really understanding how or why it works, which means I'm unable to solve a problem I'm currently facing.

The code I'm working with is below:

    @Bean
    public LineMapper<Employee> lineMapper() {
        DefaultLineMapper<Employee> lineMapper = new DefaultLineMapper<Employee>();
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames(new String[] { "id", "firstName", "lastName" });
        lineTokenizer.setIncludedFields(new int[] { 0, 1, 2 });
        BeanWrapperFieldSetMapper<Employee> fieldSetMapper = new BeanWrapperFieldSetMapper<Employee>();
        fieldSetMapper.setTargetType(Employee.class);
        lineMapper.setLineTokenizer(lineTokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        return lineMapper;
    }

I'm not entirely clear on what setNames and setIncludedFields are really doing. I've looked through the docs, but I still don't know what's happening under the hood. Why do we need to give names to the lineTokenizer? Why can't it just be told how many columns of data there will be? Is its only purpose to let the fieldSetMapper know which fields map to which data objects? And do the names all need to match the field names in the POJO?

I have a new problem where I have CSVs with a large number of columns (about 25-35) that I need to process. Is there a way to generate the columns in setNames programmatically from the variable names of the POJOs, rather than editing them in by hand?

Edit:

An example input file may be something like:

test.csv:
field1, field2, field3,
a,b,c
d,e,f
g,h,j

The DTO:

public class Test {

    private String field1;
    private String field2;
    private String field3;

    // setters, getters and constructor
}

Mahmoud Ben Hassine · Accepted Answer · 2021-05-16 19:29:25Z

4

I see the confusion, so I will try to clarify how the key interfaces work together. A LineMapper is responsible for mapping a single line from your input file to an instance of your domain type. The default implementation provided by Spring Batch is the DefaultLineMapper, which delegates the work to two collaborators:

LineTokenizer: which takes a String and tokenizes it into a FieldSet (which is similar to the ResultSet in the JDBC world, where you can get fields by index or name)
FieldSetMapper: which maps the FieldSet to an instance of your domain type

So the process is: String -> FieldSet -> Object.

Each interface comes with a default implementation, but you can provide your own if needed.
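The two-step delegation can be sketched in plain Java. This is only a model of the idea: PipelineSketch, tokenize and mapFields are hypothetical stand-ins, a Map plays the role of the FieldSet, and none of this is the Spring Batch implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of String -> FieldSet -> Object.
// All names here are hypothetical; a Map stands in for Spring Batch's FieldSet.
class PipelineSketch {

    // Minimal domain type using the field names from the question.
    static class Employee {
        int id;
        String firstName;
        String lastName;
    }

    // Step 1 (the tokenizer's role): split the line and label each token.
    static Map<String, String> tokenize(String line, String[] names) {
        String[] tokens = line.split(",", -1);
        if (tokens.length != names.length) {
            throw new IllegalArgumentException(
                    "expected " + names.length + " tokens but found " + tokens.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], tokens[i].trim());
        }
        return fields;
    }

    // Step 2 (the field set mapper's role): build the object from the named fields.
    static Employee mapFields(Map<String, String> fields) {
        Employee employee = new Employee();
        employee.id = Integer.parseInt(fields.get("id"));
        employee.firstName = fields.get("firstName");
        employee.lastName = fields.get("lastName");
        return employee;
    }

    // The line mapper's role: chain the two steps.
    static Employee mapLine(String line) {
        String[] names = { "id", "firstName", "lastName" };
        return mapFields(tokenize(line, names));
    }
}
```

In Spring Batch the same roles are played by LineTokenizer.tokenize(String) and FieldSetMapper.mapFieldSet(FieldSet), and a custom LineMapper implements mapLine(String line, int lineNumber).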

DelimitedLineTokenizer

The names attribute in DelimitedLineTokenizer is used to create named fields in the FieldSet. This allows you to get a field by name from the FieldSet (again, similar to the ResultSet methods that look up a column by name). The includedFields attribute lets you select a subset of fields from your input file, as in your use case where the file has 25 fields and you only need to extract some of them.
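To make the interplay of names and includedFields concrete, here is a plain-Java model (not the Spring Batch code; IncludedFieldsSketch and select are hypothetical names). It assumes zero-based indices, as in the question's { 0, 1, 2 }, and one name per included field. Both are my reading of how DelimitedLineTokenizer behaves, so check them against the Javadoc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "included fields + names"; not Spring Batch code.
class IncludedFieldsSketch {

    // Keep only the columns listed in includedFields and label them, in order, with names.
    static Map<String, String> select(String line, int[] includedFields, String[] names) {
        if (includedFields.length != names.length) {
            throw new IllegalArgumentException("one name is needed per included field");
        }
        String[] tokens = line.split(",", -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < includedFields.length; i++) {
            // includedFields[i] is the zero-based column index in the raw line.
            fields.put(names[i], tokens[includedFields[i]].trim());
        }
        return fields;
    }
}
```

For example, select("1,Ada,ignored,Lovelace", new int[] { 0, 1, 3 }, new String[] { "id", "firstName", "lastName" }) keeps three labelled values and drops the third column.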

BeanWrapperFieldSetMapper

This FieldSetMapper implementation expects a type and uses the JavaBean naming conventions for getters/setters to set fields on the target object from the FieldSet.

"Is there a way to generate the columns in setNames programmatically with the variable names of the POJOs, rather than editing them in by hand?"

This is what the BeanWrapperFieldSetMapper will do. If you provide field names in the FieldSet, the mapper will call the setter of each property that has the same name. The name matching is fuzzy in the sense that it tolerates close matches. Here is an excerpt from the Javadoc:

Property name matching is "fuzzy" in the sense that it tolerates close matches,
as long as the match is unique. For instance:

* Quantity = quantity (field names can be capitalised)
* ISIN = isin (acronyms can be lower case bean property names, as per Java Beans recommendations)
* DuckPate = duckPate (capitalisation including camel casing)
* ITEM_ID = itemId (capitalisation and replacing word boundary with underscore)
* ORDER.CUSTOMER_ID = order.customerId (nested paths are recursively checked)
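As a rough intuition for the flat-name cases in this list, ignoring case and underscores is already enough to make them match. The sketch below is a deliberately simplified stand-in and not the algorithm BeanWrapperFieldSetMapper uses; FuzzyNameSketch, normalize and matches are hypothetical names, and nested paths are not resolved.

```java
import java.util.Locale;

// Simplified illustration of tolerant name matching; not the Spring Batch algorithm.
class FuzzyNameSketch {

    // Drop underscores and lower-case the name so ITEM_ID and itemId compare equal.
    static String normalize(String name) {
        return name.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // True when the column name and the bean property name agree after normalization.
    static boolean matches(String columnName, String propertyName) {
        return normalize(columnName).equals(normalize(propertyName));
    }
}
```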

This mapper is also configurable with a custom ConversionService if needed. If this still does not cover your use case, you need to provide a custom mapper.
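For the original question of avoiding a hand-written setNames array, one option is to derive the names from the DTO by reflection. This is a sketch under assumptions: FieldNames and namesOf are hypothetical helpers, the DTO declares its fields in the same order as the CSV columns, and the JVM returns declared fields in source order. The Class.getDeclaredFields() contract does not guarantee that order, so verify the result against the file header before relying on it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that derives column names from a DTO's declared fields.
class FieldNames {

    // DTO from the question, reduced to its fields.
    static class Test {
        private String field1;
        private String field2;
        private String field3;
    }

    // Collect the names of the instance fields declared directly on the given class.
    // Inherited fields are not included.
    static String[] namesOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            // Skip constants and compiler-generated fields; they are not CSV columns.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            names.add(field.getName());
        }
        return names.toArray(new String[0]);
    }
}
```

The result could then be handed to the tokenizer from the question, for example lineTokenizer.setNames(FieldNames.namesOf(Test.class)).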

edited May 16, 2021 at 19:29

answered Feb 17, 2021 at 10:35

Mahmoud Ben Hassine

4 Comments

Astrum Over a year ago

That clears it up quite a bit. The last question that remains is whether there is an easier way to get the FieldSet names to match the variables of a POJO, so I don't have to write out 25 of them by hand. I'm still not sure how to do that by modifying the BeanWrapperFieldSetMapper.

Mahmoud Ben Hassine Over a year ago

Please edit your question and add an example of an input file and the target type and what you are expecting. I will edit the answer accordingly.

Astrum Over a year ago

Edited. Keep in mind again that the reason this is a problem is I will have many different CSVs to support, all with 20-30 fields each. Writing it all by hand would be painful. I suppose one solution is to just create a DTO factory that reads the first line and creates class variables somehow.

Mahmoud Ben Hassine Over a year ago

OK, thanks. I think what you are looking for is something like github.com/spring-projects/spring-batch/issues/1772. v2 used to have a feature that looked up field names in the header and assigned them automatically to the mapper, but it has been removed. Is this what you are looking for?
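If the files carry a header row, as test.csv does, the names can also be taken from that row rather than from the DTO. The sketch below covers only the parsing; HeaderNames and parseHeader are hypothetical names, and wiring the result into the tokenizer is left out. It trims whitespace and drops empty names, which the sample header with its spaces and trailing comma would otherwise produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that turns a CSV header line into column names.
class HeaderNames {

    // Split the header on commas, trim each name and skip empty ones.
    static String[] parseHeader(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String token : headerLine.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names.toArray(new String[0]);
    }
}
```

An empty name in the middle of a header would shift the later columns, so a real implementation should probably reject such a header instead of skipping the gap.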
