
I am trying to read a very large CSV file (more than 1M rows) using a FlatFileItemReader, but when I launch my batch job I get an OutOfMemoryError after about 10 minutes.

Here is my code:

@Slf4j
@Configuration
@EnableBatchProcessing
@ComponentScan({
        "f.p.f.batch",
        "f.p.f.batch.tasklet"
})
public class BatchConfig {

    @Autowired
    private StepBuilderFactory steps;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private DemoTasklet demoTasklet;

    @Bean
    public ResourcelessTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) {
        // the constructor already sets the transaction manager
        MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
        try {
            return mapJobRepositoryFactoryBean.getObject();
        } catch (Exception ex) {
            log.error("Exception : {}", ex.getMessage(), ex);
            return null;
        }
    }

    @Bean
    //@StepScope
    public FlatFileItemReader<Balance> csvAnimeReader() {
        FlatFileItemReader<Balance> reader = new FlatFileItemReader<>();
        DefaultLineMapper<Balance> lineMapper = new DefaultLineMapper<>();
        FieldSetMapper<Balance> fieldSetMapper = new BalanceFieldSetMapper();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[]{
                "EXER", "IDENT", "NDEPT", "LBUDG", "INSEE", "SIREN", "CREGI",
                "NOMEN", "CTYPE", "CSTYP", "CACTI", "FINESS", "SECTEUR", "CBUDG",
                "CODBUD1", "COMPTE ", "BEDEB", "BECRE", "OBNETDEB", "OBNETCRE",
                "ONBDEB", "ONBCRE", "OOBDEB", "OOBCRE", "SD", "SC"
        });
        tokenizer.setDelimiter(";");

        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        reader.setLineMapper(lineMapper);
        reader.setResource(new ClassPathResource("Balance_Exemple_2016.csv"));
        reader.setLinesToSkip(1);
        return reader;
    }

    @Bean
    public ItemProcessor<Balance, Balance> csvFileProcessor() {
        return new BalanceProcessor();
    }

    @Bean
    public BalanceWriter balanceWriter() {
        return new BalanceWriter();
    }

    @Bean
    public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        return simpleJobLauncher;
    }

    @Bean
    public Step step1() {
        return steps.get("step1")
                .<Balance, Balance>chunk(1)
                .reader(csvAnimeReader())
                .writer(balanceWriter())
                .build();
    }

    @Bean
    public Step step2() {
        return steps.get("step2")
                .tasklet(demoTasklet)
                .build();
    }

    @Bean
    public Job readCsvJob() {
        return jobBuilderFactory.get("readCsvJob")
                .incrementer(new RunIdIncrementer())
                .flow(step1())
                .next(step2())
                .end()
                .build();
    }
}


2 Answers


I suggest you use streaming, since you never want to read the whole file into memory at once; that is the major problem here.

Here is a nice article on how to read a file more efficiently without holding it entirely in memory.
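For illustration, a minimal sketch of that streaming idea in plain Java (the file name, header skip, and semicolon delimiter are taken from the question; the charset and the Balance mapping are assumptions):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamingCsvRead {

    public static void main(String[] args) throws IOException {
        // Files.lines streams the file lazily, one line at a time,
        // so the 1M+ rows are never all in memory at once.
        try (Stream<String> lines = Files.lines(Paths.get("Balance_Exemple_2016.csv"), StandardCharsets.UTF_8)) {
            lines.skip(1)                        // skip the header, like setLinesToSkip(1)
                 .map(line -> line.split(";"))   // tokenize, like the DelimitedLineTokenizer
                 .forEach(StreamingCsvRead::process);
        }
    }

    private static void process(String[] fields) {
        // hypothetical hook: map the fields to a Balance and handle it here
    }
}

Note that FlatFileItemReader itself already streams line by line; the point is to make sure nothing downstream accumulates all the items.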


1 Comment

Hello, yes, that was the right solution; I used it to handle that big file ;) Thanks

I suggest you increase your JVM maximum heap size; by default it is quite low (far under 900 MB). To do so, add the VM argument -Xmx4g to get 4 GB as the maximum JVM heap.

You can find the documentation on the default -Xmx value here: https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html

If you run it on the command line: java -Xmx4g -jar myprog.jar
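To check what the default maximum heap actually is on your machine (assuming a HotSpot JVM; on Windows replace grep with findstr):

java -XX:+PrintFlagsFinal -version | grep MaxHeapSize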

If you are in Eclipse: Run -> Run Configurations -> select the configuration you use, then in the "Arguments" tab add -Xmx4g to the "VM arguments" text area.

8 Comments

Arnault, it is still stuck at the same point and gives the same exception. When I execute it on another computer with more RAM it starts reading, but processing the whole file takes almost an hour! I am wondering whether Spring Batch has its own way to handle such big files, using threads or splitting the file (a sketch of that follows these comments).
Can you provide the project with its input?
Its size is more than 900 MB; it will take some time to upload :(
If I can't reproduce your issue, I can't help you.
Found a way to upload it fast :) Here is the link on WeTransfer: wetransfer.com/downloads/…
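On the commenter's question about threads: Spring Batch can run a chunk step on multiple threads via a TaskExecutor. A minimal sketch reusing the bean names from the question (the chunk size of 1000 and throttle limit of 4 are assumptions, not values from the thread; note that FlatFileItemReader is not thread-safe, so a multi-threaded step needs a synchronized reader or partitioning):

import org.springframework.batch.core.Step;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step step1() {
    return steps.get("step1")
            .<Balance, Balance>chunk(1000)                 // a realistic commit interval, not 1
            .reader(csvAnimeReader())
            .writer(balanceWriter())
            .taskExecutor(new SimpleAsyncTaskExecutor())   // process chunks on parallel threads
            .throttleLimit(4)                              // cap the number of concurrent threads
            .build();
}

Raising the commit interval alone (chunk(1) means one transaction per row, i.e. over a million commits) is usually the first thing to try for throughput.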