
I am trying to read a very large CSV file (more than 1M rows) using a FlatFileItemReader, but when I launch my batch job I get an OutOfMemoryError after about 10 minutes.

Here is my code:

@Slf4j
@Configuration
@EnableBatchProcessing
@ComponentScan({
        "f.p.f.batch",
        "f.p.f.batch.tasklet"
})
public class BatchConfig {

    @Autowired
    private StepBuilderFactory steps;

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private DemoTasklet demoTasklet;

    @Bean
    public ResourcelessTransactionManager transactionManager() {
        return new ResourcelessTransactionManager();
    }

    @Bean
    public JobRepository jobRepository(ResourcelessTransactionManager transactionManager) {
        // the constructor already sets the transaction manager
        MapJobRepositoryFactoryBean mapJobRepositoryFactoryBean = new MapJobRepositoryFactoryBean(transactionManager);
        try {
            return mapJobRepositoryFactoryBean.getObject();
        } catch (Exception ex) {
            log.error("Exception : {}", ex.getMessage(), ex);
            return null;
        }
    }

    @Bean
    //@StepScope
    public FlatFileItemReader<Balance> csvAnimeReader() {
        FlatFileItemReader<Balance> reader = new FlatFileItemReader<>();
        DefaultLineMapper<Balance> lineMapper = new DefaultLineMapper<>();
        FieldSetMapper<Balance> fieldSetMapper = new BalanceFieldSetMapper();
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setNames(new String[]{
                "EXER", "IDENT", "NDEPT", "LBUDG", "INSEE", "SIREN", "CREGI",
                "NOMEN", "CTYPE", "CSTYP", "CACTI", "FINESS", "SECTEUR", "CBUDG",
                "CODBUD1", "COMPTE ", "BEDEB", "BECRE", "OBNETDEB", "OBNETCRE",
                "ONBDEB", "ONBCRE", "OOBDEB", "OOBCRE", "SD", "SC"
        });
        tokenizer.setDelimiter(";");

        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(fieldSetMapper);
        reader.setLineMapper(lineMapper);
        reader.setResource(new ClassPathResource("Balance_Exemple_2016.csv"));
        reader.setLinesToSkip(1);
        return reader;
    }

    @Bean
    public ItemProcessor<Balance, Balance> csvFileProcessor() {
        return new BalanceProcessor();
    }

    @Bean
    public BalanceWriter balanceWriter() {
        return new BalanceWriter();
    }

    @Bean
    public SimpleJobLauncher jobLauncher(JobRepository jobRepository) {
        SimpleJobLauncher simpleJobLauncher = new SimpleJobLauncher();
        simpleJobLauncher.setJobRepository(jobRepository);
        return simpleJobLauncher;
    }

    @Bean
    public Step step1() {
        return steps.get("step1")
                .<Balance, Balance>chunk(1)
                .reader(csvAnimeReader())
                .writer(balanceWriter())
                .build();
    }

    @Bean
    public Step step2() {
        return steps.get("step2")
                .tasklet(demoTasklet)
                .build();
    }

    @Bean
    public Job readCsvJob() {
        return jobBuilderFactory.get("readCsvJob")
                .incrementer(new RunIdIncrementer())
                .flow(step1())
                .next(step2())
                .end()
                .build();
    }
}


2 Answers


I suggest you use streaming, since you never want to read the whole file into memory at once; that is the major problem here.

Here is a nice article on how to read a file more efficiently without holding it entirely in memory.
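For illustration, a minimal sketch of that streaming idea in plain Java (the file name, header skip, and semicolon delimiter are taken from the question; the charset and the Balance mapping are assumptions):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class StreamingCsvRead {

    public static void main(String[] args) throws IOException {
        // Files.lines streams the file lazily, one line at a time,
        // so the 1M+ rows are never all in memory at once.
        try (Stream<String> lines = Files.lines(Paths.get("Balance_Exemple_2016.csv"), StandardCharsets.UTF_8)) {
            lines.skip(1)                        // skip the header, like setLinesToSkip(1)
                 .map(line -> line.split(";"))   // tokenize, like the DelimitedLineTokenizer
                 .forEach(StreamingCsvRead::process);
        }
    }

    private static void process(String[] fields) {
        // hypothetical hook: map the fields to a Balance and handle it here
    }
}

Note that FlatFileItemReader itself already streams line by line; the point is to make sure nothing downstream accumulates all the items.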


1 Comment

Hello, yes, that was the right solution; I used it to handle that big file ;) Thanks

I suggest you increase your JVM maximum heap size; by default it is quite low (far under 900 MB). To do so, add the VM argument -Xmx4g to get 4 GB as the maximum JVM heap.

You can find the documentation on the default -Xmx value here: https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/jrdocs/refman/optionX.html

If you run it on the command line: java -Xmx4g -jar myprog.jar
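To check what the default maximum heap actually is on your machine (assuming a HotSpot JVM; on Windows replace grep with findstr):

java -XX:+PrintFlagsFinal -version | grep MaxHeapSize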

If you are in Eclipse: Run -> Run Configurations -> select the configuration you use, then in the "Arguments" tab add -Xmx4g to the "VM arguments" text area.

8 Comments

Arnault, it is still stuck at the same point and gives the same exception. When I execute it on another computer with more RAM it starts reading, but processing the whole file takes almost an hour! I am wondering whether Spring Batch has its own way to handle such big files, using threads or splitting the file (a sketch of that follows these comments).
Can you provide the project with its input?
Its size is more than 900 MB; it will take some time to upload :(
If I can't reproduce your issue, I can't help you.
Found a way to upload it fast :) Here is the link on WeTransfer: wetransfer.com/downloads/…
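On the commenter's question about threads: Spring Batch can run a chunk step on multiple threads via a TaskExecutor. A minimal sketch reusing the bean names from the question (the chunk size of 1000 and throttle limit of 4 are assumptions, not values from the thread; note that FlatFileItemReader is not thread-safe, so a multi-threaded step needs a synchronized reader or partitioning):

import org.springframework.batch.core.Step;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step step1() {
    return steps.get("step1")
            .<Balance, Balance>chunk(1000)                 // a realistic commit interval, not 1
            .reader(csvAnimeReader())
            .writer(balanceWriter())
            .taskExecutor(new SimpleAsyncTaskExecutor())   // process chunks on parallel threads
            .throttleLimit(4)                              // cap the number of concurrent threads
            .build();
}

Raising the commit interval alone (chunk(1) means one transaction per row, i.e. over a million commits) is usually the first thing to try for throughput.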