1. Requirement
The system reads a CSV file from a fixed directory every day and prints its contents to the console.
2. Solutions
There are many ways to meet the above requirement; here we choose to implement it with Spring Batch.
3. Points to note
1. Obtaining the file path
The handling here is simple: read the date from the JobParameters, build a file path from it, and put the file path into the ExecutionContext. For simplicity, the file path is hard-coded in the program, but it is still stored in the ExecutionContext, and each specific Step fetches the path from the ExecutionContext.
Note: although the data stored in the ExecutionContext can be read from every Step, it is not recommended to put large objects into the ExecutionContext, because the data of this object has to be persisted to the database.
2. How each Step obtains the value from the ExecutionContext
- Add the @StepScope annotation to the class
- Obtain the value with @Value("#{jobExecutionContext['importPath']}")
e.g.:
@Bean
@StepScope
public FlatFileItemReader<Person> readCsvItemReader(@Value("#{jobExecutionContext['importPath']}") String importPath) {
// read the data
return new FlatFileItemReaderBuilder<Person>()
.name("read-csv-file")
.resource(new ClassPathResource(importPath))
.delimited().delimiter(",")
.names("username", "age", "sex")
.fieldSetMapper(new RecordFieldSetMapper<>(Person.class))
.build();
}
Explanation: when the program instantiates the FlatFileItemReader bean at startup, there is no jobExecutionContext yet, so an error would be reported. With @StepScope added there is no such problem: @StepScope means the bean is not instantiated until the Step is actually executed.
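For reference, a sample data/person.csv that matches the reader above could look like the lines below (placed under src/main/resources so that ClassPathResource can resolve it). The rows are made-up illustration data; note that the file should contain no header row, because the reader does not configure linesToSkip:
zhangsan,20,male
lisi,30,female
wangwu,18,male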
3. Note on the use of FlatFileItemReader
When we use a FlatFileItemReader to read the CSV file, the bean method needs to return the type FlatFileItemReader instead of directly returning ItemReader, otherwise the following error may occur: Reader must be open before it can be read. The reason is that FlatFileItemReader also implements ItemStream; if the bean is exposed only as ItemReader, the step does not register it as a stream and never calls open() before reading.
4. Implementation steps
1. Import dependencies and configure
1. Import dependencies
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
</dependencies>
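Note: the snippet above lists only the batch starter. Because the datasource configured below uses MySQL, the MySQL JDBC driver also needs to be on the classpath; a typical addition (assumed here, with the version managed by Spring Boot) is:
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<scope>runtime</scope>
</dependency>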
2. Initialize the Spring Batch database
spring.datasource.username=root
spring.datasource.password=root@1993
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/spring-batch?useUnicode=true&characterEncoding=utf8&autoReconnectForPools=true&useSSL=false
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
# do not run jobs automatically at application startup
spring.batch.job.enabled=false
spring.batch.jdbc.initialize-schema=always
# script used to initialize the spring-batch metadata tables
spring.batch.jdbc.schema=classpath:org/springframework/batch/core/schema-mysql.sql
2. Build the file read path
My idea here is to obtain the file path in a JobExecutionListener, put it into the ExecutionContext, and then read the value of the file path from the ExecutionContext in each Step.
/**
* In this listener, obtain the concrete path of the file that needs to be read and save it into the ExecutionContext.
*
* @author huan.fu
* @date 2022/8/30 - 22:22
*/
@Slf4j
public class AssemblyReadCsvPathListener implements JobExecutionListener {
@Override
public void beforeJob(JobExecution jobExecution) {
ExecutionContext executionContext = jobExecution.getExecutionContext();
JobParameters jobParameters = jobExecution.getJobParameters();
String importDate = jobParameters.getString("importDate");
log.info("从 job parameter 中获取的 importDate 参数的值为:[{}]", importDate);
String readCsvPath = "data/person.csv";
log.info("根据日期组装需要读取的csv路径为:[{}],此处排除日期,直接写一个死的路径", readCsvPath);
executionContext.putString("importPath", readCsvPath);
}
@Override
public void afterJob(JobExecution jobExecution) {
}
}
3. Build a Tasklet that prints the file path
@Slf4j
@Component
@StepScope
public class PrintImportFilePathTaskLet implements Tasklet {
@Value("#{jobExecutionContext['importPath']}")
private String importFilePath;
@Value("#{jobParameters['importDate']}")
private String importDate;
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
log.info("从job parameter 中获取到的 importDate:[{}],从 jobExecutionContext 中获取的 importPath:[{}]",
importDate, importFilePath);
return RepeatStatus.FINISHED;
}
}
Note that the @StepScope annotation is added to this class.
4. Write entity classes
@AllArgsConstructor
@Getter
@ToString
public class Person {
/**
* username
*/
private String username;
/**
* age
*/
private Integer age;
/**
* sex
*/
private String sex;
}
5. Write Job configuration
@Configuration
@AllArgsConstructor
@Slf4j
public class ImportPersonJobConfig {
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
private final PrintImportFilePathTaskLet printImportFilePathTaskLet;
private final ItemReader<Person> readCsvItemReader;
@Bean
public Job importPersonJob() {
// get a job builder; the job name does not need to exist beforehand
return jobBuilderFactory.get("import-person-job")
// add the job execution listener
.listener(new AssemblyReadCsvPathListener())
// print the values from the job parameters and the ExecutionContext
.start(printParametersAndContextVariables())
// read the csv data and process it
.next(handleCsvFileStep())
.build();
}
/**
* Read the data.
* Note: this method must return the FlatFileItemReader type rather than ItemReader,
* otherwise the following exception may be thrown: Reader must be open before it can be read
*
* @param importPath file path
* @return reader
*/
@Bean
@StepScope
public FlatFileItemReader<Person> readCsvItemReader(@Value("#{jobExecutionContext['importPath']}") String importPath) {
// read the data
return new FlatFileItemReaderBuilder<Person>()
.name("read-csv-file")
.resource(new ClassPathResource(importPath))
.delimited().delimiter(",")
.names("username", "age", "sex")
.fieldSetMapper(new RecordFieldSetMapper<>(Person.class))
.build();
}
@Bean
public Step handleCsvFileStep() {
// each item that is read is handed to this processor
ItemProcessor<Person, Person> processor = item -> {
if (item.getAge() > 25) {
log.info("user [{}] has age [{}] > 25, skip it", item.getUsername(), item.getAge());
return null;
}
return item;
};
// once a chunk's worth of items has been read, the writer is invoked
ItemWriter<Person> itemWriter = items -> {
log.info("start writing data");
for (Person item : items) {
log.info("{}", item);
}
};
return stepBuilderFactory.get("handle-csv-file")
// write is executed after every 2 items are read; process is executed after each single read
.<Person, Person>chunk(2)
// read the data
.reader(readCsvItemReader)
// process each item as soon as it is read
.processor(processor)
// when the number of items read reaches the chunk size, the writer is called
.writer(itemWriter)
.build();
}
/**
* Print the values from the job parameters and the ExecutionContext.
* <p>
* Tasklet is a very simple interface with only one method, execute.
* The TaskletStep calls this method repeatedly until it returns RepeatStatus.FINISHED or throws an exception.
* Every Tasklet call is wrapped in a transaction.
*
* @return Step
*/
private Step printParametersAndContextVariables() {
return stepBuilderFactory.get("print-context-params")
.tasklet(printImportFilePathTaskLet)
// when the job is restarted, once this step has already been started 3 times it will not execute again
.startLimit(3)
// when the job is restarted and this step is already in the COMPLETED state, false below means the step is not re-run
.allowStartIfComplete(false)
// add the step listener
.listener(new CustomStepExecutionListener())
.build();
}
}
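The CustomStepExecutionListener referenced in the step above is not shown in this article; the following is only an assumed minimal sketch that logs the step name and exit status:
@Slf4j
public class CustomStepExecutionListener implements StepExecutionListener {
@Override
public void beforeStep(StepExecution stepExecution) {
// log the step name before the step runs (hypothetical listener, not from the original article)
log.info("step [{}] starting", stepExecution.getStepName());
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
// log the exit status after the step finishes; returning null keeps the step's own exit status
log.info("step [{}] finished with status [{}]", stepExecution.getStepName(), stepExecution.getExitStatus());
return null;
}
}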
6. Write the Job startup class
@Component
@Slf4j
public class StartImportPersonJob {
@Autowired
private Job importPersonJob;
@Autowired
private JobLauncher jobLauncher;
@PostConstruct
public void startJob() throws JobInstanceAlreadyCompleteException, JobExecutionAlreadyRunningException, JobParametersInvalidException, JobRestartException {
JobParameters jobParameters = new JobParametersBuilder()
.addString("importDate", LocalDate.of(2022, 08, 31).format(DateTimeFormatter.ofPattern("yyyyMMdd")))
.toJobParameters();
JobExecution execution = jobLauncher.run(importPersonJob, jobParameters);
log.info("job invoked");
}
}
7. Automatically configure Spring Batch
@SpringBootApplication
@EnableBatchProcessing
public class SpringBatchReadCsvApplication {
public static void main(String[] args) {
SpringApplication.run(SpringBatchReadCsvApplication.class, args);
}
}
The key point here is the @EnableBatchProcessing annotation.
5. Execution results
6. Complete code
https://gitee.com/huan1993/spring-cloud-parent/tree/master/spring-batch/spring-batch-read-csv