1. Requirement
The system reads a CSV file from a fixed directory every day and prints its contents to the console.
2. Solutions
There are many ways to meet the above requirement; here we choose to implement it with Spring Batch.
3. Points to note
1. Obtaining the file path
The handling here is simple: read the date from the JobParameters, build a file path from it, and put the file path into the ExecutionContext. For simplicity, the file path is hard-coded in the program, but it is still stored in the ExecutionContext, and each specific Step fetches the path from the ExecutionContext.
Note: although the data stored in the ExecutionContext can be read from every Step, it is not recommended to put large objects into the ExecutionContext, because the data of this object has to be persisted to the database.
2. How each Step obtains the value from the ExecutionContext
- Add the @StepScope annotation to the class
- Obtain the value with @Value("#{jobExecutionContext['importPath']}")
e.g.:
@Bean
@StepScope
public FlatFileItemReader<Person> readCsvItemReader(@Value("#{jobExecutionContext['importPath']}") String importPath) {
// read the data
return new FlatFileItemReaderBuilder<Person>()
.name("read-csv-file")
.resource(new ClassPathResource(importPath))
.delimited().delimiter(",")
.names("username", "age", "sex")
.fieldSetMapper(new RecordFieldSetMapper<>(Person.class))
.build();
}
Explanation: when the program instantiates the FlatFileItemReader bean at startup, there is no jobExecutionContext yet, so an error would be reported. With @StepScope added there is no such problem: @StepScope means the bean is not instantiated until the Step is actually executed.
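For reference, a sample data/person.csv that matches the reader above could look like the lines below (placed under src/main/resources so that ClassPathResource can resolve it). The rows are made-up illustration data; note that the file should contain no header row, because the reader does not configure linesToSkip:
zhangsan,20,male
lisi,30,female
wangwu,18,male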
3. Note on the use of FlatFileItemReader
When we use a FlatFileItemReader to read the CSV file, the bean method needs to return the type FlatFileItemReader instead of directly returning ItemReader, otherwise the following error may occur: Reader must be open before it can be read. The reason is that FlatFileItemReader also implements ItemStream; if the bean is exposed only as ItemReader, the step does not register it as a stream and never calls open() before reading.
4. Implementation steps
1. Import dependencies and configure
1. Import dependencies
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
</dependencies>
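Note: the snippet above lists only the batch starter. Because the datasource configured below uses MySQL, the MySQL JDBC driver also needs to be on the classpath; a typical addition (assumed here, with the version managed by Spring Boot) is:
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<scope>runtime</scope>
</dependency>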
2. Initialize the Spring Batch database
spring.datasource.username=root
spring.datasource.password=root@1993
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/spring-batch?useUnicode=true&characterEncoding=utf8&autoReconnectForPools=true&useSSL=false
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
# do not run jobs automatically at application startup
spring.batch.job.enabled=false
spring.batch.jdbc.initialize-schema=always
# script used to initialize the spring-batch metadata tables
spring.batch.jdbc.schema=classpath:org/springframework/batch/core/schema-mysql.sql
2. Build the file read path
My idea here is to obtain the file path in a JobExecutionListener, put it into the ExecutionContext, and then read the value of the file path from the ExecutionContext in each Step.
/**
* In this listener, obtain the concrete path of the file that needs to be read and save it into the ExecutionContext.
*
* @author huan.fu
* @date 2022/8/30 - 22:22
*/
@Slf4j
public class AssemblyReadCsvPathListener implements JobExecutionListener {
@Override
public void beforeJob(JobExecution jobExecution) {
ExecutionContext executionContext = jobExecution.getExecutionContext();
JobParameters jobParameters = jobExecution.getJobParameters();
String importDate = jobParameters.getString("importDate");
log.info("从 job parameter 中获取的 importDate 参数的值为:[{}]", importDate);
String readCsvPath = "data/person.csv";
log.info("根据日期组装需要读取的csv路径为:[{}],此处排除日期,直接写一个死的路径", readCsvPath);
executionContext.putString("importPath", readCsvPath);
}
@Override
public void afterJob(JobExecution jobExecution) {
}
}
3. Build a Tasklet that prints the file path
@Slf4j
@Component
@StepScope
public class PrintImportFilePathTaskLet implements Tasklet {
@Value("#{jobExecutionContext['importPath']}")
private String importFilePath;
@Value("#{jobParameters['importDate']}")
private String importDate;
@Override
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
log.info("从job parameter 中获取到的 importDate:[{}],从 jobExecutionContext 中获取的 importPath:[{}]",
importDate, importFilePath);
return RepeatStatus.FINISHED;
}
}
Note that the @StepScope annotation is added to this class.
4. Write entity classes
@AllArgsConstructor
@Getter
@ToString
public class Person {
/**
* username
*/
private String username;
/**
* age
*/
private Integer age;
/**
* sex
*/
private String sex;
}
5. Write Job configuration
@Configuration
@AllArgsConstructor
@Slf4j
public class ImportPersonJobConfig {
private final JobBuilderFactory jobBuilderFactory;
private final StepBuilderFactory stepBuilderFactory;
private final PrintImportFilePathTaskLet printImportFilePathTaskLet;
private final ItemReader<Person> readCsvItemReader;
@Bean
public Job importPersonJob() {
// get a job builder; the job name does not need to exist beforehand
return jobBuilderFactory.get("import-person-job")
// add the job execution listener
.listener(new AssemblyReadCsvPathListener())
// print the values from the job parameters and the ExecutionContext
.start(printParametersAndContextVariables())
// read the csv data and process it
.next(handleCsvFileStep())
.build();
}
/**
* Read the data.
* Note: this method must return the FlatFileItemReader type rather than ItemReader,
* otherwise the following exception may be thrown: Reader must be open before it can be read
*
* @param importPath file path
* @return reader
*/
@Bean
@StepScope
public FlatFileItemReader<Person> readCsvItemReader(@Value("#{jobExecutionContext['importPath']}") String importPath) {
// read the data
return new FlatFileItemReaderBuilder<Person>()
.name("read-csv-file")
.resource(new ClassPathResource(importPath))
.delimited().delimiter(",")
.names("username", "age", "sex")
.fieldSetMapper(new RecordFieldSetMapper<>(Person.class))
.build();
}
@Bean
public Step handleCsvFileStep() {
// each item that is read is handed to this processor
ItemProcessor<Person, Person> processor = item -> {
if (item.getAge() > 25) {
log.info("user [{}] has age [{}] > 25, skip it", item.getUsername(), item.getAge());
return null;
}
return item;
};
// once a chunk's worth of items has been read, the writer is invoked
ItemWriter<Person> itemWriter = items -> {
log.info("start writing data");
for (Person item : items) {
log.info("{}", item);
}
};
return stepBuilderFactory.get("handle-csv-file")
// write is executed after every 2 items are read; process is executed after each single read
.<Person, Person>chunk(2)
// read the data
.reader(readCsvItemReader)
// process each item as soon as it is read
.processor(processor)
// when the number of items read reaches the chunk size, the writer is called
.writer(itemWriter)
.build();
}
/**
* Print the values from the job parameters and the ExecutionContext.
* <p>
* Tasklet is a very simple interface with only one method, execute.
* The TaskletStep calls this method repeatedly until it returns RepeatStatus.FINISHED or throws an exception.
* Every Tasklet call is wrapped in a transaction.
*
* @return Step
*/
private Step printParametersAndContextVariables() {
return stepBuilderFactory.get("print-context-params")
.tasklet(printImportFilePathTaskLet)
// when the job is restarted, once this step has already been started 3 times it will not execute again
.startLimit(3)
// when the job is restarted and this step is already in the COMPLETED state, false below means the step is not re-run
.allowStartIfComplete(false)
// add the step listener
.listener(new CustomStepExecutionListener())
.build();
}
}
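The CustomStepExecutionListener referenced in the step above is not shown in this article; the following is only an assumed minimal sketch that logs the step name and exit status:
@Slf4j
public class CustomStepExecutionListener implements StepExecutionListener {
@Override
public void beforeStep(StepExecution stepExecution) {
// log the step name before the step runs (hypothetical listener, not from the original article)
log.info("step [{}] starting", stepExecution.getStepName());
}
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
// log the exit status after the step finishes; returning null keeps the step's own exit status
log.info("step [{}] finished with status [{}]", stepExecution.getStepName(), stepExecution.getExitStatus());
return null;
}
}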
6. Write the Job startup class
@Component
@Slf4j
public class StartImportPersonJob {
@Autowired
private Job importPersonJob;
@Autowired
private JobLauncher jobLauncher;
@PostConstruct
public void startJob() throws JobInstanceAlreadyCompleteException, JobExecutionAlreadyRunningException, JobParametersInvalidException, JobRestartException {
JobParameters jobParameters = new JobParametersBuilder()
.addString("importDate", LocalDate.of(2022, 08, 31).format(DateTimeFormatter.ofPattern("yyyyMMdd")))
.toJobParameters();
JobExecution execution = jobLauncher.run(importPersonJob, jobParameters);
log.info("job invoked");
}
}
7. Automatically configure Spring Batch
@SpringBootApplication
@EnableBatchProcessing
public class SpringBatchReadCsvApplication {
public static void main(String[] args) {
SpringApplication.run(SpringBatchReadCsvApplication.class, args);
}
}
The key point here is the @EnableBatchProcessing annotation.
5. Execution results
6. Complete code
https://gitee.com/huan1993/spring-cloud-parent/tree/master/spring-batch/spring-batch-read-csv