Architecture: data flow architecture

Introduction

Some systems mainly process and convert input data, and the individual processing and conversion steps are independent of one another. In such systems, the input data is transformed step by step into the specified output.

This kind of data processing task comes up often in daily work, and the data flow architecture is a good fit for it.

Data flow architecture

Many kinds of streams show up in real work; the most common are I/O streams, I/O buffers, and pipes. Different components or modules are connected through these streams. The data flow can form a graph with cycles, a linear structure without cycles, or a tree.

The main purpose of the data flow architecture is reuse and ease of modification. It suits a series of well-defined, independent data transformations or computations applied to sequentially defined inputs and outputs, as in compilers and business data processing applications. Generally speaking, there are three basic data flow structures: sequential batch, pipes and filters, and process control.

Sequential batch

Sequential batch processing is the most common and most basic data flow architecture. The data passes through one processing unit as a whole, and enters the next unit only after the previous unit has finished.

Sequential batch processing flows as follows. Data is handed over as a whole from one processor to the next, and the processors interact mainly through temporary files. The output of each processor becomes the input of the next, and after several rounds of processing the desired result is obtained.
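As a minimal sketch of this hand-over through temporary files, the example below chains three stages in Python. The stage names (`extract`, `transform`, `load`), the file names, and the helper `run_batch` are all hypothetical and chosen only for illustration. Each stage reads a whole file and writes a whole file, and no stage starts before the previous one has finished.

```python
import tempfile
from pathlib import Path


def extract(src: Path, dst: Path) -> None:
    """Stage 1: drop blank lines and surrounding whitespace."""
    lines = [line.strip() for line in src.read_text().splitlines() if line.strip()]
    dst.write_text("\n".join(lines))


def transform(src: Path, dst: Path) -> None:
    """Stage 2: upper-case every record."""
    dst.write_text(src.read_text().upper())


def load(src: Path) -> list[str]:
    """Stage 3: read the final file into memory."""
    return src.read_text().splitlines()


def run_batch(raw_text: str) -> list[str]:
    """Run the stages strictly one after another, handing over whole files."""
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        raw = work / "raw.txt"
        cleaned = work / "cleaned.txt"
        transformed = work / "transformed.txt"

        raw.write_text(raw_text)
        extract(raw, cleaned)            # must finish before transform starts
        transform(cleaned, transformed)  # must finish before load starts
        return load(transformed)


print(run_batch("alice\n\n bob \n"))
```

Because every stage only depends on its input and output files, any stage could be replaced by a different program without touching the others.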

The advantage of sequential batch processing is that each processing step is independent, and the steps can be combined into an overall sequential processing architecture.

The disadvantage is that the steps cannot run in parallel: they execute serially, so throughput is limited. The processors interact only through intermediate files, so the degree of interaction between them is low.

Pipes and filters

The processors in sequential batch processing usually differ greatly in function and are generally separate systems. To process a data flow task within a single system, pipes and filters are the better fit.

Java 8 introduced streams and stream pipelines. A collection can be converted into a stream, the whole stream can be transformed through stream operations, and the desired result is obtained at the end.

This approach emphasizes the incremental transformation of data by successive components. The flow is data-driven, and the whole system can be decomposed into data sources, filters, pipes, and data sinks.

Modules are connected by data streams. A stream is a first-in, first-out buffer and can be a byte stream, a character stream, or a stream of any other type. The main advantage of this architecture is its concurrent and incremental execution.
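To make the first-in, first-out connection concrete, here is a rough sketch in Python that uses `queue.Queue` as the pipe and one thread per component. The names `source`, `square_filter`, `sink`, and `run_pipeline`, as well as the end-of-stream sentinel, are assumptions made for this example and are not part of any particular framework.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream


def source(items, out_pipe):
    """Data source: push every item into the pipe, then signal the end."""
    for item in items:
        out_pipe.put(item)
    out_pipe.put(_DONE)


def square_filter(in_pipe, out_pipe):
    """Filter: transform items one at a time as they arrive."""
    while True:
        item = in_pipe.get()
        if item is _DONE:
            out_pipe.put(_DONE)  # forward the end-of-stream marker
            break
        out_pipe.put(item * item)


def sink(in_pipe, results):
    """Data sink: collect the transformed items in arrival order."""
    while True:
        item = in_pipe.get()
        if item is _DONE:
            break
        results.append(item)


def run_pipeline(items):
    # Small bounded queues act as FIFO buffers between the components.
    pipe_a = queue.Queue(maxsize=2)
    pipe_b = queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=source, args=(items, pipe_a)),
        threading.Thread(target=square_filter, args=(pipe_a, pipe_b)),
        threading.Thread(target=sink, args=(pipe_b, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_pipeline([1, 2, 3, 4]))
```

All three components run at the same time, and the bounded queues keep a fast producer from running far ahead of a slow consumer.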

The most important component in this mode is the filter, an independent data stream transformer. It reads the input stream, transforms the data, and writes the result to the pipe for the next filter to process. It works incrementally: it starts working as soon as data arrives through the connected pipe.
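One way to picture incremental filters is with Python generators, where each filter pulls one item, transforms it, and yields it before touching the next one. This is only an illustrative sketch; the filter names (`read_source`, `strip_blank`, `to_upper`) and the helper `build_pipeline` are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def read_source(lines: Iterable[str]) -> Iterator[str]:
    """Data source: emit lines one at a time."""
    for line in lines:
        yield line


def strip_blank(stream: Iterable[str]) -> Iterator[str]:
    """Filter: trim whitespace and drop empty lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield line


def to_upper(stream: Iterable[str]) -> Iterator[str]:
    """Filter: upper-case each line."""
    for line in stream:
        yield line.upper()


def build_pipeline(lines: Iterable[str]) -> Iterator[str]:
    """Connect source and filters; nothing runs until a result is requested."""
    return to_upper(strip_blank(read_source(lines)))


# A finite input is processed item by item.
print(list(build_pipeline([" a ", "", "b"])))

# The source may even be endless: the first result is available
# without waiting for the rest of the input.
endless = (f"line {i}" for i in itertools.count())
print(next(build_pipeline(endless)))
```

Unlike the batch style, no filter waits for the complete input, which is why the pipeline also works on the endless source.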

Data enters the pipeline, passes through the filters one by one, and comes out as the processed result.

There are two types of filters: active and passive. An active filter pulls data from the pipe and pushes the processed data out; this is the mode used by UNIX pipes. A passive filter only receives the data that the pipe pushes to it.

The advantage of this model is high concurrency and high throughput. The disadvantage is that it is not suitable for dynamic interaction.
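The difference between the two filter types can be sketched as follows. This is a simplified illustration, and `active_filter` and `PassiveFilter` are hypothetical names: the active filter owns the loop that pulls input and pushes output, while the passive filter has no loop and only reacts when something pushes an item into it.

```python
from typing import Callable, Iterable


def active_filter(read: Iterable[int], write: Callable[[int], None]) -> None:
    """Active: owns the loop, pulls each item in and pushes the result out."""
    for item in read:
        write(item * 2)


class PassiveFilter:
    """Passive: has no loop of its own; it reacts when an item is pushed to it."""

    def __init__(self, downstream: Callable[[int], None]) -> None:
        self.downstream = downstream

    def push(self, item: int) -> None:
        self.downstream(item * 2)


# Active style: the filter drives the data flow itself.
active_out: list[int] = []
active_filter([1, 2, 3], active_out.append)

# Passive style: the caller (standing in for the pipe) drives the data flow.
passive_out: list[int] = []
passive = PassiveFilter(passive_out.append)
for value in [1, 2, 3]:
    passive.push(value)

print(active_out, passive_out)
```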

Process control

There is a third mode that is neither batch processing nor a pipeline. It selects different execution flows according to the content of the input, much like a conditional statement in a program.
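Following the description above, a very small sketch of content-based routing might look like this in Python. The handlers and the rules used to tell the input formats apart are invented for the example and are deliberately naive.

```python
import json


def handle_json(payload: str) -> dict:
    """Flow for JSON objects."""
    return json.loads(payload)


def handle_csv(payload: str) -> list[str]:
    """Flow for comma-separated records."""
    return payload.split(",")


def handle_text(payload: str) -> str:
    """Fallback flow for plain text."""
    return payload.strip()


def process(payload: str):
    """Choose the execution flow by inspecting the input content."""
    if payload.lstrip().startswith("{"):
        return handle_json(payload)
    if "," in payload:
        return handle_csv(payload)
    return handle_text(payload)


print(process('{"a": 1}'))
print(process("a,b"))
print(process("  hello  "))
```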

Summary

This article has introduced several data flow architecture styles. I hope you find them useful.

Author of this article: those things about flydean program
Link to this article: http://www.flydean.com/07-data-flow-architecture/
Source of this article: flydean's blog
