
This article is compiled from a talk given by Teng Yu, founder of the Pravega China community and Director of Software Engineering Technology at Dell Technologies, at the main venue of Flink Forward Asia 2021. The main contents include:

  1. The current state of storage abstractions
  2. Pravega performance architecture in detail
  3. Pravega stream storage demonstration
  4. Looking to the future

FFA 2021 Live Replay & Presentation PDF Download

1. The current state of storage abstractions


In computer software design there is a well-known observation: almost any problem can be solved by adding another layer of abstraction, and the same holds for storage. The figure above lists the three main storage abstractions in common use: block storage, file storage, and object storage. Block storage was born essentially alongside the modern computer industry; file storage emerged later and remains today's mainstream storage abstraction; object storage arrived later still, in the 1990s. With the rise of real-time computing and the cloud era, streaming data is used ever more widely, and we believe an emerging storage abstraction is needed for this kind of data, which demands low latency and high-concurrency traffic. Adding a byte-stream interface on top of file storage may be the most intuitive idea, but it runs into a series of practical challenges and is complicated to implement, as the following figure shows:


The size of each write has a decisive impact on throughput. To balance latency against throughput, batching must be introduced in the implementation, and space must be pre-allocated to avoid the performance cost of on-the-fly block allocation. Finally, as a storage system, durability must also be guaranteed.
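The latency/throughput trade-off described above is usually handled with a batcher triggered by size or time. The following is a minimal sketch of the idea, not Pravega's actual implementation: small writes are buffered and flushed as one larger write when either a size threshold or a time limit is reached, so latency stays bounded while throughput improves.

```python
import time

class WriteBatcher:
    """Minimal sketch of a size/time-triggered write batcher."""

    def __init__(self, flush_fn, max_bytes=64 * 1024, max_delay_s=0.010):
        self.flush_fn = flush_fn          # called with the combined payload
        self.max_bytes = max_bytes        # size trigger
        self.max_delay_s = max_delay_s    # time trigger (bounds added latency)
        self.buffer = []
        self.buffered_bytes = 0
        self.oldest = None                # timestamp of oldest buffered write

    def write(self, payload: bytes):
        if self.oldest is None:
            self.oldest = time.monotonic()
        self.buffer.append(payload)
        self.buffered_bytes += len(payload)
        if (self.buffered_bytes >= self.max_bytes
                or time.monotonic() - self.oldest >= self.max_delay_s):
            self.flush()

    def flush(self):
        if self.buffer:
            # One large write instead of many tiny ones.
            self.flush_fn(b"".join(self.buffer))
            self.buffer.clear()
            self.buffered_bytes = 0
            self.oldest = None

flushed = []
# Large max_delay_s so only the size trigger fires in this demo.
b = WriteBatcher(flushed.append, max_bytes=10, max_delay_s=60)
for _ in range(5):
    b.write(b"abc")   # 3 bytes each; size trigger fires once 10 bytes accumulate
b.flush()             # drain the remainder
```

With `max_bytes=10`, the fourth 3-byte write trips the size trigger (12 bytes buffered), and the final `flush()` drains the last event, so five tiny writes become two larger ones.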


A local file system is inherently unreliable: in the real world, hardware nodes and disk media can fail at any moment. Solving this requires a distributed storage system, such as a common multi-replica design, but replication costs both time and space, and guaranteeing data safety through a distributed consensus protocol becomes a new problem of its own.


In a shared-nothing architecture, fully parallel nodes share no resources. In the figure above, for example, three replicas are stored on three independent physical nodes, so the size of a single stream is capped by the storage capacity of a single physical node. This is therefore still not a complete solution.

Because of this series of problems, building stream storage on top of a file system is quite complicated.


So, inspired by the observation at the beginning, we try to solve the problem from a different angle, using a layered architecture on top of distributed file and object storage. The scalable storage underneath can serve as a data lake: a stream's historical data is stored there alongside other non-streaming data sets, which saves data migration and storage operations costs, eliminates data silos, and makes computation over both kinds of data more convenient.


A layered architecture can also serve many decentralized clients. To solve the problem of fault-tolerant recovery, we add distributed log storage to this stream storage design.


In real deployments, the traffic of real-time applications can fluctuate over time. To cope with peaks and valleys in the data, we introduce dynamic scaling.

In the chart in the upper-left corner, the horizontal axis represents time and the vertical axis represents the application's data volume. The waveform is stable at first, then the data volume surges at a certain point. This is characteristic of time-sensitive applications, such as morning and evening rush hours or Double Eleven, when traffic suddenly floods in.

A complete stream storage system should sense changes in real-time traffic and allocate resources accordingly: when the data volume grows, the lower layer dynamically allocates more storage units to process it. In the cloud-native era, dynamic scaling is an essential capability of a general-purpose storage system.
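As an illustration of the idea (a toy policy, not Pravega's actual scaling algorithm), a scaling controller might compare each segment's observed event rate against a per-segment target and decide which hot segments to split and which cold ones to merge:

```python
def scaling_decisions(segment_rates, target_rate):
    """Toy auto-scaling policy.

    segment_rates: dict of segment id -> observed events/sec
    target_rate:   desired events/sec per segment
    Returns (segments to split, segments that are merge candidates).
    """
    split, merge = [], []
    for seg, rate in segment_rates.items():
        if rate > 2 * target_rate:       # hot: split into more segments
            split.append(seg)
        elif rate < 0.5 * target_rate:   # cold: candidate for merging
            merge.append(seg)
    return split, merge

# A traffic spike pushes segment "s1" far above the per-segment target,
# while "s2" has gone almost idle.
split, merge = scaling_decisions({"s0": 90, "s1": 450, "s2": 20}, target_rate=100)
```

Here `s1` (450 events/sec against a target of 100) would be split to spread the load, while `s2` (20 events/sec) could be merged with a neighbor to free resources.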


2. Pravega performance architecture in detail

Pravega is designed as a real-time storage solution for streams.

  • First, applications persist data durably in Pravega. With its layered storage architecture, a Pravega Stream provides unbounded stream storage;
  • Second, Pravega supports end-to-end exactly-once semantics, including read-side checkpoints and transactional writes, so computation can be split into multiple reliable, independent applications, enabling a microservice architecture for streaming systems;
  • Third, Pravega's dynamic scaling mechanism monitors traffic and allocates resources automatically at the lower layer, so developers and operators do not need to scale the cluster by hand.
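The transactional-write half of the exactly-once story can be illustrated with a small in-memory imitation. This is a conceptual sketch, not the Pravega client API: events written to a transaction become visible only if the transaction commits, so a failed attempt can be retried without producing duplicates.

```python
class Stream:
    """In-memory stand-in for a durable stream."""
    def __init__(self):
        self.events = []

class Transaction:
    """Events staged in a transaction become visible only on commit."""
    def __init__(self, stream):
        self.stream = stream
        self.pending = []
        self.open = True

    def write_event(self, event):
        assert self.open, "transaction already closed"
        self.pending.append(event)

    def commit(self):
        # All-or-nothing: the whole batch becomes readable atomically.
        self.stream.events.extend(self.pending)
        self.open = False

    def abort(self):
        # On failure the pending events are discarded, so a retry
        # cannot introduce duplicates -- the basis of exactly-once.
        self.pending.clear()
        self.open = False

stream = Stream()
txn = Transaction(stream)
txn.write_event("order-1")
txn.write_event("order-2")
txn.abort()                    # simulated failure: nothing becomes visible

retry = Transaction(stream)
retry.write_event("order-1")
retry.write_event("order-2")
retry.commit()                 # exactly one copy of each event lands
```

Paired with read-side checkpoints that record the reader's position, this write pattern is what lets a pipeline replay work after a failure without double-counting events.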

Pravega's independent scaling mechanism decouples producers from consumers, making the data processing pipeline elastic and able to respond promptly to real-time changes in the data.


As shown in the figure, Pravega views streaming data from a storage perspective, whereas Kafka, positioned as a messaging system, views it from a messaging perspective. A messaging system and a storage system have different goals: a messaging system focuses on the transmission of data and the process of production and consumption. Pravega is positioned as an enterprise-grade distributed stream storage product: it not only satisfies the properties of streams but also supports the persistence, security, reliability, and isolation properties of data storage.

Different perspectives lead to different architectures. Because a messaging system does not handle long-term data storage, it needs additional tasks and components to move data elsewhere for persistence. Pravega, positioned as stream storage, solves data retention directly in its primary storage. The core engine of Pravega's data storage is called the segment store; it connects directly to distributed storage such as HDFS or S3 for long-term, persistent storage. By relying on the maturity and scalability of these underlying storage components, Pravega naturally gains large-scale capacity and scalability on the storage side.
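The tiering described above can be sketched as a small conceptual model (not the actual segment store code): recent appends land in a fast tier, a background job moves older data down to long-term storage such as S3 or HDFS, and a single read path serves both tail reads and historical catch-up reads.

```python
class TieredSegment:
    """Conceptual model of a segment tiered across a fast tier
    and long-term storage (e.g. S3/HDFS)."""

    def __init__(self):
        self.fast = []       # fast tier: recent, low-latency appends
        self.long_term = []  # slow tier: cheap, effectively unbounded

    def append(self, data):
        self.fast.append(data)

    def tier_down(self):
        # Background job: move everything from the fast tier into
        # long-term storage, freeing the fast tier for new writes.
        self.long_term.extend(self.fast)
        self.fast.clear()

    def read(self, offset):
        # Historical reads fall through to the long tier transparently,
        # so one API serves both tail and catch-up readers.
        n = len(self.long_term)
        return self.long_term[offset] if offset < n else self.fast[offset - n]

seg = TieredSegment()
for i in range(3):
    seg.append(f"event-{i}")
seg.tier_down()            # events 0..2 now live in long-term storage
seg.append("event-3")      # a fresh tail event in the fast tier
```

Because the historical and tail data share one logical address space, a reader can replay a stream from the beginning without knowing which tier currently holds each offset.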


Another architectural difference comes from small-write optimizations on the client side. In a typical IoT scenario, the number of writers at the edge is usually large while each individual write is small. Pravega's design takes this scenario fully into account: batching is applied twice, once in the client and once in the segment store, merging the small writes from many clients into larger, disk-friendly batched writes. This greatly improves performance under highly concurrent writes while keeping latency low.
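The effect of the two batching stages can be sketched as follows (a simplification, not Pravega's implementation): each client first coalesces its own small events into batches, and the segment store then merges the batches arriving from many clients into one combined append per segment.

```python
def client_batch(events, max_batch=4):
    """Stage 1: each client groups its own small events into batches."""
    return [events[i:i + max_batch] for i in range(0, len(events), max_batch)]

def segment_store_append(batches_by_client, segment_of):
    """Stage 2: the segment store merges batches from many clients
    into one large append per segment instead of many tiny writes."""
    appends = {}
    for client, batches in batches_by_client.items():
        for batch in batches:
            for event in batch:
                appends.setdefault(segment_of(client, event), []).append(event)
    return appends

# Three edge writers, each producing six tiny events.
clients = {
    f"sensor-{i}": client_batch([f"s{i}-e{j}" for j in range(6)])
    for i in range(3)
}
# Hypothetical routing: map each client to one of two segments.
appends = segment_store_append(clients, segment_of=lambda c, e: int(c[-1]) % 2)
```

Eighteen tiny events from three writers collapse into just two disk appends, which is exactly the kind of write pattern rotating and flash media handle well.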

This design has a measurable impact on performance. Pravega compared its performance with Kafka and Pulsar under the same highly concurrent load; the results are shown in the figure below. The tests used a highly parallel workload with up to 100 writers and 5,000 segments per Stream/Topic. Pravega sustained a rate of 250 MB/s under all conditions, which was also the maximum disk write rate in the test cloud environment. As the left and right graphs show, the performance of Kafka and Pulsar drops significantly as the number of partitions and producers increases, degrading further under large-scale concurrency.


The experimental process and environment are fully disclosed in this blog post. The benchmark code is completely open source and has been contributed to openmessaging-benchmark; interested readers can try to reproduce the results.

3. Pravega stream storage demonstration

From the storage perspective, we have introduced Pravega's approach to stream storage and its architectural features. Next, let's look at how the stored stream data is consumed, and how Pravega works with Flink to build an end-to-end big data pipeline.


In the big data architecture we propose, Apache Flink serves as the computing engine, unifying batch and stream processing through a single model and API, while Pravega serves as the storage engine, providing a unified abstraction for streaming data so that historical and real-time data can be accessed consistently. Together they form a closed loop from storage to computation that can handle both high-throughput historical data and low-latency real-time data.


Going further, for edge computing scenarios, Pravega is compatible with commonly used messaging protocols and implements the corresponding data receivers, so it can collect data at the edge and feed it to downstream computing engine applications, completing the data pipeline from the edge to the data center. In this way, enterprise users can easily build their own big data processing pipelines. This is exactly what we developed our streaming storage product for.


We believe that in the cloud-native era dynamic scaling is extremely important: it not only reduces the difficulty of application development but also makes more efficient use of the underlying hardware resources. Pravega's dynamic scaling was introduced above, and newer versions of Flink also support dynamic scaling. What if we linked these two independent scaling mechanisms and extended dynamic scaling to the entire pipeline?


We worked with the community to complete this scaling link and bring end-to-end auto-scaling to all users. The figure above illustrates the end-to-end auto-scaling concept. When the injected data volume grows, Pravega automatically scales, allocating more segments to handle the pressure on the storage side; through metrics and the Kubernetes HPA, Flink correspondingly allocates more parallel compute nodes to the affected tasks to cope with the change in traffic. This is a new-generation, cloud-native capability that is critical for enterprise users.
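The end-to-end link can be sketched as a small control loop (hypothetical functions and thresholds for illustration; in practice this is driven by Pravega's traffic metrics and the Kubernetes Horizontal Pod Autoscaler): the segment count grows with the ingest rate, and the compute side derives its desired parallelism from that segment count.

```python
def desired_segments(ingest_mbps, target_mbps_per_segment=10, min_segments=1):
    """Storage side: scale the segment count with ingest traffic.
    Uses ceiling division so partial load still gets a full segment."""
    return max(min_segments, -(-ingest_mbps // target_mbps_per_segment))

def desired_parallelism(num_segments, readers_per_segment=1, max_parallelism=128):
    """Compute side: Flink task parallelism follows the segment count
    (e.g. exposed as a metric that the Kubernetes HPA acts on)."""
    return min(max_parallelism, num_segments * readers_per_segment)

# Traffic ramps from 5 MB/s to 120 MB/s; both layers scale in step.
for mbps in (5, 40, 120):
    segs = desired_segments(mbps)
    print(f"{mbps:>3} MB/s -> {segs:>2} segments -> parallelism {desired_parallelism(segs)}")
```

Keeping the compute parallelism tied to the segment count means each reader tails roughly one segment, so neither side becomes the bottleneck as traffic rises or falls.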

4. Looking to the future


At the Flink Forward Asia 2021 conference, the Pravega community, together with Flink, jointly released a white paper on a database synchronization solution. Pravega also continues to evolve as a CNCF project while embracing open source ever more firmly.


As technology continues to develop, more and more streaming engines and database engines are moving toward integration. Looking ahead, the lines between storage and computation, and between streams and tables, are blurring. Pravega's unified batch-and-stream storage design also aligns with an important future direction for Flink. Pravega will continue to communicate and collaborate actively with Flink and other open source communities around data lakes and lakehouses, to build a friendlier, more powerful new generation of data pipelines for enterprise users.

