
Flink Chinese learning website
https://flink-learning.org.cn

Preface

As cloud data warehouse technology matures, the data lake has become one of the hottest technologies of the moment, and Apache Hudi is among the most competitive data lake formats:

  • It has one of the most active open source communities, with weekly active PRs consistently at the 50+ level;
  • It has one of the most active user communities in China: the Apache Hudi DingTalk group currently has more than 2,200 members, and major Chinese vendors have deployed the Apache Hudi ecosystem.

Apache Hudi's vitality stems from its excellent file format design and rich transaction semantics:

  • The LSM-like file layout is well suited to near-real-time update scenarios and addresses the pain points of updating very large data sets;
  • Hudi's transaction-layer semantics are among the most mature and feature-rich in today's lake storage; essentially all data governance can be automated: compaction, rollback, cleaning, and clustering.

Flink On Hudi

Apache Hudi's table format is friendly to stream computing, which makes Flink On Hudi one of the most promising directions to explore in the Apache Hudi project. Flink not only unlocks real-time updates of large data streams for Hudi, but also adds streaming consumption and computation, enabling end-to-end near-real-time ETL on low-cost file storage.

The Flink On Hudi project was launched in November 2020 and has iterated through three versions so far. Its popularity and activity have been high since the first release: the Apache Hudi DingTalk group created in May has grown to more than 2,200 members in six months, and its activity has consistently ranked among the top Flink user groups.

Flink On Hudi has become the preferred way to deploy the Apache Hudi project. The major Chinese cloud vendors Alibaba Cloud, Huawei Cloud, and Tencent Cloud, as well as AWS abroad, have all integrated Flink On Hudi; large Chinese Internet companies such as Toutiao, Kuaishou, and Bilibili, and traditional enterprises such as SF Express and Hikvision, all run Flink On Hudi in production. According to incomplete statistics from follow-ups and return visits in the DingTalk groups, at least 50+ companies in China use Flink On Hudi in production. Uber has even made Flink On Hudi a key direction to advance in 2022!

The developer ecosystem of Flink On Hudi is also very active: contributors from Alibaba Cloud, Huawei Cloud, Toutiao, and Bilibili contribute continuously, and Uber and AWS each have dedicated engineers working on the Flink On Hudi integration.

Version Highlights

Version 0.10.0 has been tempered by community users, who contributed a number of important fixes. It greatly enhances the core read/write capabilities and unlocks a number of new scenarios. The Flink On Hudi updates are summarized as follows:

Bug fixes

  • Fix data loss in an extreme case of streaming reads on object storage [HUDI-2548];
  • Fix occasional data duplication in full + incremental synchronization [HUDI-2686];
  • Fix DELETE messages not being handled correctly in changelog mode [HUDI-2798];
  • Fix a memory leak in online compaction [HUDI-2715].

New features

  • Support incremental reads (a minimal sketch follows this list);
  • Support batch updates;
  • Add an Append write mode, with small-file merging supported at the same time;
  • Support the metadata table.
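
As a rough illustration of the new incremental read, here is a minimal sketch using the Flink Table API in batch mode. The table schema, path, and commit instants below are hypothetical; read.start-commit and read.end-commit are the Hudi Flink options that bound the commit range to read:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiIncrementalRead {
    public static void main(String[] args) {
        // Batch mode: read a bounded commit range instead of tailing the table.
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());

        tEnv.executeSql(
            "CREATE TABLE hudi_t (" +
            "  uuid STRING, name STRING, ts TIMESTAMP(3)," +
            "  PRIMARY KEY (uuid) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'hdfs:///tmp/hudi_t'," +           // hypothetical path
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'read.start-commit' = '20211201000000'," +  // commit range start (inclusive)
            "  'read.end-commit' = '20211202000000'" +     // commit range end (inclusive)
            ")");

        // Only the data written within the given commit range is returned.
        tEnv.executeSql("SELECT * FROM hudi_t").print();
    }
}
```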

Feature enhancements

  • Significantly improved write performance: optimized write memory, an optimized small-file strategy (more balanced, no fragmented files), and optimized interaction between the write tasks and the coordinator;
  • Enhanced streaming read semantics: a new earliest start position improves consumption performance from the earliest commit, and a new option skips compaction files on read, solving the read-duplication problem;
  • Enhanced online compaction strategy: newly added eager failover + rollback, and the compaction order now starts from the earliest instant;
  • Optimized event-sequence semantics: support for processing order and automatic inference of the event-sequence field.

The key items are introduced in detail below.

Small file optimization

The Flink On Hudi write pipeline consists roughly of the following components:

img

  • row data to hoodie: converts rows of the table schema into HoodieRecords;
  • bucket assigner: assigns new records to file buckets (file groups);
  • write task: writes data to file storage;
  • coordinator: initiates and commits the write transaction;
  • cleaner: cleans up stale data.

The bucket assigner is responsible for distributing file groups and is the core component of the small-file strategy. In version 0.10.0, each bucket assign task holds a bucket assigner, and each bucket assigner independently manages its own set of file groups:

img

When INSERT data is written, the bucket assigner scans the file view to determine which of the file groups it manages fall into the small-file category; those file groups continue to receive appended data. For example, task-1 in the figure above keeps appending to FG-1 and FG-2, which currently hold 80MB and 60MB of data.

To avoid excessive write amplification, a file group whose remaining writable buffer is too small is skipped: although FG-3, FG-4, and FG-5 in the figure above are small files, no more data is appended to them, and task-2 opens a new file group to write to instead. The decision logic is sketched below.
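
The following is a minimal, illustrative sketch of that assignment heuristic, not Hudi's actual bucket assigner code; the class, field names, and thresholds are hypothetical (the real values are derived from the write configuration, e.g. the target max file size):

```java
import java.util.List;

// Illustrative sketch of the small-file assignment heuristic described above.
// All names and thresholds are hypothetical.
public class SmallFileAssignSketch {

    static final long MAX_FILE_SIZE = 120L * 1024 * 1024;   // target file size
    static final long MIN_WRITE_BUFFER = 20L * 1024 * 1024; // skip tiny buffers

    static class FileGroup {
        final String id;
        final long sizeBytes;
        FileGroup(String id, long sizeBytes) { this.id = id; this.sizeBytes = sizeBytes; }
    }

    /** Returns a small file worth appending to, or null to open a new file group. */
    static FileGroup assign(List<FileGroup> managedFileGroups) {
        for (FileGroup fg : managedFileGroups) {
            long writable = MAX_FILE_SIZE - fg.sizeBytes;
            // Append only when the remaining buffer is large enough; appending
            // to an almost-full file would cause excessive write amplification.
            if (fg.sizeBytes < MAX_FILE_SIZE && writable >= MIN_WRITE_BUFFER) {
                return fg;
            }
        }
        return null; // no suitable small file: open a new file group
    }
}
```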

Global file view

Version 0.10.0 moves the file view out of the individual write tasks and into the JobManager. When the JobManager starts, it launches a local web server using Javalin that acts as an access proxy for the global file view. A write task obtains the currently written file group view by sending an HTTP request to this web server.

The web server avoids repeated loading of the file system view, which greatly reduces memory overhead.

img
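
The pattern looks roughly like the sketch below. It only illustrates the idea of a JobManager-local Javalin server exposing the file view over HTTP; the endpoint path, port, and payload are hypothetical and do not reflect Hudi's actual internal API:

```java
import io.javalin.Javalin;

import java.util.List;
import java.util.Map;

// Sketch of the global-file-view proxy: a Javalin web server embedded in the
// JobManager serves the file group view, and write tasks fetch it over HTTP.
public class FileViewServerSketch {
    public static void main(String[] args) {
        Javalin app = Javalin.create();
        app.get("/view/filegroups", ctx -> {
            String partition = ctx.queryParam("partition");
            // In reality the server would look up its cached file system view;
            // here we return a dummy listing for illustration.
            ctx.json(Map.of("partition", partition,
                            "fileGroups", List.of("FG-1", "FG-2")));
        });
        app.start(7070); // one server per job, hosted alongside the JobManager
    }
}
```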

Improved streaming reads

Version 0.10.0 adds support for consuming data from the earliest commit: by setting read.start-commit to earliest, you can stream-read the full data set plus all subsequent increments. Notably, with earliest, the first batch of file splits is obtained by scanning the file view directly; once the metadata table feature is enabled, this scan becomes far more efficient. Subsequent incremental reads scan only the incremental metadata, obtaining the incremental file information quickly and cheaply.

img
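
A minimal sketch of a streaming read from the earliest commit with the Flink Table API follows; the table schema and path are hypothetical, while read.streaming.enabled and read.start-commit are documented Hudi Flink options:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiStreamingReadFromEarliest {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        tEnv.executeSql(
            "CREATE TABLE hudi_t (" +
            "  uuid STRING, name STRING, ts TIMESTAMP(3)," +
            "  PRIMARY KEY (uuid) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'hdfs:///tmp/hudi_t'," +        // hypothetical path
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'read.streaming.enabled' = 'true'," +    // continuous streaming read
            "  'read.start-commit' = 'earliest'" +      // full + incremental consumption
            ")");

        // Streams the full table first, then keeps consuming new commits.
        tEnv.executeSql("SELECT * FROM hudi_t").print();
    }
}
```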

Processing order support

Apache Hudi merges messages in two stages: merging within the incremental data, and merging the incremental data with the historical data. During the merge, the write.precombine.field field determines which version of a message is newer. The messages marked with blue squares in the figure below are the ones selected after merging.

img

In version 0.10.0 the write.precombine.field field no longer needs to be specified. In that case processing order is used: later messages are considered newer, corresponding to the messages selected in the purple part of the figure above.
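
In table options the difference looks roughly like the sketch below (the table name and path are hypothetical; write.precombine.field is the documented option): set it to an event-time field for event-sequence merging, or omit it in 0.10.0 to fall back to processing order.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiPrecombineExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Event-sequence merging: on duplicate keys, the row with the larger
        // `ts` value is selected (the blue squares in the figure above).
        tEnv.executeSql(
            "CREATE TABLE hudi_event_order (" +
            "  uuid STRING, name STRING, ts TIMESTAMP(3)," +
            "  PRIMARY KEY (uuid) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'hdfs:///tmp/hudi_event_order'," + // hypothetical path
            "  'write.precombine.field' = 'ts'" +          // event-sequence field
            ")");

        // Processing-order merging (new in 0.10.0): simply omit
        // write.precombine.field, and the later-arriving message is treated
        // as the newer version (the purple squares in the figure above).
    }
}
```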

Metadata Table

The metadata table is a feature introduced in Hudi 0.7.0. Its purpose is to reduce DFS access on the query side: information such as file listings and partitions is obtained directly from the metadata table instead of from file system scans. The metadata table has been greatly enhanced in version 0.10.0, and Flink now supports this feature as well.

The new version of the metadata table uses a synchronous update model: after a successful data write, the coordinator first synchronously extracts the file listing, partition list, and other information into the metadata table, and then writes the event log to the timeline (metadata files).

The base file format of the metadata table is the Avro log. Unlike a normal MOR data log file, the file payload is encoded as efficient HFile data blocks, which enables faster key-value lookups. The Avro logs of the metadata table can also be compacted directly into HFile files to further optimize query efficiency.

img
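
From Flink, enabling the feature is a single table option. In the sketch below the table and path are hypothetical, and we assume the option name metadata.enabled from the Hudi 0.10 Flink configuration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class HudiMetadataTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        tEnv.executeSql(
            "CREATE TABLE hudi_t (" +
            "  uuid STRING, name STRING, ts TIMESTAMP(3)," +
            "  PRIMARY KEY (uuid) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'hudi'," +
            "  'path' = 'hdfs:///tmp/hudi_t'," +  // hypothetical path
            "  'table.type' = 'MERGE_ON_READ'," +
            "  'metadata.enabled' = 'true'" +     // maintain and query the metadata table
            ")");
    }
}
```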

Summary and outlook

In just six months, Flink On Hudi has accumulated a large user base. Positive user feedback and rich user scenarios keep polishing the usability and maturity of Flink On Hudi, letting the project iterate rapidly and efficiently. Through co-development with leading companies such as Toutiao and Bilibili, Flink On Hudi has formed a very healthy developer and user community.

Flink On Hudi is the main focus of the Apache Hudi community for the next two major versions. The roadmap has three main points:

  • Complete the end-to-end streaming ETL scenario: support native changelog, dimension (lookup) table queries, and lighter-weight deduplication scenarios;
  • Streaming query optimization: record-level index, secondary index, independent file index;
  • Batch query optimization: z-ordering, data skipping.

Acknowledgements

Finally, thanks to the active community users and developers of Flink On Hudi. It is because you have been with us all the way that Flink On Hudi can evolve and iterate so efficiently; and it is because of you that Flink On Hudi's exploration and practice in the direction of real-time data lakes is gradually becoming an industry pioneer and growing ever more mature~

Anyone interested in Hudi can scan the QR code below to join the DingTalk group.

img

