Abstract: This article is based on a talk given by Ma Yue, infrastructure engineer at ByteDance and Apache Flink contributor, at the platform construction session of Flink Forward Asia 2021. The main contents include:
- Background
- Introduction to the State Processor API
- StateMeta Snapshot mechanism
- State as Database
- Using Flink Batch SQL to query task state
- Future outlook
1. Background
As we all know, State in Flink stores the intermediate results of operator computation. When a task behaves abnormally, querying the State in a task snapshot can provide valuable clues.
At present, however, for Flink SQL tasks the cost of querying job state is usually too high, because users have no way of knowing how the state is defined or what its concrete types are.
To solve this problem, the ByteDance streaming computing team proposed an internal solution, State Query on Flink SQL: users can query State simply by writing SQL. This article mainly introduces ByteDance's work on Flink state query.
2. Introduction to the State Processor API
When it comes to state query, the first thing that comes to mind is the State Processor API, introduced in Flink 1.9. With the State Processor API, we can convert the Savepoint generated by a job into a DataSet, and then use the DataSet API to query, modify, and initialize State.
The following briefly describes how to use the State Processor API to complete a State query (a code sketch follows these steps):
- First, create an ExistingSavepoint to represent a Savepoint. When initializing the ExistingSavepoint, you need to provide information such as the Savepoint path and the StateBackend;
- Then, implement a ReaderFunction to re-register the State to be queried and to define how it is processed. During the query, every key is traversed and the state is handled in the way we defined;
- Finally, call Savepoint.readKeyedState, passing in the operator's uid and the ReaderFunction, to complete the State query.
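For reference, here is a minimal sketch of these three steps, following the pattern in the State Processor API documentation. It assumes a job whose operator has uid "my-uid" and registers a ValueState<Integer> named "state"; the Savepoint path and key type are placeholders.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class SavepointQueryExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Step 1: create an ExistingSavepoint from the Savepoint path and a StateBackend.
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs:///path/to/savepoint", new MemoryStateBackend());

        // Step 3: query the state of the operator registered under uid "my-uid".
        DataSet<Tuple2<Integer, Integer>> state =
                savepoint.readKeyedState("my-uid", new ReaderFunction());
        state.print();
    }

    // Step 2: re-register the state to be queried and define how each key is handled.
    public static class ReaderFunction
            extends KeyedStateReaderFunction<Integer, Tuple2<Integer, Integer>> {

        private transient ValueState<Integer> state;

        @Override
        public void open(Configuration parameters) {
            // Name and type must match the state definition in the original job.
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("state", Types.INT));
        }

        @Override
        public void readKey(Integer key, Context ctx, Collector<Tuple2<Integer, Integer>> out)
                throws Exception {
            out.collect(Tuple2.of(key, state.value()));
        }
    }
}
```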
Next, I will briefly describe the principle behind the State query.
There are two kinds of files in a Savepoint directory. One kind is state data files, such as opA-1-state in the figure above, which stores the detailed data of operator A's state in the first SubTask. The other is the metadata file, _metadata in the figure, which stores the mapping between each operator and its state files.
A state query then proceeds as follows. First, on the client side, the metadata file is parsed according to the Savepoint path; through the operator ID, we obtain the handle of the file holding the state to be queried. When the query is actually executed, the Task responsible for reading the state creates a new StateBackend and restores the data from the state file into it. After the restoration completes, every key is traversed and the corresponding state is handed to the ReaderFunction for processing.
Some of you may ask: since the community already provides the ability to query State, why do we need to do this work ourselves? Mainly because we ran into some problems while using the State Processor API:
- Every time we query State, we need to develop a standalone Flink Batch job, which carries a certain development cost for users;
- When implementing the ReaderFunction, users must clearly understand how the task state is defined, including the name, type, and State Descriptor of each State, which sets a high bar for use;
- The State Processor API can only query the state of a single operator; it cannot query the states of multiple operators at the same time;
- It cannot directly query the meta-information of task state, such as which states a task uses or the type of a particular state.
In general, we have two goals: one is to reduce the usage cost for users, and the other is to enhance the state query functionality. We want users to be able to query State in the simplest possible way, without needing to know anything about how it is defined.
In addition, we hope that users can query the states of multiple operators at the same time, and can directly query which states a job uses and the type of each state.
Therefore, we propose the State Query on Flink SQL solution. Simply put, it treats State as a database, allowing users to easily query State by writing SQL.
In this scheme, we need to solve two problems:
- How to shield State information from users: As the State Processor API shows, querying a State requires a lot of information, such as the Savepoint path, the StateBackend type, the operator ID, the State Descriptor, and so on. It is clearly difficult to express all of this through SQL statements. So what exactly is required to query a state, and how can we hide these complex details from users? This is the first difficulty we face.
- How to express State in SQL: State is not stored in Flink the way tables are stored in a database. How can we express the process of querying state in SQL? This is the other difficulty we have to solve.
3. StateMeta Snapshot mechanism
First, let's answer the first question: what information is needed to query a State?
Referring back to the State Processor API example above, when we create the ExistingSavepoint and the ReaderFunction we need to provide the Savepoint path, the Backend type, the OperatorID, the operator's key type, the State name, the Serializer, and so on. We collectively call this the state's meta-information.
For Flink SQL tasks, expecting users to clearly understand all of this information sets a very high bar. Our idea is that users should only need to provide the simplest piece of information, the Savepoint ID, and the Flink framework should store the other meta-information in the Savepoint itself. This way, the complex details of State can be hidden from users and the query can still be completed. For this purpose, we introduced the StateMeta Snapshot mechanism.
StateMeta Snapshot is simply the process of adding state meta-information to the Savepoint metadata. The specific steps are as follows:
- First, when a State is registered, the Task saves meta-information such as the operatorName, operatorID, KeySerializer, and StateDescriptors in the Task's memory;
- When a Savepoint is triggered, the Task snapshots the state's meta-information at the same time as it snapshots the state itself. Once the snapshot completes, the state meta-information (StateMeta) and the state file handles (StateHandle) are reported to the JobManager together;
- After the JobManager receives the StateMeta information reported by all Tasks, it merges the state meta-information and saves the merged result to a file named stateInfo in the Savepoint directory.
After that, querying state only requires parsing the stateInfo file in the Savepoint; users no longer need to supply the state meta-information in code. In this way, the cost of querying state is greatly reduced.
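As a rough illustration, the meta-information captured for each state might take a shape like the POJO below. This is an assumption for the sake of illustration; the actual internal class and the stateInfo file format were not shown in the talk.

```java
import java.io.Serializable;

import org.apache.flink.api.common.state.StateDescriptor;
import org.apache.flink.api.common.typeutils.TypeSerializer;

// Hypothetical shape of the meta-information recorded for one state; the class
// and field names are illustrative, not the actual internal implementation.
public class StateMetaEntry implements Serializable {
    String operatorId;                     // ID of the operator that owns the state
    String operatorName;                   // human-readable operator name
    TypeSerializer<?> keySerializer;       // serializer of the operator key (KeyedState only)
    StateDescriptor<?, ?> stateDescriptor; // carries the state name, type, and value serializer
}
```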
4. State as Database
Next, let's answer the second question: how do we express State in SQL? In fact, the community already proposed an idea for this when designing the State Processor API, namely State as Database.
A traditional database usually uses the three elements Catalog, Database, and Table to identify a table, and we can map the same logic onto Flink State. We regard Flink State as a special data source, and each Savepoint generated by a job as an independent DB. Within this DB, we abstract the State meta-information and the detailed State data into different Tables and expose them to users, who can obtain task state information by querying these Tables directly.
First, let's see how to represent a single State as a Table. There are two commonly used kinds of State in Flink: KeyedState and OperatorState.
- OperatorState has only one property, Value, which represents the specific value of the State, so we can represent an OperatorState as a table with a single Value field.
- KeyedState may hold a different value for each Key and Namespace, so we can represent a KeyedState as a table with three fields: Key, Namespace, and Value.
Once a single State is abstracted this way, representing multiple States is straightforward. In the example above, the operator contains three states: two KeyedStates and one OperatorState. We simply union these Tables and use a state_name field to distinguish them, and the result represents all the states in the operator, as sketched below.
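For illustration, such a unified view might be expressed as the union below. The table and column names (keyed_state_a and so on) are assumptions; the actual internal table definitions were not shown in the talk.

```sql
-- Hypothetical unified view over one operator's three states; all table and
-- column names are illustrative assumptions.
SELECT 'keyed_state_a' AS state_name, `key`, namespace, `value` FROM keyed_state_a
UNION ALL
SELECT 'keyed_state_b', `key`, namespace, `value` FROM keyed_state_b
UNION ALL
-- An OperatorState has only a value, so key and namespace are padded with NULL.
SELECT 'operator_state_x', CAST(NULL AS STRING), CAST(NULL AS STRING), `value`
FROM operator_state_x;
```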
Finally, one question remains: how do we know which states a task uses, and what the specific types of those states are?
To solve this, we define a special table, StateMeta, which represents the meta-information of all states in a Flink task. StateMeta contains the name of each state in the task, the ID and name of the operator where the state lives, the Key type, the Value type, and so on. Users can obtain the meta-information of all states in a task by directly querying the StateMeta table.
5. Using Flink Batch SQL to Query Task State
The above is the overall design of the state query solution. So how do we actually query a State? Let's walk through a Word Count task as an example.
First, we create a Flink SQL task and start it. From the web UI we can see that the task contains three operators: Source, Aggregate, and Sink. We then trigger a Savepoint and obtain the corresponding SavepointID once it completes successfully. With the SavepointID, we can query the job's state.
If we know nothing about how state is used in a Flink SQL task, the first thing to query is which states the task contains and what their types are. We can get this information from the StateMeta table. As shown in Scenario 1 in the figure above, querying the StateMeta table reveals that the task contains a ListState and a ValueState, located in the Source operator and the Aggregate operator respectively.
In addition, those familiar with Flink know that the State in KafkaSource records the Offsets of current consumption. As shown in Scenario 2, by querying the state of the Source operator we can obtain the Partition and Offset information of the Kafka Topic consumed by the task.
Another fairly common scenario: a downstream colleague finds that the result for a certain key (say, key_662) looks wrong. To locate the problem, we can directly query the state of the Aggregate operator in the job, using key equal to key_662 as the query condition. As shown in Scenario 3, the query result shows that the aggregate result for key_662 is 11290. In this way, users can easily verify whether the state is correct. The three scenarios are sketched as queries below.
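Put together, the three scenarios might look like the following queries. The database and table naming shown here (one database per Savepoint, plus assumed table names such as source_state and agg_state) is an illustrative assumption rather than the exact syntax exposed internally.

```sql
-- Scenario 1: which states does the job contain, and of what types?
SELECT * FROM savepoint_1.StateMeta;

-- Scenario 2: Kafka partitions and offsets recorded in the Source operator's state.
SELECT * FROM savepoint_1.source_state;

-- Scenario 3: check the aggregate result for the suspicious key key_662.
SELECT `key`, `value`
FROM savepoint_1.agg_state
WHERE `key` = 'key_662';
```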
6. Future Outlook
In the future, we plan to further enrich the functionality around State. At present we support querying State with SQL, but the community also provides the ability to modify and initialize State, and in some scenarios these capabilities matter just as much. For example, if we know that some keys in the state were computed incorrectly, we would like to correct that part of the data; or, if the task logic after a change is not fully compatible with the previous state, we would like to produce a new, compatible Savepoint by modifying and initializing the state. Likewise, in terms of usage, we hope users will be able to complete state modification and initialization directly with SQL insert and update syntax, along the lines of the hypothetical statement below.
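This is purely hypothetical syntax for a capability that was still future work at the time of the talk; the table and column names follow the same illustrative convention as above.

```sql
-- Hypothetical future syntax: correct the value for a miscalculated key and
-- produce a new Savepoint. Not implemented at the time of the talk.
UPDATE savepoint_1.agg_state
SET `value` = 11290
WHERE `key` = 'key_662';
```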
Second, we will further strengthen State usability. Our DAG-editing solution addresses state incompatibility when the job topology changes, but when a Flink SQL task modifies its fields, the State Serializer may change, which also makes the state incompatible. For this situation we have designed a complete Flink SQL State Schema Evolution scheme, which will greatly enhance the state recovery capability of Flink SQL tasks after changes; this scheme is currently being implemented. In addition, we provide a complete pre-check capability for state recovery, which checks whether the state is compatible before a task goes online and informs the user, so as to avoid job start failures caused by state incompatibility.