This article introduces the practice and application of Flink at Youzan, covering: Flink's containerization transformation and practice, Flink SQL practice and application, and future plans.
Author: Shen Lei
Introduction: The main content shared today is the practice and application of Flink at Youzan. The content includes:
- Flink's containerization transformation and practice
- Practice and application of Flink SQL
- Future plans.
GitHub address
https://github.com/apache/flink
Everyone is welcome to like Flink and give it a star~
1. Flink's containerization transformation and practice
1. Youzan's cluster evolution history
- In July 2014, the first Storm task was officially launched;
- In 2016, Spark Streaming was introduced, running on Hadoop Yarn;
- In 2018, Flink was introduced, and the operation mode was Flink on Yarn Per Job;
- In June 2020, 100% of Flink Jar tasks were moved to K8s. K8s became the default computing resource for Flink Jar tasks, Flink SQL tasks stayed on Yarn, and real-time development was unified on Flink;
- In November 2020, the Storm cluster officially went offline, and all the original Storm tasks had been migrated to Flink;
- In 2021, we plan to move all Flink tasks to K8s.
2. Business scenarios supported by Flink internally
The business scenarios supported by Flink include risk control, real-time event-tracking (buried-point) tasks, payment, real-time feature processing for algorithms, BI real-time dashboards, real-time monitoring, and so on. The current real-time task scale is 500+.
3. Pain points of Flink on Yarn at Youzan
There are three main parts:
- First, CPU is not isolated. In Flink on Yarn mode, CPU is not isolated, so when a real-time task drives a machine's CPU usage too high, it affects the other real-time tasks on that machine;
- Second, large-scale scaling is costly. The Yarn and HDFS services run on physical machines, which are inflexible to scale up and down during the big promotion period and require a certain amount of manpower and material resources;
- Third, extra operation and maintenance manpower is needed. The company's underlying application resources have been unified on K8s, so operating and maintaining a separate Yarn cluster adds the manpower cost of maintaining one more type of cluster.
4. Advantages of Flink on k8s over Yarn
It can be summarized into 4 points:
- First, unified operation and maintenance. The company has unified operation and maintenance, with a dedicated department responsible for K8s;
- Second, CPU isolation. CPUs are isolated between K8s Pods, so real-time tasks do not affect each other and are more stable;
- Third, storage and compute are separated. Flink's computing resources and state storage are separated, and computing resources can be co-located with other components' resources to improve machine utilization;
- Fourth, flexible scaling. During the big promotion period, capacity can be scaled up and down flexibly, saving manpower and material costs.
5. Real-time cluster deployment
It is generally divided into three layers: the first layer is the storage layer; the second layer is the real-time computing resource layer; the third layer is the real-time computing engine layer.
The storage layer is mainly divided into two parts:
- The first is the cloud disk, which mainly stores the local state and the logs of Flink tasks;
- The second is the real-time computing HDFS cluster, which mainly stores the remote state of Flink tasks.
The second layer is the resource layer of real-time computing, which is divided into two parts:
- One is the Hadoop Yarn cluster;
- The other is the Flink K8s cluster; further subdivided, there are Flink K8s resources co-located with the offline HDFS cluster, as well as dedicated Flink K8s cluster resources.
- The top layer runs real-time Flink Jar tasks, Spark Streaming tasks, and Flink SQL tasks.
The reason we consider co-location is that the offline HDFS cluster's machine utilization is not high during the day. Its computing resources are given to real-time tasks, while offline jobs use the elastic computing resources of other internal components, thereby increasing machine utilization and achieving a better cost-reduction effect.
6. Flink on k8s containerization process
As shown below:
- The first step is to submit the Flink Jar task on the real-time platform, manage the Flink Jar task version, build the Docker image of the Flink task, and upload the image to the Docker image repository;
- The second step is to start the task;
- The third step is to create the yaml file;
- The fourth step is to interact with the K8s API Server;
- The fifth step is to pull the Flink task image from the Docker image repository into the Flink K8s cluster;
Finally, the task runs. Here are a few tips:
- The operation mode is Flink Standalone Per Job mode;
- Each Flink Jar task has its own image, and the image version is the task name plus a timestamp;
- The JobManager needs to be created as a Deployment rather than a Job;
- The Dockerfile specifies HADOOP_USER_NAME, consistent with the online tasks.
7. Some practices in Flink on k8s
The first practice is to solve the problem that tasks with insufficient resources cannot start.
First, a description of the problem. Flink on K8s is not cloud native and cannot apply for real-time task resources on demand. When the resources configured by the user on the platform are less than the resources actually used by the real-time task (for example, the parallelism is hard-coded in the user's code, but the parallelism configured by the user on the platform is lower than that value), the real-time task cannot start.
In response to this problem, we internally added an automatic parallelism detection mechanism for Flink Jar tasks. Its main flow is shown in the figure below. First, the user submits the Flink Jar job on our platform. After submission, the background builds a PackagedProgram from the Jar job and its run parameters, obtains the pre-execution plan of the task through the PackagedProgram, and from it gets the task's real parallelism. If the parallelism configured in the user's code is less than the resources configured on the platform side, we use the platform-side configuration to apply for resources and start the task; otherwise, we use the task's real parallelism to apply for resources and start it.
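Below is a minimal sketch of this detection step, assuming the Flink 1.10+ client API (PackagedProgram, PackagedProgramUtils); the jar file, arguments, and platform-configured parallelism are placeholders, and the surrounding platform logic is omitted.

```java
import org.apache.flink.client.program.PackagedProgram;
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;

import java.io.File;

public class ParallelismDetector {

    /**
     * Build a JobGraph from the user's jar without submitting it, and take the
     * maximum vertex parallelism as the "real" parallelism of the task.
     */
    public static int detectRealParallelism(File userJar, String[] args,
                                            int platformParallelism) throws Exception {
        PackagedProgram program = PackagedProgram.newBuilder()
                .setJarFile(userJar)
                .setArguments(args)
                .build();

        // Compile the program into a JobGraph (pre-execution plan) without running it.
        JobGraph jobGraph = PackagedProgramUtils.createJobGraph(
                program, new Configuration(), platformParallelism, /* suppressOutput */ true);

        int realParallelism = platformParallelism;
        for (JobVertex vertex : jobGraph.getVertices()) {
            realParallelism = Math.max(realParallelism, vertex.getParallelism());
        }
        // Resources are requested with whichever is larger: the platform-side
        // configuration or the parallelism the job actually requires.
        return realParallelism;
    }
}
```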
The second practice is the resource analysis tool for Flink on k8s tasks.
First, the background: Flink K8s task resources are configured by users, and when the configured parallelism or memory is too large, computing resources are wasted, which increases the cost of the underlying machines. To solve this problem, we built a platform administrator tool. The administrator can see whether a task's resources are over-allocated from two perspectives:
- The first is the memory perspective. Based on the task's GC logs, we use the open-source tool GC Viewer to obtain the memory usage indicators of the real-time task;
- The second is the message processing capability perspective. We added data-source input records/s and message processing time metrics at the Flink source-code level, and based on these metrics we find the task or operator with the slowest message processing to judge whether the parallelism configuration is reasonable.
Based on the memory analysis indicators and the reasonableness of the parallelism, combined with optimization rules, the administrator presets the Flink resources and then communicates with the business side to adjust them. The picture on the right shows the two analysis results: the upper one is the memory analysis of a Flink on K8s pod, and the lower one is the task processing capability analysis of Flink on K8s. In the end, we readjust the task's resources based on these indicators to reduce resource waste. We are currently planning to turn this into an automated analysis and adjustment tool.
Next are other related practices of Flink on K8s.
- First, Ingress-based access to the Flink Web UI and REST API. Each task has its own Ingress domain name, through which the Flink Web UI and REST API are accessed (see the sketch after this list);
- Second, mounting multiple hostPath volumes to work around the IO limitations of a single cloud disk. A single cloud disk has bottlenecks in write bandwidth and IO capacity; using multiple cloud disks reduces the pressure of writing checkpoint state and local data;
- Third, a ConfigMap for common Flink configuration, plus detection of successful Flink image uploads. We create a ConfigMap for the common configuration of Filebeat and Flink jobs and mount it into the real-time tasks, and we verify that each Flink task image has been uploaded to the image repository successfully;
- Fourth, SSD disks for HDFS and Filebeat-based log collection. SSD disks mainly reduce the disk IO wait time; we also adjust dfs.block.invalidate.limit to reduce the number of HDFS pending-delete blocks. Task logs are collected with Filebeat, output to Kafka, and later viewed through a custom LogServer and the offline public LogServer.
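As a small illustration of the first point, here is a minimal sketch of calling the Flink REST API through a per-task Ingress domain; the domain name is a hypothetical placeholder, /jobs/overview is a standard Flink REST endpoint, and Java 11's built-in HTTP client is assumed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkRestProbe {

    public static void main(String[] args) throws Exception {
        // Hypothetical per-task Ingress domain; in practice each task gets its own.
        String ingressDomain = "http://flink-task-foo.example-k8s.internal";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                        URI.create(ingressDomain + "/jobs/overview"))
                .GET()
                .build();

        // The response is a JSON summary of the jobs running in that task's cluster.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```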
8. Current pain points facing Flink on K8s
- First, the JobManager HA problem. If the JobManager Pod dies, the K8s Deployment restarts the JobManager according to the yaml file, and its state may be lost; if the yaml configures savepoint recovery, many messages may be reprocessed. We hope to support JobManager HA with the help of ZK or etcd in the future;
- Second, re-uploading after code changes takes a long time. Once the code logic is modified, the Flink Jar upload time plus the image build time can be at the minute level, which may affect businesses with high real-time requirements. We hope to refer to the community's implementation in the future and pull the task Jar from HDFS to run;
- Third, when a K8s node goes down, JobManager recovery is slow. Once a K8s node goes down, it takes about 8 minutes for the JobManager Pod to resume running, mainly because of the time K8s needs to detect the internal anomaly plus the job start time; this affects some businesses, such as CPS real-time tasks. To deal with it, the platform side periodically checks the K8s node status; once a down node is detected, the tasks whose JobManager runs on that node are stopped and then recovered from their latest checkpoint;
- Fourth, Flink on K8s is not cloud native. Currently, the Flink Jar task parallelism auto-detection tool solves the problem of tasks failing to start due to insufficient resource allocation, but if a task's pre-execution plan cannot be obtained, the parallelism configured in the code cannot be obtained either. Our thinking is: for the Flink on K8s cloud-native capability and the first two problems above, if the community supports them quickly, we may consider aligning our Flink version with the community version later.
9. Some recommendations for Flink on K8s
The first solution is for the platform itself to build and manage the task images.
- The advantage is that the platform controls the overall process of building images and running real-time tasks, so specific problems can be fixed in time.
- The disadvantages are that it requires a certain understanding of Docker and K8s, the threshold is relatively high, and non-cloud-native issues have to be handled by ourselves. The applicable version is Flink 1.6 and above.
The second solution is Flink k8s Operator.
- The advantage is that many low-level details are encapsulated for users, so the threshold for use is lower.
- The disadvantage is that overall it is not as flexible as the first solution; once there is a problem, the underlying layer is not easy to modify because it relies on the operator's encapsulated functionality. The applicable version is Flink 1.7 and above.
The last solution is based on the community Flink K8s function.
- The advantages are that it is cloud native and friendlier in applying for resources; it is also more convenient for users, since many underlying implementations are shielded.
- The disadvantage is that the K8s cloud-native capability is still experimental, and related features, such as the K8s per-job mode, are still under development. The applicable version is Flink 1.10 and above.
2. Flink SQL practice and application
1. The development history of Flink SQL at Youzan
- In September 2019, we researched and tried Flink 1.9 and 1.10 SQL capabilities, and at the same time enhanced some Flink SQL functions.
- In October 2019, we conducted SQL function verification. Based on the real-time requirements of event tracking (buried points), we verified the Flink SQL HBase dimension-table join function, and the results met expectations.
- In February 2020, we expanded the functions of SQL, using Flink 1.10 as the SQL calculation engine to develop and optimize Flink SQL functions. The real-time platform supports full SQL development.
- In April 2020, we began to support real-time needs related to the real-time data warehouse, Youzan Education, the beauty industry, and retail.
- In August 2020, the new version of the real-time platform was officially launched. Currently, Flink SQL is the primary recommended way to develop our real-time tasks.
2. Some practices in Flink SQL
It is mainly divided into three aspects:
- First, Flink Connector practices: Flink SQL support for an NSQ connector, for an HA HBase sink and dimension tables, for a secret-free (no plain-text credentials) MySQL connector, for standard output (supported by the community), and for a ClickHouse sink;
- Second, platform-layer practices: Flink SQL support for UDFs and UDF management (a minimal UDF sketch follows this list), task recovery from checkpoints, idempotent functions, JSON-related functions, and so on; support for Flink runtime parameter configuration, such as state retention time settings and aggregation optimization parameters; automatic collection of Flink real-time task lineage data; and Flink syntax correctness checking;
- Third, Flink runtime practices: adding per-Task and per-Operator single-record processing time metrics to the Flink source code, and fixing a bug in Flink SQL's retractable Top-N.
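As a small example of the UDF support mentioned in the platform-layer practices, here is a minimal sketch of a scalar UDF and its registration, assuming the Flink 1.10 Table API (Blink planner); the function name and its logic are purely illustrative.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.functions.ScalarFunction;

public class UdfExample {

    /** Illustrative scalar UDF: trims a string key and lower-cases it. */
    public static class NormalizeKey extends ScalarFunction {
        public String eval(String key) {
            return key == null ? null : key.trim().toLowerCase();
        }
    }

    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Register the UDF so Flink SQL statements can call normalize_key(...).
        tableEnv.registerFunction("normalize_key", new NormalizeKey());
    }
}
```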
3. Business Practice
The first practice is the real-time dashboard of our internal customer service robot. The process is divided into three layers:
- The first layer is the real-time data source: online MySQL business tables, whose Binlog we synchronize to the corresponding Kafka topics through the DTS service;
- There are three Kafka topics in the ODS layer of real-time tasks;
In the real-time DWD layer, there are two Flink SQL tasks.
- Flink SQL task A consumes two topics and uses an Interval Join to associate the records in the two topics within a time window. The state retention time is also set for this real-time task. After the join, it performs some ETL processing and finally writes the data to topic C.
- The other real-time task, Flink SQL B, consumes one topic, cleans its data, then joins an HBase dimension table to attach the additional data it needs, and finally writes the joined data to topic D.
At the upper layer, Druid consumes the data of these two topics to serve indicator queries, which are finally provided to the business side.
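To make the two DWD jobs more concrete, below is a hedged sketch of what their core SQL could look like in Flink 1.10 (Blink planner); all table and field names (topic_a, topic_b, topic_c, topic_d, topic_s, dim_hbase, session_id, and so on) are illustrative placeholders rather than the actual job definitions, and the Kafka/HBase connector DDL is omitted.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CustomerServiceDwdSketch {

    public static void main(String[] args) throws Exception {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // DDL for topic_a, topic_b, topic_c, topic_d, topic_s and dim_hbase is omitted;
        // on the platform they would be declared with Kafka / HBase connectors.

        // Interval join between the two ODS topics (Flink SQL task A), writing to topic C.
        tableEnv.sqlUpdate(
            "INSERT INTO topic_c " +
            "SELECT a.session_id, a.user_id, b.reply_content, a.event_time " +
            "FROM topic_a a " +
            "JOIN topic_b b " +
            "  ON a.session_id = b.session_id " +
            " AND b.event_time BETWEEN a.event_time - INTERVAL '10' MINUTE " +
            "                      AND a.event_time + INTERVAL '10' MINUTE");

        // Lookup join against an HBase dimension table (Flink SQL task B), writing to topic D.
        tableEnv.sqlUpdate(
            "INSERT INTO topic_d " +
            "SELECT s.user_id, s.action, d.user_level " +
            "FROM topic_s AS s " +
            "JOIN dim_hbase FOR SYSTEM_TIME AS OF s.proc_time AS d " +
            "  ON s.user_id = d.rowkey");

        tableEnv.execute("dwd-sketch");
    }
}
```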
The second practice is the real-time user behavior middle layer. Users search, browse, add items to shopping carts, and so on, on our platform, and corresponding events are generated. The original plan was offline: we stored the data in Hive tables, and the algorithm team combined user features, machine learning models, and the offline data to generate user score estimates, which were then written into HBase.
Against this background, there is the following demand: the current user scoring is mainly based on offline tasks, and the algorithm team hopes to combine real-time user features to improve recommendation accuracy in a more timely and precise manner. This requires building a real-time user behavior middle layer: user-generated events are written into Kafka, the data is processed by Flink SQL jobs, and the results are output to HBase. The algorithm team then combines its models to update some model parameters in real time and finally estimates the user's score in real time; the score is also stored in HBase and then used online.
The construction process of the user behavior middle layer is divided into three steps:
- On the first layer, our data source is in Kafka;
- The second layer is the ODS layer. The Flink SQL jobs define stream tables and perform some ETL logic, then define the related sink tables, dimension tables, and so on; there are also some aggregation operations, and the results are written to Kafka;
- At the DWS layer, there are also user Flink SQL jobs, which involve the user's own UDF Jar, multi-stream joins, and UDFs. They read data from the ODS layer and store the results in HBase, which are finally used by the algorithm team.
Here are a few practical experiences:
- First, Kafka topic names, Flink task names, and Flink SQL table names follow the data warehouse naming conventions.
- Second, for indicator aggregation calculations, Flink SQL tasks should set the idle state retention time to prevent task state from growing indefinitely.
- Third, if there is data skew or high state-read pressure, Flink SQL optimization parameters need to be configured (see the sketch after this list).
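A minimal sketch of the second and third points, assuming Flink 1.10's TableConfig API and the standard table.exec.mini-batch.* / table.optimizer.* options; the concrete retention times and batch sizes are illustrative values, not recommendations.

```java
import org.apache.flink.api.common.time.Time;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableConfig;
import org.apache.flink.table.api.TableEnvironment;

public class SqlTuningSketch {

    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
        TableConfig config = tableEnv.getConfig();

        // Idle state retention: state untouched for 12h may be cleaned up, and is
        // definitely cleaned after 24h, so aggregation state cannot grow forever.
        config.setIdleStateRetentionTime(Time.hours(12), Time.hours(24));

        // Mini-batch aggregation to reduce per-record state access pressure.
        config.getConfiguration().setString("table.exec.mini-batch.enabled", "true");
        config.getConfiguration().setString("table.exec.mini-batch.allow-latency", "2 s");
        config.getConfiguration().setString("table.exec.mini-batch.size", "5000");

        // Two-phase aggregation and split-distinct to mitigate data skew on hot keys.
        config.getConfiguration().setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");
        config.getConfiguration().setString("table.optimizer.distinct-agg.split.enabled", "true");
    }
}
```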
4. Practice of the HA HBase Connector
The community HBase connector reads from or writes to a single HBase cluster. When that HBase cluster is unavailable, real-time tasks' data writes or dimension-table joins are affected, which may affect business use. To solve this problem, HBase is deployed as two clusters, a primary and a standby, with primary-standby replication between them via the WAL. Flink SQL jobs write to the primary cluster first; when the primary cluster is unavailable, they automatically degrade to the standby cluster without affecting online services.
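The following is a highly simplified sketch of the failover idea only, not the actual connector code: it uses the standard HBase client API, the cluster configurations and table name are placeholders, and a real implementation would also handle retries, connection re-creation, and switching back to the primary.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

import java.io.IOException;

/** Sketch: write to the primary HBase cluster and degrade to the standby on failure. */
public class HaHBaseWriter implements AutoCloseable {

    private final Connection primary;
    private final Connection standby;
    private final TableName tableName;

    public HaHBaseWriter(Configuration primaryConf, Configuration standbyConf,
                         String table) throws IOException {
        this.primary = ConnectionFactory.createConnection(primaryConf);
        this.standby = ConnectionFactory.createConnection(standbyConf);
        this.tableName = TableName.valueOf(table);
    }

    public void write(Put put) throws IOException {
        try (Table table = primary.getTable(tableName)) {
            table.put(put);                 // normal path: write to the primary cluster
        } catch (IOException primaryFailure) {
            try (Table table = standby.getTable(tableName)) {
                table.put(put);             // degrade: write to the standby cluster
            }
        }
    }

    @Override
    public void close() throws IOException {
        primary.close();
        standby.close();
    }
}
```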
5. Practice of the secret-free MySQL Connector and metric extension
The picture on the left shows the syntax of Flink's secret-free MySQL sink. It solves three problems:
- First, the username and password of the MySQL database are not exposed or stored in plain text;
- Second, periodic rotation of the MySQL username and password is supported;
- Third, internal permissions are granted automatically according to the username authorization table. The main purpose of all this is to make real-time tasks' database access safer.
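As one possible way to realize the first two points, here is a hedged sketch of periodically refreshing credentials from an internal credential service instead of writing them in the SQL; fetchCredentialFromVault, the refresh interval, and the class layout are hypothetical illustrations, not Youzan's actual implementation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Sketch: the connector holds no plain-text password; it asks an internal service instead. */
public class SecretFreeMysqlConnectionProvider implements AutoCloseable {

    /** Simple username/password pair returned by the (hypothetical) credential service. */
    public static final class Credential {
        final String user;
        final String password;
        Credential(String user, String password) { this.user = user; this.password = password; }
    }

    private final String jdbcUrl;
    private final AtomicReference<Credential> current = new AtomicReference<>();
    private final ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();

    public SecretFreeMysqlConnectionProvider(String jdbcUrl, String dbIdentifier) {
        this.jdbcUrl = jdbcUrl;
        current.set(fetchCredentialFromVault(dbIdentifier));
        // Periodically refresh so rotated passwords are picked up without a restart.
        refresher.scheduleAtFixedRate(
                () -> current.set(fetchCredentialFromVault(dbIdentifier)),
                10, 10, TimeUnit.MINUTES);
    }

    public Connection getConnection() throws SQLException {
        Credential c = current.get();
        return DriverManager.getConnection(jdbcUrl, c.user, c.password);
    }

    /** Hypothetical call to an internal credential/vault service. */
    private static Credential fetchCredentialFromVault(String dbIdentifier) {
        // In a real system this would be an authenticated RPC/HTTP call.
        return new Credential("app_user_" + dbIdentifier, "rotating-password");
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```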
Then, as shown in the lower-left picture, we added per-Task and per-Operator single-record processing time metrics at the Flink source-code level. The purpose is to help the business side troubleshoot and optimize Flink real-time tasks based on the message processing time metrics.
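The change described here was made inside the Flink source code, but the idea can be illustrated at the user level with Flink's standard metric API; the following sketch exposes an illustrative records/s Meter and a last-record processing time Gauge inside a RichMapFunction, with placeholder business logic.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;

/** Sketch: expose records/s and last per-record processing time as operator metrics. */
public class TimedMapFunction extends RichMapFunction<String, String> {

    private transient Meter recordsInPerSecond;
    private transient volatile long lastProcessTimeNanos;

    @Override
    public void open(Configuration parameters) {
        recordsInPerSecond = getRuntimeContext().getMetricGroup()
                .meter("recordsInPerSecond", new MeterView(60));
        getRuntimeContext().getMetricGroup()
                .gauge("lastRecordProcessTimeNanos", (Gauge<Long>) () -> lastProcessTimeNanos);
    }

    @Override
    public String map(String value) {
        long start = System.nanoTime();
        String result = value.trim().toLowerCase();   // placeholder business logic
        lastProcessTimeNanos = System.nanoTime() - start;
        recordsInPerSecond.markEvent();
        return result;
    }
}
```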
6. Practice of automatic collection of Flink task lineage metadata
The flow of Flink task lineage metadata collection is shown in the figure below. After the platform starts a real-time task, it follows one of two different paths, depending on whether the task is a Flink Jar task or a Flink SQL task, to obtain the task's lineage data, and then reports the data to the metadata system. The value of this is twofold:
- First, it helps the business side understand the real-time task processing links. The business side can more clearly understand the relationships and impacts between real-time tasks, and when operating on a task can promptly notify the downstream business parties;
- Second, it helps build a better real-time data warehouse. Combining the real-time task lineage graph, we refine the common real-time data layer and improve reusability, building the real-time data warehouse better.
3. Future planning
Finally, the future plan includes four points:
- First, promote the SQLization of Flink's real-time tasks. Promote Flink SQL to develop real-time tasks and increase the proportion of Flink SQL tasks.
- Second, Flink task computing resources are automatically optimized and configured. Analyze task resources from memory, task processing capacity, input rate, etc., and automatically configure tasks with unreasonable resource allocation, thereby reducing machine costs.
- Third, the K8sization of Flink SQL tasks and K8s cloud native. Flink's underlying computing resources are unified to k8s, which reduces operation and maintenance costs. Flink k8s is cloud-native and uses K8s resources more reasonably.
- Fourth, research on Flink data lake and CDC technologies. Research into and reserves of new technologies will lay the technical foundation for other real-time needs in the future.
Keywords: Flink SQL, Flink on Yarn, Flink on K8s, real-time computing, containerization