
Abstract: This article is based on the talk given by Xia Chang, head of real-time computing at Douyu, at the Flink Forward Asia 2021 industry practice session. It is divided into four parts:

  1. Background introduction
  2. Real-time platform construction
  3. Real-time data warehouse exploration
  4. Future development and prospects

View the live replay & speech PDF

1. Background introduction


Founded in 2014, Douyu is a danmaku (bullet-comment) live-streaming platform dedicated to bringing joy to everyone. Real-time computing got off to a relatively late start at Douyu.

Around 2018, Spark Streaming and Storm were introduced successively to meet near-real-time data requirements, such as 5-minute and 1-hour scenarios. As the business continued to grow, the requirements for real-time metrics became more diverse, and Spark Streaming and Storm found it increasingly difficult to keep up.

Around 2019, Douyu introduced Flink and initially used Flink JAR jobs to support these real-time data requirements. However, the threshold and cost of developing Flink JAR jobs were still too high.

At the end of 2019 and the beginning of 2020, we designed, developed, and launched a Kubernetes-based Flink real-time computing platform that supports both SQL and JAR job development. Internally, this platform is called the "Xuanwu computing platform".


Since its launch, the Xuanwu computing platform has supported many business scenarios, such as advertising, large screens, recommendation, system monitoring, risk control, data analysis, and real-time labeling.


By the third quarter of 2021, Douyu's real-time computing platform had reached 100+ users, 2000+ vCores, 500+ jobs, and a daily processing volume of more than 100 billion records.


2. Real-time platform construction

Before building the Xuanwu real-time computing platform, we mainly developed jobs as Flink JARs, which had the following pain points:

  • High development threshold;
  • High deployment cost;
  • No monitoring or alerting;
  • No job version management.

Based on the above four points, we designed and developed our own real-time computing platform.


The Xuanwu real-time computing platform is built on a K8s cluster, supports multiple Flink versions, and serves as a one-stop real-time data development platform. Its architecture can be divided into four layers from top to bottom: the platform layer, the service layer, the scheduling layer, and the K8s cluster layer.

  • Platform layer: Provides the user-facing functions, including metadata management, job management, job operation and maintenance, example demos, monitoring dashboards, scheduling management, and alarm management.
  • Service layer: Divided into the Flink job service and the Flink gateway service, providing SQL validation, SQL debugging, job start, job stop, log query, and other capabilities.
  • Scheduling layer: Multiple Flink versions coexist with the help of K8s container images. Each Flink version corresponds to a K8s image, so a job can be switched between versions at any time. To make the same SQL work across multiple Flink versions, we also added a layer of SQL mapping, mainly to absorb the connector configuration differences between Flink versions. In addition, the scheduling layer provides a complete job status tracking mechanism.
  • K8s cluster layer: Mainly provides the basic runtime environment.


Job development on the platform illustrates the capabilities it provides: SQL-based job development, online debugging, syntax validation, job versioning, metadata management, configuration masking, cluster management, parameter tuning, and more.

In the process of building the platform, we also encountered many challenges.


The first challenge was how to allocate resources when deploying Flink on the K8s cluster. In our scheme, Flink is deployed in standalone Kubernetes mode, which in practice creates two instance groups in the K8s cluster: one instance group runs the JobManager (JM) processes and the other runs the TaskManager (TM) processes. The two instance groups are bound together by setting the same HA cluster id.

  • When a JM instance group runs multiple pods, one of them acts as the leader and all the other pods run as standby;
  • When a TM instance group runs multiple pods, each pod registers with the JM as a task executor.

To fully isolate resources, we rely on K8s capabilities and create a dedicated Flink cluster for each job in production. When K8s creates a pod, its CPU and memory must be specified; when the Flink cluster starts, the resource configuration of the JM and TM must be specified in the flink-conf file.

The challenge in this solution is how to configure the K8s instance resources and the Flink cluster resources in a unified way.


To solve this problem, we modified the entrypoint startup script of the Flink image and added two operations to it:

  • One pulls the job definition to obtain the job's runtime configuration;
  • The other replaces the memory size configuration in the flink-conf file accordingly.

Of course, in the latest native Kubernetes deployment mode, this problem has been solved officially through parameterized configuration.


The second challenge for the platform was how to monitor the running status of each job. In our scheme, each job is abstracted into a message and stored in a message queue built on ZooKeeper, and five states are modeled in the message queue: Accept, Running, Failed, Cancel, and Finish.

Each state has an independent thread pool that monitors and consumes its queue. Take the Running state as an example: the thread pool takes a job message from the queue, parses the Flink cluster information from it, obtains the Flink UI domain name, and uses that domain name to access the Flink JM pod through the K8s Nginx Ingress to query the status of the running job. If the job status is still Running, the message is re-queued at the tail of the queue; otherwise it is moved to the queue of the corresponding state.
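
As a rough illustration of the Running-state consumer described above, the sketch below polls Flink's REST endpoint /jobs/overview and requeues or reroutes the job message accordingly. The JobMessage type, the queue wiring, and the domain resolution are assumptions made for this sketch, not Douyu's actual implementation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.BlockingQueue;

/** Sketch of the Running-state monitor; one such runnable per thread in the state's thread pool. */
public class RunningStateMonitor implements Runnable {

    /** Hypothetical job message; the real one lives in the ZooKeeper-backed queue. */
    public static class JobMessage {
        final String jobId;
        final String flinkUiDomain; // resolved from the job's Flink cluster info
        public JobMessage(String jobId, String flinkUiDomain) {
            this.jobId = jobId;
            this.flinkUiDomain = flinkUiDomain;
        }
    }

    private final BlockingQueue<JobMessage> runningQueue;
    private final BlockingQueue<JobMessage> terminatedQueue;
    private final HttpClient http = HttpClient.newHttpClient();

    public RunningStateMonitor(BlockingQueue<JobMessage> runningQueue,
                               BlockingQueue<JobMessage> terminatedQueue) {
        this.runningQueue = runningQueue;
        this.terminatedQueue = terminatedQueue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            JobMessage job;
            try {
                job = runningQueue.take(); // one job message per iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            try {
                // The Flink UI domain is exposed through the K8s Nginx Ingress.
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("http://" + job.flinkUiDomain + "/jobs/overview"))
                        .GET()
                        .build();
                String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
                // A real implementation would parse the JSON response properly;
                // a substring check keeps this sketch short.
                if (body.contains("\"state\":\"RUNNING\"")) {
                    runningQueue.offer(job);      // still running: requeue at the tail
                } else {
                    terminatedQueue.offer(job);   // otherwise move to the matching state queue
                }
            } catch (Exception e) {
                runningQueue.offer(job);          // JM temporarily unreachable: retry later
            }
        }
    }
}
```

One such runnable would be submitted per thread of the state's thread pool, for example via Executors.newFixedThreadPool(...).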


In the early days after the real-time computing platform went live, we encountered new challenges: how to read Hive tables and how to use Hive UDFs in a Flink cluster.

We split a FlinkSQL submission into three parts: job assembly, context initialization, and SQL execution.

For job assembly, we implemented two approaches:

  • The first is SDK GET, which requests the platform's service layer through a method encapsulated in the SDK to obtain the job definition;
  • The second is FILE GET, which reads an SQL file at a specified path on the current machine and generates the job definition from it. This second approach mainly makes it easy to debug the engine quickly on a local machine without depending on the platform services.

The context initialization part is divided into two processes:

  • One is setting tuning parameters, similar to the commonly used SET command in HiveSQL (a sketch follows this list);
  • The other is catalog initialization, which is where the integration between the Flink cluster and Hive is implemented.
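
As a minimal sketch of the tuning-parameter step (the option keys below are ordinary Flink configuration options chosen for illustration, not the platform's actual defaults), parameters can be applied to the TableEnvironment before any SQL is executed:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ContextInit {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Step 1: tuning parameters, the Flink SQL counterpart of HiveSQL's SET command.
        tEnv.getConfig().getConfiguration().setString("table.exec.state.ttl", "1 h");
        tEnv.getConfig().getConfiguration().setString("pipeline.name", "demo-job");

        // On newer Flink versions SET can also be issued as a statement, e.g.:
        // tEnv.executeSql("SET 'table.exec.mini-batch.enabled' = 'true'");
    }
}
```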


Taking Hive as an example: before a catalog is injected, the platform's metadata management module runs a catalog initialization process that stores the catalog creation statement in advance. When a Flink job is submitted, the user selects the catalog to inject; the platform then creates the catalog and registers it in the Flink context, completing the catalog injection.
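
For example, a minimal sketch of this injection, assuming the Flink Hive connector is on the classpath and using placeholder names and paths, is simply to execute the stored CREATE CATALOG statement and switch to that catalog:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CatalogInjection {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // The DDL below would normally be loaded from the platform's metadata service;
        // the catalog name and hive-conf path are placeholders.
        String createCatalog =
                "CREATE CATALOG hive_catalog WITH (\n"
              + "  'type' = 'hive',\n"
              + "  'hive-conf-dir' = '/etc/hive/conf'\n"
              + ")";
        tEnv.executeSql(createCatalog);

        // Make the injected catalog the current one so that Hive tables and
        // Hive UDFs can be referenced directly in subsequent SQL.
        tEnv.useCatalog("hive_catalog");
    }
}
```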


As the number of tasks grew, we found that novices often needed dozens of iterations to take a Flink job from writing the SQL to going live, and the platform lacked the ability to try and fail quickly. Therefore, we designed and developed real-time monitoring and real-time debugging functions.

In terms of architecture, Douyu introduced a Flink Gateway Server to wrap the Flink cluster interfaces a second time. It includes functions such as syntax validation, SQL submission, SQL status query, SQL stop, and SQL mock, and it collects the logs of the Flink clusters and the gateway service in a unified way. By pre-starting Flink clusters, the job startup time is shortened, enabling fast debugging.
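
As one illustrative piece of the gateway (a sketch under our own assumptions rather than the gateway's real code), syntax validation can be approximated by asking the planner to explain the statement and reporting any parse or validation error:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlValidator {

    private final TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

    /** Returns null if the statement passes parsing and validation, otherwise the error message. */
    public String validate(String statement) {
        try {
            // explainSql() runs the parser, validator, and optimizer without executing a job.
            tEnv.explainSql(statement);
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        SqlValidator validator = new SqlValidator();
        // Syntax errors and unknown tables surface here instead of at job submission time.
        System.out.println(validator.validate("SELEC id FROM src"));
    }
}
```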


Real-time debugging is mainly divided into four steps, namely SQL parsing, rule verification, execution plan, and physical execution.

SQL mock rewrites the original SQL parsing process: based on the nodes obtained after SQL parsing, it analyzes the lineage of the SQL to determine the source tables and sink tables, then dynamically rewrites the source tables into datagen data sources and the sink tables into console outputs.


By dynamically modifying the configuration of the source and sink tables, the data sources are mocked. The advantage is that SQL developed for production can be debugged directly without modification, and there is no need to worry about generating dirty data, so we can quickly verify whether the SQL logic meets expectations.
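
To make the rewrite concrete, the sketch below shows what the mocked source and sink definitions can look like, using Flink's built-in datagen and print connectors; the table names and schemas are invented for illustration:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlMockDemo {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // The original Kafka source table is rewritten to a datagen table with the same schema,
        // so the business SQL runs against generated data.
        tEnv.executeSql(
                "CREATE TEMPORARY TABLE user_click (\n"
              + "  user_id BIGINT,\n"
              + "  room_id BIGINT,\n"
              + "  ts      TIMESTAMP(3)\n"
              + ") WITH (\n"
              + "  'connector' = 'datagen',\n"
              + "  'rows-per-second' = '5'\n"
              + ")");

        // The original sink table is rewritten to the print connector, so results go to the
        // console (TaskManager logs) instead of real storage and no dirty data is produced.
        tEnv.executeSql(
                "CREATE TEMPORARY TABLE click_cnt (\n"
              + "  room_id BIGINT,\n"
              + "  cnt     BIGINT\n"
              + ") WITH (\n"
              + "  'connector' = 'print'\n"
              + ")");

        // The business SQL itself is left untouched.
        tEnv.executeSql(
                "INSERT INTO click_cnt SELECT room_id, COUNT(*) FROM user_click GROUP BY room_id");
    }
}
```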


For monitoring and alerting of Flink jobs, we use a custom metrics reporter to report metrics to a Kafka cluster, then use a Flink job to consume the metrics from Kafka, perform aggregation, enrich them with link dimensions, and push the processed data to Pushgateway, from which it is written into Prometheus. Finally, the monitoring dashboards are built with Grafana.
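
The following is a minimal sketch of such a reporter (the class name and option names are ours, not Douyu's), built on Flink's AbstractReporter and Scheduled interfaces plus a plain Kafka producer:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricConfig;
import org.apache.flink.metrics.reporter.AbstractReporter;
import org.apache.flink.metrics.reporter.Scheduled;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

/** Sketch of a metrics reporter that periodically ships Flink metrics to Kafka. */
public class KafkaMetricReporter extends AbstractReporter implements Scheduled {

    private KafkaProducer<String, String> producer;
    private String topic;

    @Override
    public void open(MetricConfig config) {
        Properties props = new Properties();
        props.put("bootstrap.servers", config.getString("bootstrap.servers", "localhost:9092"));
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        topic = config.getString("topic", "flink-metrics");
    }

    @Override
    public void report() {
        // Called periodically by Flink according to the configured reporter interval.
        for (Map.Entry<Counter, String> entry : counters.entrySet()) {
            send(entry.getValue(), String.valueOf(entry.getKey().getCount()));
        }
        for (Map.Entry<Gauge<?>, String> entry : gauges.entrySet()) {
            send(entry.getValue(), String.valueOf(entry.getKey().getValue()));
        }
        // Histograms and meters are omitted to keep the sketch short.
    }

    private void send(String name, String value) {
        producer.send(new ProducerRecord<>(topic, name, value));
    }

    @Override
    public void close() {
        if (producer != null) {
            producer.close();
        }
    }

    @Override
    public String filterCharacters(String input) {
        return input;
    }
}
```

Such a reporter is wired up through the metrics.reporter.* options in flink-conf, with the reporter interval controlling how often report() is invoked.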


Douyu's monitoring dashboards are divided into resource monitoring, stability monitoring, Kafka monitoring, and CPU/memory monitoring.

3. Real-time data warehouse exploration


The first version of the real-time data warehouse solution borrowed the layering and development ideas of the offline data warehouse and used Kafka as the storage for the intermediate layers. DB and log data are written to Kafka through Canal and the event-tracking service respectively, forming the ODS layer of the real-time data.

  1. Consume the ODS layer, use Flink for dimension enrichment and cleaning, and write the result back to Kafka to generate the DWD layer (a sketch of this step follows the list);
  2. Consume the DWD layer, generate aggregates by minute, hour, and specified dimensions, and write them back to Kafka to generate the DWS layer;
  3. Finally, consume the DWS layer and write the data to HBase, MySQL, ES, Redis, ClickHouse, and other data stores for use by data services.
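
As a rough sketch of step 1 (topic names, fields, and connection options are placeholders, and the dimension lookup join is omitted), the ODS-to-DWD step can be expressed in Flink SQL roughly as follows:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OdsToDwd {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // ODS: raw events written to Kafka by the event-tracking service.
        tEnv.executeSql(
                "CREATE TABLE ods_click (\n"
              + "  user_id BIGINT,\n"
              + "  room_id BIGINT,\n"
              + "  ts      TIMESTAMP(3)\n"
              + ") WITH (\n"
              + "  'connector' = 'kafka',\n"
              + "  'topic' = 'ods_click',\n"
              + "  'properties.bootstrap.servers' = 'kafka:9092',\n"
              + "  'format' = 'json',\n"
              + "  'scan.startup.mode' = 'latest-offset'\n"
              + ")");

        // DWD: cleaned events written back to another Kafka topic.
        tEnv.executeSql(
                "CREATE TABLE dwd_click (\n"
              + "  user_id BIGINT,\n"
              + "  room_id BIGINT,\n"
              + "  ts      TIMESTAMP(3)\n"
              + ") WITH (\n"
              + "  'connector' = 'kafka',\n"
              + "  'topic' = 'dwd_click',\n"
              + "  'properties.bootstrap.servers' = 'kafka:9092',\n"
              + "  'format' = 'json'\n"
              + ")");

        // Cleaning: drop invalid records; dimension enrichment via lookup joins is left out here.
        tEnv.executeSql(
                "INSERT INTO dwd_click SELECT user_id, room_id, ts FROM ods_click WHERE user_id > 0");
    }
}
```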


As business scenarios multiplied, this solution exposed four problems:

  • Kafka's data retention time is limited;
  • Offline and real-time data are not stored in a unified layer;
  • The intermediate layers are difficult to query and analyze directly;
  • Data backfill scenarios are not well supported.


To address the above problems, we tried a second solution that uses Iceberg as the intermediate-layer storage. Using the catalog injection mentioned above, we injected Iceberg's metadata and used Iceberg to store the DWD and DWS layers.

This solution alleviates some of the problems of using Kafka as the intermediate layer, but it introduces a new one: when Flink writes to an Iceberg table, the visibility of the data depends on the commit that happens at each checkpoint, so the latency of Iceberg data depends on the checkpoint interval. Checkpointing is a blocking operation whose interval is usually not recommended to be set too small, which means that Iceberg as an intermediate layer has higher latency than Kafka and is not suitable for scenarios with strict latency requirements.
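
The coupling between checkpointing and visibility can be seen in a small sketch like the one below (catalog options, database, and table names are placeholders; it assumes the Iceberg Flink runtime is on the classpath and reuses the hypothetical ods_click table from the earlier sketch):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergDwdSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Data written to Iceberg only becomes visible when a checkpoint commits,
        // so the checkpoint interval is effectively the lower bound of the DWD latency.
        tEnv.getConfig().getConfiguration().setString("execution.checkpointing.interval", "1 min");

        // Iceberg catalog backed by the Hive metastore (options are placeholders).
        tEnv.executeSql(
                "CREATE CATALOG iceberg_catalog WITH (\n"
              + "  'type' = 'iceberg',\n"
              + "  'catalog-type' = 'hive',\n"
              + "  'uri' = 'thrift://hive-metastore:9083',\n"
              + "  'warehouse' = 'hdfs://nameservice/warehouse/iceberg'\n"
              + ")");

        // Streaming insert into an Iceberg DWD table; rows are committed once per checkpoint.
        tEnv.executeSql(
                "INSERT INTO iceberg_catalog.dwd.dwd_click\n"
              + "SELECT user_id, room_id, ts FROM default_catalog.default_database.ods_click");
    }
}
```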


In the end, we run both solutions in parallel, with a custom metadata service that maintains the catalog information of databases and tables and injects catalogs dynamically. Of course, we are still exploring more convenient ways to develop the real-time data warehouse.

4. Future development and prospects


Flink makes real-time computing simpler, but Douyu's journey in building its real-time computing platform has not been all smooth sailing. For the future development of the platform, we have three expectations:

  • The first is dynamic scaling of Flink jobs, so that the platform can automatically adjust job resources and cope with sudden surges in business data;
  • The second is to simplify the development model of the real-time data warehouse, lower the threshold for real-time data warehouse development, and truly promote large-scale use of the real-time data warehouse within the company;
  • The last is to improve the real-time data quality monitoring system, so that real-time data quality can be verified and traced.
