Application Practice of Apache Kyuubi in Xiaomi Big Data Platform

Introduction: The topic shared today is "The Application Practice of Kyuubi in Xiaomi's Big Data Platform", which is mainly divided into four parts:

The landing process of Kyuubi in Xiaomi
Create an easy-to-use and highly available Kyuubi service
Improvements based on kyuubi
Application of some new features of kyuubi in business scenarios

01 Kyuubi's landing process in Xiaomi

The first topic: Sharing about the landing process and implementation path of Kyuubi on Xiaomi's big data platform.

Background introduction

Let me introduce the background first. Xiaomi's big data system is constantly updated and iterative. With the adjustment of business structure, organizational structure and technical structure, some situations have gradually emerged in the internal big data platform:

A number of SQL-based big data platform services have emerged, serving various business departments, and their positioning has certain differences, which brings confusion to users. Which platform is better to choose, and we found in the process of user support , the same business may need to cross multiple data service platforms, and the process is cumbersome.
There are multiple account and permission systems for the use of underlying table resources:
a. MySQL/Doris: The system's own User&Password authentication and authority system
b. Hive/Kudu permission system based on Kerberos authentication and Sentry
c. Talos is a certification and authorization system based on Xiaomi's internal platform organization and team
It brings trouble to users' use and management. There is no unified resource management and authority management perspective, and the underlying system service accounts will be directly exposed to users, and there will be security risks.

Build a one-stop big data development platform

The above phenomenon directly leads to the following problems:
①To the user:

Multiple platforms and multiple systems give users a poor experience, and the development data process is long, making it impossible to get started quickly.
Development management efficiency is high, and resource cost settlement and task management do not have a unified view.

②For the platform:

Their respective focuses are different, and none of them can fully cover the capability requirements in big data scenarios. At the same time, there is also the problem of repeated capability construction, which leads to waste of resources.
Troubleshooting and maintenance are difficult and require a lot of manpower to solve.

Faced with the difficulty of using the data platform, the overall goal of building a unified and easy-to-use big data service platform is proposed. The overall architecture capability is built around data link solutions, data warehouse solutions, and data service solutions, providing a unified metadata management and authority management system.
Under this background and motivation, the unified data entry service has become a very important capability, which mainly solves:

Ease of use for users (consistent entry experience)
SQL Traffic Governance (Proxy Multi-Engine)
Security control of data access (ingress convergence and security risk reduction)

Historical development of Xiaomi SQL service

As can be seen from the above background questions, Xiaomi has several sets of SQL service entrances for big data processing, which are generally built around the classic SQL On Hadoop architecture, and gradually evolve from ThriftServer to an upwardly abstracted SQL Proxy service. Engines such as Hive/Spark/Doris are integrated at the bottom layer to provide support for ETL jobs and Ad-Hoc queries.

The abstracted SparkThriftServer implementation module acts as an independent SQL Proxy service, providing:

HiveServer and Spark APP proxy in ETL scenario (non-resident)
STS, Kylin, Druid agents in Ad-Hoc scenarios

It can be seen from here that the positioning of SQL Proxy and Kyuubi Server is very similar, but there are many shortcomings:
a. SQL Proxy does not completely strip the implementation of STS, it is reused by reflection, the code coupling is very high, it depends on a specific version of Spark, and it is difficult to upgrade
b. The underlying engine proxy layer does not have a unified abstraction, it is difficult to adapt to other engines, and the scalability to the underlying engine is poor
c. It is impossible to debug locally and relies on hadoop configuration. In the case of network isolation in the office and service environments, complete functional testing and debugging must be completed on the development machine. The development and deployment paths are long.

Building a unified SQL portal based on Kyuubi

(1) Why choose Kyuubi

Through the above analysis, we found that there are some problems in business and architecture that need to be solved.
① In business:

Driven by the re-creation of a unified big data system, it is imperative to build a unified SQL entry service.
Need a faster analysis engine, here we choose Trino.
A set of easy-to-use, highly available and continuously evolving service architecture to improve the productivity of big data R&D.

SQLProxy schema needs to be upgraded:

Fully compatible with HiveThrift protocol.
A loosely coupled implementation, based on a complete stripping of the STS implementation.
Flexible and scalable proxy multi-engine adaptation.

The advantages of Kyuubi are:

Fully compatible with STS and HS2
High availability and resource isolation
Clear and concise architecture, testable, maintainable, and extensible
High-quality implementation in the community and extensive use in the industry's production environment

The architecture of SQLProxy and Kyuubi are very similar, and the switching cost is low. Driven by business needs and architectural upgrades, we chose Kyuubi.

(2) Architecture upgrade

The upgrade process and effect are in line with our expectations. It can be seen that the architecture is more concise than SQLProxy, it is very easy to extend the underlying engine, and it can be tested and debugged locally, which greatly improves the development efficiency. From the development to the launch of the new architecture, the smooth migration was completed in two weeks.

The effect of upgrading the new architecture is also very obvious. Compared with the previous architecture, the code quality, service stability, maintainability and scalability have been greatly improved:

Multi-engine proxy capability (mainly supports Spark/Trino/Hive/Doris).
The system based on the data platform workspace realizes permission verification and resource isolation on the Kyuubi Server side.
More standardized Hive Thrift API support, perfectly compatible with various ecological visualization tools (Redash/Datagrip, etc.).

(3) Current status of unified SQL service

After half a year of migration, the daily effective SQL processing capacity has increased from 5W to the current 50W scale, accounting for 80% of the entire SQL traffic. In particular, the traffic of SparkSQL increased to 30W in half a year. General flow distribution: Spark 36w / Trino 12w / Hive 2.5w

The time required for each engine request:

Spark and Trino are the same, the average delay is about 30 seconds, and the P50 is about 5 seconds
The execution efficiency of Hive is obviously lower than the above two engines, which is related to the large task of Hive. The ETL is too high. At present, Kyuubi Server carries real SQL traffic of about 100w per day, and the availability can still reach more than 99.9%, which is very stable.

02 Create an easy-to-use, easy-to-maintain, and highly available Kyuubi service

Build Kyuubi that matches your business needs
(1) Overall Architecture

The overall architecture and process are mainly divided into entry services, authentication and permission adaptation, underlying engine management and service observability:
Kyuubi Server builds a unified SQL portal service based on
Kyuubi Engine as Spark SQL execution engine layer
Independent Engine Manager service manages various computing engines
The Kyuubi Server layer integrates the Ranger service, supports unified authorization verification based on the data platform, and adapts to the visualization of Trino/Hive/Doris engine service indicators and audit logs

(2) User interaction

The isolation of computing resources and the security of storage resources (tables) are guaranteed with workspace granularity. Similar to Kyuubi Group's multi-tenancy, we have extended to other engines here.

Complete the interactive process in one go:
Users under WorkspaceA use the Token issued by the platform, select various client tools, and submit SQL queries to the engine. Kyuubi Server will automatically submit the user's SQL to the computing engine to which the space belongs to ensure the isolation of user resources.
Although it is the same entry as other workspace users, the use of resources is isolated. The Kyuubi Server service does not specifically execute SQL, and the same entry service will not have too much pressure.

Improve the usability on the user side

(1) Unified authentication and unified de-Kerberosization of table coordinates, using the platform unified Token method to solve:

Kerberos access process is cumbersome
It is difficult for ordinary users to understand the kerberos mechanism, and it is difficult to troubleshoot problems
Improper user management, user expansion problem under the same account
Auditing and tracking cannot precisely locate the unified standardization of user personal table resource naming. There are multiple regions and multiple types of data sources within Xiaomi. If a unified SQL entry service is used, the table name specification of SQL statements needs to be unified to avoid conflicts and unified management. :
Use Catalog.Schema.Table three-level table name
The Kyuubi Server supports JDBC URL preset Catalog/Schema for the unique table name, which is compatible with the second-level or first-level table names in the previous SQL
Combine URL and SQL Table to generate complete three-level table coordinates for user authorization authentication
(2) Kyuubi Engine public resource pool

The introduction of the Kyuubi Engine public pool mainly solves the query performance problem when users enter the space for the first time to submit SparkSQL. According to the SQL analysis statistics submitted by the users mentioned above, 50% of the SQL query delays are less than 5 seconds. In the absence of resources allocated in advance, a user submitting a query will cold start a Kyuubi Engine, which is the current mechanism of Kyuubi. Since the delay for Xiaomi Yarn to submit an APP is at the minute level, a simple second-level query by the user will be delayed to the minute level, and the somatosensory feeling is very poor.
Therefore, with the implementation of Kyuubi Engine Pool, for workspace users without pre-configured and specified resources, SQL will be routed to the pre-started Kyuubi Engine Pool to speed up user query speed and improve SQL query experience.

Upgrade Spark2.X to Kyuubi Engine

Kyuubi Engine currently only supports Spark3 and above. Previously, our internal version was Spark2. Before upgrading to Kyuubi Engine, we did relevant comparison tests. Under the Kyuubi architecture and the SQLProxy architecture, there are obvious performance improvements:

On the TPC-DS standard test set, the P50 latency has a 75% performance improvement, and the performance of the long tail is basically the same as that of SQLProxy.
In the real business scenario, the P50 latency also has a 37% performance improvement, and the long tail is basically the same as that of SQLProxy, that is, the performance of the upgraded Kyuubi Engine is better than Spark2 in most cases, and overall it will not be worse than Spark2 .

Containerization of Kyuubi Server

![picture]
( https://static001.geekbang.org/infoq/24/24ee13996fa76ab26a5d1ad4e257fb8e.png )

In the high availability of Kyuubi Server, the containerization method is used to replace the current high availability mode of service discovery through ZK on the Kyuubi client side:

Deploy the Kyuubi Server service on K8s to make full use of the elasticity of K8s to ensure high availability.
The deployment of Kyuubi Server and Kyuubi Engine is completely decoupled. As a separate Thrift RPC proxy service and HTTP service, it removes Hadoop-related configuration environment dependencies, and uses LVS for traffic load balancing like ordinary business services.
At the same time, with the help of the CI/CD capability of the internal K8s platform, the fully automatic grayscale release of the Kyuubi Server service is realized, and it supports one-click upgrade and capacity expansion.
Workspace-based computing resource management (1) Engine Manager

Since the management service for Spark Engine has been implemented before, we directly separated the management of Kyuubi Engine from Kyuubi Server to form a separate Engine Manager service, which is responsible for the life cycle management of Engine, configuration context management, and provides service discovery and load balancing. ability.

Provides engine configuration and lifecycle management for management portals.
Provides the ability to provide SQL routing to Kyuubi Server.
Provides visual monitoring capabilities for operation and maintenance, including engine service status, resource occupancy, and busyness, which facilitates rapid operation and maintenance.

The process of the SQL submitted by the user:

First go through the authentication and authority verification of the Kyuubi Server entrance.
Kyuubi Engine address available to EngineManager by Kyuubi Server.
EngineManager obtains the available Engines in the current user space from ZK, then counts the busy indicators of the currently available Engines, and returns the relatively idle Engines to Kyuubi Server.
Kyuubi Server submits SQL to the Engine suggested by EngineManager for execution.

(2) User Submission

The picture shows the SQL query entry of our user platform. Users under the workspace can easily start a Kyuubi Engine. In order to lower the threshold for users, only the configuration of resource-related and queuing policies is exposed. At the same time, users can also configure multiple Kyuubi Engine instances to ensure the high availability of SQL execution in the current workspace.

(3) High availability of Engine

Why do you need high availability of Kyuubi Engine? Because in the actual environment, the Kyuubi Engine has been running for a long time, the SQL execution process of Spark is very complicated, and its stability will be problematic over time:

A bug in which events are lost after the dynamic resource policy is enabled, resulting in the inability to release resources.
Large tasks take a long time and may block the running of some small tasks.
Driver side JVM Full GC time is too long and OOM.
Engine restarts frequently due to unreasonable SQL.

Therefore, some high-availability guarantee strategies are implemented:

Workspace level isolates Engine exceptions to avoid affecting other users.
Observe the available indicators of the Engine, and mark whether it is currently available through busy and active detection information.
Multiple Engine instances in the same workspace (Kyuubi's Engine Pool mechanism) improve overall availability and provide load-based distribution.
It automatically restarts when an abnormality is found.
Frequently restart the Engine through the alarm mechanism, and manual intervention in a timely manner.

03 Retrofit based on Kyuubi

Proxy for Trino and Doris

The introduction of Trino and Doris mainly solves the problem of query efficiency in OLAP scenarios.

Kyuubi does not yet support Trino in version 1.1.0. We have used Trino-JDBC on the kyuubi Server to complete the adaptation of the Trino engine.
Trino-JDBC implements the stream iterator pattern, and each nextResult triggers a request to the Trino engine.
The current community Trino-Client implementation will pull all result sets at one time, which may lead to the risk of OOM.

The JDBC method is also used for the adaptation of Doris. Since the Doris client itself supports Mysql JDBC, the implementation of MySQL JDBC is the full pull mode, and the Kyuubi Server side has the risk of OOM. The risk of OOM caused by large result sets is currently reduced by limiting the timeout period for Doris queries.
If you want to extend the database support of Kyuubi proxy to other JDBC later, you must be careful.

Support of SQL HTTP API Regarding the support of HTTP API, a total of V1 and V2 versions are implemented, and there are still some differences compared to the community.

① V1 version

It simplifies the user interaction process and the calling process of Hive Thrift RPC. Users can submit SQL directly through HTTP requests in the upper-layer application, which is very friendly to some R&D users. Submit SQL According to QueryID, continuously poll for results.
The implementation of Thrift backend Service is reused, and a layer of HTTP Fronted Service is horizontally extended. The underlying implementation is completely consistent with the Thrift API.

But there are also some problems:

The Kyuubi Service side has a Session state. Step1 and Step2 must be routed to the same instance to get the result, which cannot be completely solved by using IP Hash.
This also makes the Kyuubi Server HTTP service unable to scale horizontally and upgrade smoothly.

②V2 version

In order to completely solve the horizontal scalability problem of V1, Kyuubi HTTP Server is completely stateless in V2 version, and HTTP SQL API is provided directly through Kyuubi Engine. Kyuubi Server only acts as an outbound agent.

Two other improvements:

Completely solve the problem of Kyuubi Engine OOM caused by large result sets, and directly persist the results of query classes to HDFS without going through the Spark Driver side.
When users obtain results, they stream the result set directly from the HDFS layer without going through the Kyuubi Engine.

At the same time, there is no need to maintain long links, which is very suitable for ETL scenarios.

SQL table column parsing

We have performed permission authentication on the Kyuubi Server side. We need to obtain the real table name of the user's SQL. We have developed a pure SQL parsing module: it supports the extraction of table column blood relationship and SQL type, and supports SparkSQL and Trino syntax.
The specific parsed format is shown in the figure, including types, input and output tables, and columns of queues.
Later, in specific practical scenarios, this module is also applied to its business scenarios, such as table blood relationship audit logs, SQL statistical request analysis and other security quality scenarios, completely reusing our SQL table column extraction capabilities.

04 Application of new features of Kyuubi

Small file merge

Solve the problem of too many small files that may be caused by user writing scenes. Users generally submit two SQLs: one is business processing SQL, and the other is merged SQL, which are connected in series through workflow and maintained unchanged.
It is also very simple to turn on. You can turn on the switch during the Kyuubi Engine startup phase and the SQL submission phase.

Incremental fetch and fetch result set limits

It is mainly the OOM problem caused by the query of the user with the result set under JDBC, and the incremental mode is enabled. However, in some scenarios, the result of some partitions may be too large, which will block the result retrieval process, resulting in a bad user experience. It is recommended to use the HTTP API asynchronous result acquisition method to solve the problem.
For some SQL preview data for users, if the accessed table is very large, it is a very good function to limit the number of query output to avoid unnecessary overhead.
Z-Ordering

In our internal portrait scene related tests, Z-Ordering has been significantly improved.

business inquiry time
storage
Query the amount of scanned data
number of files

In specific applications, the sorting rules of Z-Ordering need to be adjusted according to the data of the actual business table:

We sort the columns with high query frequency in the portrait scene, and the effect is obvious
Optimization after more than 3 columns is not ideal
Sort columns should choose columns with large cardinality and no skew

The implementation of Kyuubi Engine Z-Ordering is very clever, no additional columns are added, and the native capabilities of parquet are directly reused, so one generation can support multiple engine queries (as long as the engine supports reading in parequet format).

PlanOnly mode

Mainly used for SQL-related scenarios of non-SQL execution, such as:

Provide syntax and semantic verification service SQL for data platform
Check SQL before commit
Syntax and semantic compatibility check (Spark2.X->Spark3.X upgrade)

SQL will not be executed in PlanOnly mode, only the parsed LogicalPlan/SparkPlan will be output. At present, a syntax and semantic verification service is provided separately for the data platform, which is the PlanOnly mode of Kyuubi Engine.

This application scenario also provides us with a new idea: use Kyuubi Engine as the service framework of Yarn APP to provide services in other scenarios, such as verification service, blood relationship extraction service, and SQL precomputing service.

Scala mode

Scala Code mode completely liberates the ability of Kyuubi Engine, has the ability to submit Scala code directly through JDBC, and specializes in some complex logic business.

At present, we have made some attempts in the operation and maintenance of our application scenarios, mainly to solve our operation and maintenance efficiency. For example, we want to dynamically load user-defined jar packages at runtime and read Thrift-formatted data. Compared with the previous process of logging in to the production cluster machine to package and run the code, it is greatly simplified.

05 Future planning and summary

planning:

Based on business scenarios, SQL rules, and execution costs, it can realize automatic routing capabilities under multiple engines.
The HTTP API replaces the ETL jobs submitted by the Thrift API, and the asynchronous method replaces the long connection method.

Summarize:

Kyuubi is an excellent open source practice and has become an important infrastructure service for Xiaomi's internal big data service portal.
Thank you very much for the contribution of the Kyuubi community, which accelerated the landing of our unified SQL service.
I believe that in the future, Kyuubi will become the SQL Gateway benchmark in the big data scenario, and we will build the Kyuubi ecosystem together with everyone.

That's all for today's sharing, thank you all.

share guest

Application Practice of Apache Kyuubi in Xiaomi Big Data Platform

01 Kyuubi's landing process in Xiaomi

02 Create an easy-to-use, easy-to-maintain, and highly available Kyuubi service

03 Retrofit based on Kyuubi

04 Application of new features of Kyuubi

05 Future planning and summary

网易数帆

引用和评论

一图看懂网易数帆指标平台EasyMetrics

Java8的新特性

Java11的新特性

Java5的新特性

Java9的新特性

Java13的新特性

Java7的新特性