Author introduction: Li Wenjie, senior database engineer at NetEase Interactive Entertainment, TUG 2019 and 2020 MVA. He is mainly responsible for big data R&D and data analysis, providing refined operations guidance for products. He also promotes the adoption of TiDB within the department, accumulating experience and exploring best practices for moving business to the cloud and to distributed databases, and is currently the head of the TiDB management team.

This article is based on a TUG NetEase online enterprise event, where Li Wenjie, senior database engineer at NetEase Games, shared the practical experience of using the distributed database TiDB at NetEase Games.

NetEase Games first introduced TiDB from the AP (analytical processing) perspective. When we used TiDB for the first time, we migrated computationally intensive batch-processing jobs to it. During the migration, when a batch job involves a relatively large amount of data, I believe many people will run into the "transaction too large" error.

TiDB transaction limit

After some investigation, we found the cause: distributed transactions need a two-phase commit, and the underlying layer also performs Raft replication. If a transaction is very large, the commit process becomes very slow and the subsequent Raft replication gets stuck. To keep the system from getting stuck, TiDB imposes limits on transaction size. The specific limits cover the number of SQL statements in a single transaction, the number and total size of KV key-value pairs, and the size of a single KV key-value pair.

Knowing this limitation, the solution is clear: split the large transaction into multiple smaller transactions according to business needs and execute them in batches. The SQL that previously failed can then run successfully, and the batch programs originally running on MySQL/Oracle were also migrated to TiDB successfully.
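As an illustration, below is a minimal sketch (Scala over JDBC) of splitting one large write into a series of small transactions; the connection string, table name, data, and batch size are all hypothetical, and a MySQL-compatible JDBC driver is assumed to be on the classpath.

```scala
import java.sql.DriverManager

object BatchedWrite {
  def main(args: Array[String]): Unit = {
    // Hypothetical TiDB endpoint and target table.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://tidb-host:4000/test?rewriteBatchedStatements=true", "root", "")
    conn.setAutoCommit(false)

    val rows = (1 to 1000000).map(i => (i, s"value-$i")) // data to be written (illustrative)
    val batchSize = 5000                                 // rows per small transaction

    val stmt = conn.prepareStatement("INSERT INTO batch_target (id, val) VALUES (?, ?)")
    rows.grouped(batchSize).foreach { batch =>
      batch.foreach { case (id, v) =>
        stmt.setInt(1, id)
        stmt.setString(2, v)
        stmt.addBatch()
      }
      stmt.executeBatch()
      conn.commit() // each batch commits as its own small transaction
    }
    stmt.close()
    conn.close()
  }
}
```

Note that this is exactly the kind of manual splitting whose trade-off is discussed next: each batch is atomic, but the job as a whole is not.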

At the same time, we have to consider what happens when things go wrong. When there are no problems, the program runs very smoothly, but when the data center has a network issue or some other failure occurs, part of the data may already have been written to TiDB while the rest has not. What this looks like in practice is that the execution of the transaction no longer guarantees atomicity: only part of it is executed, some batches succeed and some fail.

Investigation showed that this was because we had manually enabled transaction splitting. In this case, the atomicity of the large transaction can no longer be guaranteed; only the atomicity of each small batch transaction is guaranteed, so from the perspective of the whole task the data ends up inconsistent.

So how to solve this problem?

TiDB large transaction optimization

After we reported the problem to the official TiDB team, TiDB 4.0 carried out deep optimization of large transactions, not only removing some of the restrictions but also raising the single-transaction size limit from 100 MB to 10 GB, a 100x improvement. However, this brings another problem: in T+1 batch jobs, the previous day may produce millions or even tens of millions of rows, and if they are processed with a JDBC + TiDB program, the efficiency is actually not high; the processing time often lasts several hours, or even tens of hours or more.

So how can we improve the overall throughput of computing tasks? The answer is TiSpark.

TiSpark: Efficiently handle complex OLAP calculations

TiSpark is a plugin built on top of Spark that can read data from TiKV efficiently. It also supports index lookups and computation push-down, so its query performance is high. In practice we found that TiSpark can read 200 million rows from TiKV in about 5 minutes, and its write performance is also high. With TiSpark we can access TiKV data directly through Spark tooling, and experience has shown that TiSpark performs very well for both reads and writes against TiKV. With TiSpark we can handle larger and more complex data operations.
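As a reference, here is a minimal sketch of reading a TiDB table through TiSpark; the exact configuration keys depend on the TiSpark version, and the PD addresses, database, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Enable the TiSpark extension and point it at the cluster's PD nodes.
val spark = SparkSession.builder()
  .appName("tispark-read-demo")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.tispark.pd.addresses", "pd0:2379,pd1:2379,pd2:2379")
  .getOrCreate()

// Filters and aggregations are pushed down to the TiKV coprocessor where possible.
val df = spark.sql(
  "SELECT game_id, COUNT(*) AS cnt FROM gamedb.login_log GROUP BY game_id")
df.show()
```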

TiSpark Practice

In practice, there are two main ways to use TiSpark:

  • Method 1: TiSpark + JDBC write

The TiSpark + JDBC write method can split large transactions automatically, but it does not guarantee the atomicity and isolation of the overall transaction, and failure recovery requires manual intervention. Write speed with this method can reach about 1.8 million rows/min. Because the SQL is processed by the TiDB server before being written to TiKV, the performance is only average.
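A minimal sketch of this write path, assuming a DataFrame `df` produced by an earlier Spark job and a hypothetical TiDB endpoint and target table:

```scala
// df: an existing DataFrame, e.g. the result of the read sketch above.
df.repartition(32)                      // more partitions -> more concurrent JDBC writers
  .write
  .format("jdbc")
  .option("url", "jdbc:mysql://tidb-host:4000/gamedb?rewriteBatchedStatements=true")
  .option("dbtable", "result_table")
  .option("user", "root")
  .option("password", "")
  .option("isolationLevel", "NONE")     // each JDBC batch commits as a small transaction
  .option("batchsize", 10000)           // rows per batch
  .mode("append")
  .save()
```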

  • Method 2: TiSpark batch write to TiKV

TiSpark batch write to TiKV does not split large transactions. Reading and writing TiKV directly through TiSpark is equivalent to reading and writing the data in one large transaction, which guarantees the atomicity and isolation of the transaction while also delivering good write performance: write speed can reach about 3 million rows/min (a minimal sketch of this write path follows below).

TiSpark solved our large batch-processing problems, but it also carries certain risks. Because TiKV is the storage engine of the whole TiDB architecture, when TiSpark reads and writes TiKV directly, heavy read/write pressure on the storage layer can significantly affect other online businesses. In addition, if TiSpark's IO is not limited when it accesses TiKV, it can easily cause performance jitter and increased access latency, which also affects other online businesses.
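Here is a minimal sketch of the TiSpark batch write into TiKV mentioned above (available since TiSpark 2.3; option names can differ across versions, and the addresses, credentials, and table names are placeholders):

```scala
// Requires spark.tispark.pd.addresses to be configured on the session, as in the read sketch.
df.write
  .format("tidb")
  .option("tidb.addr", "tidb-host")
  .option("tidb.port", "4000")
  .option("tidb.user", "root")
  .option("tidb.password", "")
  .option("database", "gamedb")
  .option("table", "result_table")
  .mode("append")
  .save()
```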

How can we achieve effective isolation? Perhaps the TiFlash columnar storage engine can provide the answer.

TiFlash: columnar storage engine

TiFlash is a complement to the TiKV row-based storage engine. It holds a Raft replica of the TiKV data: as a columnar replica of TiKV, TiFlash relies on the Raft protocol to guarantee the consistency and completeness of data synchronization. In this way, the same piece of data is stored in two storage engines: TiKV keeps the row-oriented data and TiFlash keeps the column-oriented data.

When running Spark computation and analysis, we can read directly from the TiFlash cluster, and computation efficiency is very high. For AP analysis, columnar data gives an overwhelming advantage over row-oriented data.
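A minimal sketch of steering TiSpark reads to TiFlash replicas; the configuration key is a TiSpark option whose availability depends on the version, and the addresses and table names are placeholders. It assumes a columnar replica has already been created on the TiDB side (for example, ALTER TABLE gamedb.login_log SET TIFLASH REPLICA 1;).

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tispark-tiflash-demo")
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.tispark.pd.addresses", "pd0:2379,pd1:2379,pd2:2379")
  // Read only from the columnar (TiFlash) replicas, keeping AP scans off TiKV.
  .config("spark.tispark.isolation_read_engines", "tiflash")
  .getOrCreate()

val daily = spark.sql(
  "SELECT dt, COUNT(DISTINCT player_id) AS dau FROM gamedb.login_log GROUP BY dt")
daily.show()
```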

TiFlash: TPC-H performance analysis

The combination of TiSpark + TiFlash brought both quantitative and qualitative improvements in computing efficiency. TPC-H performance analysis shows that, in a horizontal comparison with TiKV, TiFlash executes faster than TiKV in almost all query scenarios, and in some scenarios it is far faster. After adopting TiFlash, neither the performance of the TiKV cluster nor the offline cluster services are affected, and good performance and throughput are maintained when doing offline big-data analysis.

Practice has proved that TiFlash can solve many of our problems and is a very good tool.

TiFlash application: more efficient computation

In some metric-computation scenarios for NetEase Games user profiling, after switching to TiSpark + TiFlash, SQL processing for the various business workloads is at least 4 times faster than with TiSpark + TiKV. So after adopting TiFlash, the efficiency of offline batch processing has improved qualitatively.

JSpark: Cross-source offline computing

As business scale and application scenarios grow, different data ends up stored in different storage engines: for example, log data lives in Hive while database data lives in TiDB. Cross-data-source access then requires a large amount of data migration, which is time-consuming and labor-intensive. Can different data sources be connected directly for cross-source access? To address this, NetEase Games built the JSpark tool to bridge the different underlying storage systems and achieve cross-source access. The core of the tool is the TiSpark + Spark components, with Spark acting as the bridge between different data sources.

JSpark is a wrapper around TiSpark and JDBC. It can read and write data in TiKV, perform columnar AP computation in TiFlash, and run regular SQL through TiDB. At present we have encapsulated mutual reads and writes between TiDB and Hive. Later, the JSpark tool will also support reads, writes, and mutual access between TiDB and ES, achieving multi-source data access across TiDB, Hive, and ES.
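The underlying idea can be sketched as follows (this is not JSpark's actual API): one Spark session with both Hive support and TiSpark enabled, so Hive and TiDB tables can be joined or copied in a single job. The db_prefix value, addresses, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cross-source-demo")
  .enableHiveSupport() // Hive metastore tables become visible to Spark SQL
  .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
  .config("spark.tispark.pd.addresses", "pd0:2379,pd1:2379,pd2:2379")
  .config("spark.tispark.db_prefix", "tidb_") // distinguish TiDB databases from Hive ones
  .getOrCreate()

// Join TiDB dimension data with Hive log data in one SQL statement.
val joined = spark.sql(
  """SELECT t.player_id, t.level, h.action, h.ts
    |FROM tidb_gamedb.player_info t
    |JOIN ods.player_action_log h ON t.player_id = h.player_id
    |WHERE h.dt = '2021-01-01'""".stripMargin)
joined.show()
```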

The current JSpark tool mainly implements the following functions:

  • Supports TiSpark + JDBC for reading and writing TiDB and reading and writing Hive. The efficiency of this method is average.

    • Application scenario: operating on only the subset of columns of a wide TiDB table that the business needs.
  • Supports reading TiDB table data and writing the Spark computation results to a Hive target table. For reads, it is recommended to use TiSpark to read TiKV or TiFlash; for writes, it is recommended to use TiSpark to write to TiKV, which is more efficient.

    • Application scenario: regularly rotating expired partitions of a TiDB partitioned table and backing them up to Hive for permanent retention, so that the TiDB table does not grow too large (see the sketch after this list).
  • Supports reading Hive table data and writing the Spark computation results to a TiDB target table. It is recommended to use TiSpark to write to TiKV, which is more efficient.

    • Application scenario: analyzing Hive data to produce user-profile metrics and writing them into the online TiDB to serve the business's online TP queries. Another practical scenario is restoring a Hive backup to TiDB.
  • Supports the front-end web or business side initiating HTTP requests to remotely start Spark jobs and complete joint queries over the underlying TiDB and Hive data.

    • Application scenario: on the front-end management platform, clicking a query button joins a player's Hive link logs with TiDB data and extracts the relevant behavior data.
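For the partition-rotation scenario above, a minimal sketch (again not JSpark's actual API) could look like the following, reusing the Spark session configured in the previous sketch; the table names, partition value, and backup database are placeholders.

```scala
// Read the expired partition from TiDB via TiSpark (spark: session from the previous sketch).
val expired = spark.sql(
  "SELECT * FROM tidb_gamedb.player_action WHERE dt = '2020-01-01'")

// Append it into a Hive backup table (assumed to exist with a matching schema).
expired.write
  .mode("append")
  .insertInto("backup.player_action")

// After the copy is verified, the expired partition can be dropped on the TiDB side,
// e.g. ALTER TABLE gamedb.player_action DROP PARTITION p20200101;
```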

During the development and use of JSpark-related features, we also discovered an optimization point for TiSpark.

Currently, TiSpark does not support accessing multiple TiDB data sources at the same time: only one TiDB cluster can be registered at runtime, and multiple TiDB clusters cannot be registered, which is inconvenient for cross-cluster computation. In the future, we hope TiSpark will support accessing multiple TiDB clusters simultaneously.

TiDB application: HTAP data system

JSpark is currently the core framework of our offline computing. It is also combined with the JFlink real-time computing framework to form our big-data processing capability: JSpark is responsible for the offline big-data computing framework, and JFlink is responsible for the real-time computing framework. Together they form the HTAP data system.

HTAP computing capability: JSpark + JFlink

First, online data is synchronized and aggregated into the TiDB cluster in real time. Then JSpark + JFlink perform offline and real-time computation over TiDB and other data sources to produce user profiles and other metric analysis data, which feed back into online business queries.

TiDB application: HTAP data system

At present, after three years of development, NetEase Games has a total of 170 cluster instances, with data volume reaching 100+ TB. Our business covers user profiling, anti-addiction, operations, reporting, business monitoring, and more, and both the business and the clusters continue to grow in scale.

The above is the story of how NetEase Games adopted TiDB and how our AP computing evolved. I hope today's sharing is inspiring for everyone.

