The evolution of ZTO's big data platform in the big promotion

The annual Double Eleven is here again, and the best gift for engineers is a technical guide to big promotions. After years of development, big promotions are no longer limited to e-commerce: almost every industry now runs similar campaigns, such as 818 in the automotive industry and 618 and 11.11 in e-commerce. Big promotion scenarios pose many new challenges to basic software, including databases, and have also produced many best practices.

Ahead of Double Eleven, PingCAP is holding a series of in-depth discussions with users such as Autohome, Bitauto, JD.com, and ZTO, hoping to reveal what technical problems lie behind sales volumes that climb year after year, and what technical architecture can carry the traffic peaks stably.


In a big promotion, the most anticipated thing after placing an order is receiving the parcel. Founded in 2002, Zhongtong Express (ZTO) is a comprehensive logistics service brand with express delivery at its core, supplemented by international, freight, cloud warehouse, commerce, cold chain, finance, intelligence, Starlink, and media businesses. In 2020, ZTO handled 17 billion parcels, for a market share of 20.4%.

The entire life cycle and transit cycle of a parcel can be summarized in five words: receiving, sending, arriving, dispatching, and signing.

The platform that supports this entire life cycle is ZTO's big data platform. Covering offline and real-time data through to the data warehouse, it is a relatively complete system. ETL modeling also relies on it, and it ultimately provides data application support to external consumers on the basis of offline OLAP analysis. Data modeling can run as often as every half hour. On top of this platform, ZTO began to think about how to strengthen its real-time, multi-dimensional analysis capabilities.


ZTO's relationship with TiDB began in 2017, when it was investigating sharding (splitting databases and tables). At that time, ZTO's sharding had reached 16,000 tables, and the business could no longer keep scaling that way. At the end of 2018, ZTO began testing TiDB 2.0, focusing on large-volume data storage and analysis performance. At the beginning of 2019, the first production applications went live. The stable version currently in production is TiDB 3.0.14. At the end of 2020, ZTO started testing TiFlash with two goals: improving timeliness and reducing hardware usage.

1.0 era: meeting demand

The 1.0 era was about meeting business needs. The main business requirements were:

Business is growing very fast and data volume is very large; each order is updated 5 to 6 times, and operations have peaks;
The technical solutions investigated so far struggle to support multi-dimensional analysis;
The business side needs analysis over a relatively long data period;
The timeliness requirements for analysis are also very high;
Single-machine performance bottlenecks, single points of failure, and the high risk they bring are unacceptable to the business;
In addition, QPS is very high, and applications require millisecond-level responses.

On the technical side, ZTO needed to connect multiple business scenarios and multiple business metrics; it needed strongly consistent distributed transactions with a low cost of switching from the original business model; it needed to engineer the whole analysis and computation process and retire the original stored procedures; it needed support for highly concurrent reads, writes, and updates; it needed online maintenance, so that a single point of failure has no impact on the business; and it needed tight integration with the existing big data ecosystem to achieve minute-level statistical analysis. Finally, ZTO had long been exploring a large wide table with more than 100 columns, on which multi-dimensional query analysis would be performed.
[Figure: Some of the scenarios where TiDB is currently in production at ZTO]

Application scenario: the timeliness system

The timeliness system is one of ZTO's original systems and has since been rebuilt. Its storage and computation were originally built mainly on Oracle, with the computation relying on stored procedures. The architecture was relatively simple: message ingestion on one side and loading on the other.

As business volume grew, this architecture gradually hit performance bottlenecks. When upgrading it, ZTO migrated all storage to TiDB and all computation to TiSpark. Messages arrive through the message queue, are ingested with Spark and Flink, and finally land in TiDB. TiSpark performs minute-level computation on that data: light summaries go to Hive, and medium summaries go to MySQL. On top of Hive, Presto serves external applications. For both OLTP and OLAP, the new architecture greatly reduces development workload and integrates with the existing big data technology stack.

[Figure: ZTO's database system architecture in the 1.0 era]

The migration brought many benefits:

First, capacity increased: there is about three times the surplus capacity relative to the original data center, and the data retention period of the current system has more than tripled;
Second, scalability: the system supports online horizontal scaling, so operations staff can add or remove compute and storage nodes at any time with very little impact on applications;
Third, it meets the requirements of high-performance OLTP workloads: query performance dropped slightly, but still meets business needs;
Fourth, the single-point pressure on the database is gone, and OLTP and OLAP are separated so that they do not interfere with each other;
Fifth, it supports analysis along more dimensions;
Sixth, the overall architecture is clearer than the original, and both maintainability and scalability are much improved.

Large wide table application scenarios

Another scenario is the wide table that ZTO has been building and exploring. ZTO tested many systems beforehand, including HBase and Kudu. Kudu's write performance is very good, but its community activity in China is only average. In addition, Impala was the OLAP query engine in that setup, whereas the mainstream engine at ZTO is Presto, so compatibility had to be considered, and it was difficult to meet the needs of every business scenario. ZTO's business also requires the system to compute and analyze billions of rows quickly, to synchronize to offline clusters for integration with T+1 data, and to let data products and data services connect directly and pull detailed data. The last requirement is massive data processing. ZTO ingests many message sources and needs to predict the full-link route and timeliness of every waybill and to locate the transit link each waybill is on. The data volume is very large, and the timeliness requirements are also very high.

[Figure: ZTO's large wide table construction]

Currently, the wide table has more than 150 fields, with data drawn from more than 10 topics. The main ingestion path is through Flink and Spark, which pick up the data produced by each business line and aggregate it into TiDB to form the business wide table. Downstream, TiSpark produces analysis results from the wide table and synchronizes 300 million records to Hive. The pipeline also provides real-time data builds at the ten-minute level and offline T+1 integration.
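One way to picture how messages from many topics converge on a single wide row is an upsert keyed on the waybill number, where each message writes only the columns it carries. The sketch below is a minimal illustration and not ZTO's actual pipeline: the table name `waybill_wide`, the key column `waybill_no`, and the helper `build_upsert` are all hypothetical. It relies only on the MySQL-compatible `INSERT ... ON DUPLICATE KEY UPDATE` syntax, which TiDB supports.

```python
from typing import Any, Dict, List, Tuple


def build_upsert(table: str, key_col: str, fields: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized upsert for one partial message.

    Only the columns present in `fields` are written, so messages from
    different topics fill in different parts of the same wide row.
    Table and column names are hypothetical and must come from trusted
    code, not from message content, because they are interpolated.
    """
    if key_col not in fields:
        raise ValueError(f"message is missing key column {key_col!r}")
    cols = list(fields)
    placeholders = ", ".join(["%s"] * len(cols))
    # Overwrite every non-key column that this message carries.
    updates = ", ".join(f"{c} = VALUES({c})" for c in cols if c != key_col)
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    if updates:
        sql += f" ON DUPLICATE KEY UPDATE {updates}"
    return sql, [fields[c] for c in cols]


# A "pickup" message and a later "signing" message touch the same row.
pickup_sql, pickup_params = build_upsert(
    "waybill_wide", "waybill_no",
    {"waybill_no": "W1", "pickup_time": "2021-11-11 08:00:00"},
)
sign_sql, sign_params = build_upsert(
    "waybill_wide", "waybill_no",
    {"waybill_no": "W1", "sign_time": "2021-11-13 17:30:00"},
)
```

Because each statement names only its own columns, the order in which topics arrive does not matter for columns they do not share.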

In the course of use, the growth in cluster scale brought qualitative changes and exposed several problems. First, hotspots. Index hotspots are the most prominent at present: because ZTO's business volume is very large and operations have peaks, the hotspot problem is especially obvious at peak times. Second, memory fragmentation. In earlier, lower versions, after a period of stable operation, ZTO's workload characteristics and its large number of updates and deletes led to serious memory fragmentation. This was fixed after it was reported to TiDB. Third, one parameter deserves attention: the parameter that controls whether TiFlash reads through the index. In testing, when the ratio of data read to total data is greater than 1/10, it is recommended to turn this parameter off. The reason is that the number of reads may go down, but the time cost of each read goes up.
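The 1/10 rule of thumb for the TiFlash parameter can be written down as a small check. This is only a sketch of the arithmetic: the function name is hypothetical, the source does not name the parameter itself, and the row counts would have to come from your own statistics or estimates.

```python
def keep_tiflash_index_read(rows_read: int, total_rows: int, threshold: float = 0.1) -> bool:
    """Return True to keep the parameter on, False to turn it off.

    Follows the rule of thumb from the text: once the query reads more
    than `threshold` (1/10 by default) of the table, turning the
    parameter off is recommended.
    """
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return rows_read / total_rows <= threshold


# Reading 5% of a 1-billion-row table: keep it on.
small_scan = keep_tiflash_index_read(50_000_000, 1_000_000_000)
# Reading 30% of the same table: turn it off.
large_scan = keep_tiflash_index_read(300_000_000, 1_000_000_000)
```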

Operation and maintenance monitoring

If you use TiDB, you will find that its monitoring metrics are particularly rich. It uses the popular Prometheus + Grafana stack, and the metrics are numerous and comprehensive. Because the cluster supported online business while developers were also querying data on it, ZTO once ran into a situation where a SQL statement hung a TiKV server. In response to this and to its monitoring needs, ZTO did some custom development. First, slow SQL is killed automatically, with allowances for special online accounts, and the owner of the corresponding application is notified. Second, ZTO developed a tool that lets Spark SQL query TiDB, with concurrency and security guaranteed in its design. In addition, ZTO feeds some additional core metrics into its self-developed monitoring system, and core alarms are delivered by phone to the staff on duty.
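The automatic killing of slow SQL could be sketched roughly as follows. This is not ZTO's tool: the threshold, the exempt account list, and the function names are assumptions made for illustration. The input rows are assumed to have the shape of `INFORMATION_SCHEMA.PROCESSLIST` (columns `ID`, `USER`, `COMMAND`, `TIME`, `INFO`), and the output uses TiDB's `KILL TIDB <connection id>` statement. Fetching the rows, executing the statements, and notifying the application owner are left out.

```python
from typing import Any, Dict, Iterable, List, Set


def pick_queries_to_kill(
    processlist: Iterable[Dict[str, Any]],
    max_seconds: int,
    exempt_users: Set[str],
) -> List[Dict[str, Any]]:
    """Select running statements that exceeded the time budget.

    `exempt_users` models the special online accounts that must never
    be killed automatically (an assumption about how they are handled).
    """
    victims = []
    for row in processlist:
        if row["COMMAND"] != "Query":
            continue  # idle (Sleep) connections are not running SQL
        if row["USER"] in exempt_users:
            continue
        if row["TIME"] > max_seconds:
            victims.append(row)
    return victims


def kill_statement(conn_id: int) -> str:
    """Render the TiDB statement that terminates one connection."""
    return f"KILL TIDB {int(conn_id)}"


sample = [
    {"ID": 11, "USER": "dev_adhoc", "COMMAND": "Query", "TIME": 420, "INFO": "SELECT ..."},
    {"ID": 12, "USER": "online_core", "COMMAND": "Query", "TIME": 900, "INFO": "SELECT ..."},
    {"ID": 13, "USER": "dev_adhoc", "COMMAND": "Sleep", "TIME": 3600, "INFO": None},
    {"ID": 14, "USER": "dev_adhoc", "COMMAND": "Query", "TIME": 5, "INFO": "SELECT ..."},
]
to_kill = pick_queries_to_kill(sample, max_seconds=300, exempt_users={"online_core"})
statements = [kill_statement(row["ID"]) for row in to_kill]
```

Keeping the selection logic separate from the database calls makes the policy easy to test before it is allowed to kill anything on a production cluster.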

During last year's Double Eleven, ZTO's order volume exceeded 820 million and its overall business volume exceeded 760 million. Peak QPS on Double Eleven reached 350,000+. Over the whole Double Eleven period, data updates reached the hundreds of billions. More than 100 TiSpark tasks ran on the cluster, supporting 7 online applications. 98% of analyses completed in under 10 minutes, and the analyses covered a data period of 7 to 15 days.

2.0 era: HTAP upgrade

The main theme of the 2.0 era is improvement. ZTO's adoption of HTAP was driven mainly by upgraded business requirements.
Based on those requirements, ZTO upgraded its architecture in the 2.0 era. First, it introduced TiFlash and TiCDC. The benefit is better timeliness: some analyses now run at the minute level, which also reduces the use of Spark cluster resources.
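For readers who have not used TiFlash, the sketch below shows the general mechanism by which a TiDB table gets a columnar replica and how replication progress can be checked. It is an illustration under assumptions, not ZTO's rollout procedure: the schema `ods`, the table `waybill_wide`, and the helper names are hypothetical. The statements themselves (`ALTER TABLE ... SET TIFLASH REPLICA n` and the `information_schema.tiflash_replica` table) are standard TiDB features.

```python
from typing import List, Tuple


def tiflash_replica_ddl(schema: str, table: str, replicas: int = 1) -> str:
    """Render the DDL that asks TiDB to maintain TiFlash replicas of a table."""
    if replicas < 0:
        raise ValueError("replicas must be zero or positive")
    return f"ALTER TABLE `{schema}`.`{table}` SET TIFLASH REPLICA {int(replicas)}"


def replica_progress_query(schema: str, table: str) -> Tuple[str, List[str]]:
    """Render a parameterized query for the replica's availability and progress."""
    sql = (
        "SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
    )
    return sql, [schema, table]


ddl = tiflash_replica_ddl("ods", "waybill_wide", replicas=1)
progress_sql, progress_params = replica_progress_query("ods", "waybill_wide")
```

Once the replica is reported as available, the optimizer can choose TiFlash for analytical queries on that table while transactional traffic continues to hit TiKV.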
[Figure: ZTO's data system architecture in the 2.0 era]

The figure below compares TiSpark and TiFlash. ZTO runs two clusters, one on 3.0 and one on 5.0. In short, the 3.0 cluster relies on TiSpark and the 5.0 cluster relies on TiFlash. The 3.0 cluster currently has 137 physical nodes, and the 5.0 cluster has 97. Tasks on 3.0 run on a 5 to 15 minute cycle, while TiFlash on 5.0 has brought this down to 1 to 2 minutes, and the reduction in TiKV load is quite noticeable. In addition, the 3.0 cluster uses about 60 Spark resources, whereas on 5.0, counting both what is online and what is under test, about 10 are enough.
[Figure: comparison of TiSpark on the 3.0 cluster and TiFlash on the 5.0 cluster]

Throughout the test period the production cluster stayed on 3.0, and the 4.0 test cycle was actually very short. The test workload included joins against dimension tables. At that time 4.0 did not support MPP and some functions were not yet complete, so the results were not very satisfactory. HTAP testing was therefore done mainly on 5.0, which supports MPP and has increasingly rich function support. The version ZTO currently runs in production is TiDB 5.1.
[Figure: load on the 5.0 cluster during 618 (right panel)]

The right side of the figure above shows the load on the 5.0 cluster during 618. In the just-concluded 618, some of the tasks launched on 5.0 were already powering the mobile big-promotion dashboard. Six of ZTO's core metrics are computed on TiFlash. The cluster responded stably overall, and reports reached minute-level timeliness. The overall data volume is 4 to 5 billion+, and the report analysis data reaches 1 billion+.

3.0 era: looking to the future


First, monitoring. ZTO's clusters are relatively large, so more problems tend to surface. A large cluster has many instances, metrics load slowly, and troubleshooting efficiency cannot be guaranteed. Although the monitoring is very comprehensive, it is hard to locate a problem quickly when something goes wrong.
Second, fixing occasionally inaccurate execution plans. When a plan goes wrong, it can drive up cluster metrics and cause online workloads, and therefore businesses, to interfere with one another.
Third, automatic cleanup. ZTO currently cleans up data with hand-written SQL, and purging expired data is cumbersome. ZTO hopes that automatic TTL for old data will be supported in the future.
Fourth, columnar storage. ZTO plans to gradually move all TiSpark tasks to TiFlash, aiming to improve timeliness and reduce hardware costs.
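The hand-written cleanup mentioned in the third item might look roughly like the batched delete below. This is a hedged sketch rather than ZTO's script: the table `waybill_wide`, the column `created_at`, the 15-day retention, and the `execute` callable (any function that runs a parameterized statement and returns the affected row count) are all assumptions. Deleting in limited batches keeps each transaction small, which matters on a cluster that is also serving online traffic.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


def expiry_delete_sql(
    table: str, time_col: str, retention_days: int, now: datetime, batch_size: int = 10000
) -> Tuple[str, List[str]]:
    """Render one batched DELETE for rows older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    sql = f"DELETE FROM {table} WHERE {time_col} < %s LIMIT {int(batch_size)}"
    return sql, [cutoff.strftime("%Y-%m-%d %H:%M:%S")]


def purge_expired(
    execute: Callable[[str, List[str]], int],
    table: str,
    time_col: str,
    retention_days: int,
    now: datetime,
    batch_size: int = 10000,
) -> int:
    """Repeat the batched DELETE until a batch comes back short.

    `execute` is a hypothetical hook standing in for a real database
    cursor; it must return the number of rows the statement affected.
    """
    sql, params = expiry_delete_sql(table, time_col, retention_days, now, batch_size)
    total = 0
    while True:
        affected = execute(sql, params)
        total += affected
        if affected < batch_size:
            return total


# Dry run against a fake executor that pretends 20,300 rows have expired.
calls = []
fake_results = iter([10000, 10000, 300])


def fake_execute(sql: str, params: List[str]) -> int:
    calls.append((sql, params))
    return next(fake_results)


deleted = purge_expired(fake_execute, "waybill_wide", "created_at", 15, datetime(2021, 11, 11))
```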

For an enterprise, a big promotion does more than support business innovation; it is also large-scale training and a full-link drill for its technical architecture. Under the extreme test of a big promotion, a company's IT architecture, organizational processes, and staff skills all improve substantially. The experience and thinking gained also speed up day-to-day business innovation, improve the efficiency of technology-driven innovation, and create a new growth engine.
