Background introduction: company profile
Eggplant Technology (the overseas SHAREit Group) is a global Internet technology company mainly engaged in mobile Internet software R&D, global mobile advertising monetization solutions, cross-border payment solutions, and other Internet services. SHAREit, Eggplant Technology's flagship product, is a one-stop digital entertainment content and cross-platform resource sharing platform with nearly 2.4 billion installed users. As a company operating overseas, Eggplant Technology has built a variety of tool and content applications in Southeast Asia, South Asia, the Middle East, Africa, and other regions, and its apps consistently rank among the top of the Google Play download charts.
Background introduction: business characteristics and database selection
Eggplant Technology has a large product matrix with relatively complex product forms, spanning tools, content, games, advertising, payments, and more. For these complex business scenarios, we have chosen different databases according to the different business forms. The six most important databases currently used by Eggplant Technology are:
- Self-developed persistent KV: feature platform, user profiles, behavior records, etc.
- Redis Cluster: service cache, session information, etc.
- Cassandra: content library
- MySQL: Hue, metadata, operations platforms, etc.
- ClickHouse: data analysis, real-time reports
- TiDB: user growth, APM system, cloud billing, etc.
TiDB application practice: business pain points and TiDB advantages
Based on business-level pain points, we have introduced TiDB in multiple business scenarios:
First, as a company operating overseas, Eggplant Technology uses multiple public clouds as its infrastructure. At the database level, this raises problems of business adaptation, data migration, database compatibility, and data synchronization across clouds.
Second, Eggplant Technology runs a variety of high-traffic apps whose services show continuous growth. Traditional relational databases such as MySQL require splitting databases and tables (sharding), which hinders rapid business development.
Third, NoSQL databases such as Cassandra and HBase cannot meet complex scenarios that require distributed transactions and multi-table joins.
Fourth, some of Eggplant Technology's systems, such as the APM system, are HTAP scenarios: the same business data has both OLTP and OLAP requirements, and we hoped a single database could handle both.
Since its introduction, TiDB has shown unique advantages in many respects, helping Eggplant Technology build a sustainable database ecosystem:
- TiDB's cross-cluster migration and data synchronization capabilities support business expansion and architecture design under a multi-cloud setup.
- TiDB provides automatic horizontal elastic scaling that is transparent to the business, solving the sharding problem.
- TiDB is highly compatible with MySQL, so learning and migration costs stay low even in large-capacity, high-concurrency scenarios.
- TiDB's HTAP capability meets both the OLTP and OLAP requirements of a business on the same copy of data.
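To make the last point concrete, here is a minimal sketch of what "OLTP and OLAP on the same copy of data" looks like in practice: a transactional upsert and an analytical aggregation issued against the same MySQL-compatible TiDB endpoint. The table and column names (`crash_ticket`, `crash_event`, etc.) are hypothetical, not Eggplant Technology's actual schema; only the SQL shape is illustrated.

```python
# Hypothetical HTAP workload sketch: both statements target one TiDB cluster.
# In a real deployment the analytical query could be served by TiFlash replicas
# while the upsert goes through the row store; the application code is identical.

OLTP_UPSERT = """\
INSERT INTO crash_ticket (crash_id, status, assignee)
VALUES (%s, %s, %s)
ON DUPLICATE KEY UPDATE status = VALUES(status), assignee = VALUES(assignee)"""

OLAP_REPORT = """\
SELECT app_version, COUNT(*) AS crashes
FROM crash_event
WHERE event_time >= NOW() - INTERVAL 30 DAY
GROUP BY app_version
ORDER BY crashes DESC"""


def queries_for_htap():
    """Return the (transactional, analytical) statement pair for the same DB."""
    return OLTP_UPSERT, OLAP_REPORT
```

Because TiDB speaks the MySQL protocol, both statements can be executed with any ordinary MySQL client library, which is what keeps the migration and learning cost low.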
TiDB application practice: the APM scenario
Eggplant Technology's APM (Application Performance Management) system integrates monitoring, analysis, dashboards, and repair for app crashes, performance issues, and so on, supporting a variety of high-growth apps. The system's first characteristic is data volume: tens of billions of records are generated every day and must be retained for 30 days. The second is a high demand for timeliness: for severe cases such as crashes and serious performance problems, any delay directly affects the user experience and even product revenue. The third is the capability for problem tracking and repair. The fourth is that it must support OLAP analysis scenarios on top of OLTP transactional scenarios.
Let's first analyze the data flow of the early APM system: from app data reporting, to log collection, and finally into ClickHouse. The entire flow was batch processed and took more than two hours, so overall timeliness was weak; problems were not exposed in time, which affected the user experience. In addition, the system ran two databases, MySQL and ClickHouse. Why this design? ClickHouse handled data analysis and aggregation, while MySQL was mainly used to create process work orders (tickets). Supporting two databases at the same time made the cost relatively high.
Now look at the data flow of the new APM after introducing TiDB: from app reporting, to dashboard display, to alerting, and then to process tickets, the system achieves minute-level, quasi-real-time viewing and alerting. This mainly relies on TiDB's HTAP capability: aggregation analysis feeds the dashboard and sends timely alerts to the alert center, while TiDB's OLTP capability drives the ticket workflow updates. As a result, a single TiDB database connects dashboard viewing, monitoring, problem tracking, and the repair process.
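The minute-level aggregation and alerting described above can be sketched as follows. This is an illustrative simplification, not the production pipeline: in practice the bucketing and counting would be a SQL `GROUP BY` over TiDB, but the windowing and threshold logic is the same.

```python
from collections import Counter
from datetime import datetime


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to its one-minute window."""
    return ts.replace(second=0, microsecond=0)


def aggregate_crashes(events):
    """events: iterable of (timestamp, app_version) crash reports.

    Returns per-(minute, version) crash counts, i.e. the quasi-real-time
    view that feeds the dashboard.
    """
    counts = Counter()
    for ts, version in events:
        counts[(minute_bucket(ts), version)] += 1
    return counts


def should_alarm(counts, window, version, threshold):
    """Fire an alert when crashes in one minute window reach the threshold."""
    return counts.get((window, version), 0) >= threshold
```

Because the aggregation runs continuously over freshly written rows rather than a two-hour batch, a crash spike surfaces on the dashboard and in the alert center within minutes.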
Evolution based on TiKV: self-developed distributed KV
As everyone knows, TiKV is TiDB's storage layer, and it is also a key-value database in its own right. Next, let's review how Eggplant Technology came to build a distributed KV system on TiKV. Eggplant Technology mainly provides tool and content products, which generate very large amounts of data, so our KV storage must support two types of scenarios. One is data that is generated and written in real time. The other, serving user profiles and feature engines, is quickly loading large batches of offline-generated data into online KV storage for fast access by online businesses, that is, a Bulk Load capability. In actual business this requires terabyte-level throughput per hour.
The picture below shows the distributed KV system Eggplant Technology developed in-house on RocksDB, which meets both types of KV requirements. The architecture on the left of the figure provides the real-time write capability: data flows from the SDK to the network protocol layer, then the topology layer, then the data-structure mapping layer, and finally into RocksDB. On the right is the Bulk Load batch import process. You may ask: why can't the real-time write path on the left handle terabyte-per-hour imports? There are two main reasons. One is RocksDB write amplification, which is very serious especially in large-key scenarios. The other is the disk and network bandwidth limit of a single machine, which caps single-machine load and storage. How does the batch import on the right work? It uses Spark to perform data analysis, pre-sharding, and SST generation on Parquet files, uploads the SSTs to the RocksDB storage nodes, and finally loads them into the KV layer through ingest & compact for online business access. Single-machine throughput can reach roughly 100 MB per second.
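The pre-sharding step above can be sketched as follows: the full sorted key space is cut into contiguous ranges, and each range becomes one shard whose keys are written into its own SST file. This is a minimal illustration of the idea, not the actual Spark job; the key format and shard count are hypothetical.

```python
import bisect


def pre_shard(sorted_keys, num_shards):
    """Split a sorted key list into contiguous, roughly equal ranges.

    Returns (bounds, shards): bounds are the split keys, and shards[i]
    holds the keys destined for the i-th SST file. Because each shard is
    a disjoint sorted range, the SSTs can be generated in parallel and
    ingested without overlapping key ranges.
    """
    n = len(sorted_keys)
    # Pick num_shards - 1 split points at evenly spaced positions.
    bounds = [sorted_keys[(i * n) // num_shards] for i in range(1, num_shards)]
    shards = [[] for _ in range(num_shards)]
    for k in sorted_keys:
        # bisect_right sends keys >= a bound into the next shard.
        shards[bisect.bisect_right(bounds, k)].append(k)
    return bounds, shards
```

In the real pipeline, Spark computes these range boundaries from a sample of the Parquet data, so that the resulting SST files are balanced before being uploaded and ingested.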
Evolution based on TiKV: distributed KV based on TiKV
Since Eggplant Technology already had a self-developed distributed KV based on RocksDB, why adopt TiKV? First, at the technical level: although the self-developed distributed KV has run in production for more than two years and supported hundreds of terabytes of data, some technical gaps remain, such as automatic elastic scaling, strong consistency, transactions, and large-key support, all of which would require further R&D investment. Second, at the talent level: we lack a deep pool of high-quality database talent. After many investigations and conversations with the TiKV developers, we found that our needs and pain points coincided with TiKV's product roadmap, which prompted us to actively embrace TiKV. With TiKV, we can technically build KV products that separate storage and compute. Third, TiKV has an active open source community, and we can draw on the community's strength to polish the product together.
The architecture in the figure below is the distributed KV Eggplant Technology built on TiKV. The left part mainly handles the real-time data write path: from the SDK, to network storage, to data computation, and finally to the TiKV storage engine. Our focus is the R&D of the entire Bulk Load capability shown on the right. Unlike the self-developed distributed KV, the entire SST generation process is placed inside TiKV. The reason is that this maximally reduces the development and maintenance cost of the Spark-side code and improves ease of use.
Evolution based on TiKV: test conclusions
The following two tables show actual test results for TiKV's Bulk Load capability. The upper table shows that, with an E5 CPU, 40 vcores, and NVMe disks, maximum single-machine throughput reached 256 MB/s. The lower table shows a stress test of online reads performed while Bulk Load was running: the jitter in response latency is very small, and both P99 and P99.99 remain stable. This test is a demo-level verification; we believe that after further optimization, both storage throughput and response latency will improve qualitatively.
This Bulk Load capability is the result of collaborative development and co-evolution with the TiKV R&D team. We believe in the power of openness, and we will later publish the entire architecture, including the test data, on GitHub. If you have similar needs, stay tuned.