Business Challenges
In its international business, Ctrip serves a large number of markets with complex and diverse products and lines of business, many delivery channels, and high traffic-acquisition costs. Businesses and products therefore need more fine-grained management and optimization to meet market-launch and operational needs, reduce overall costs, and improve operational efficiency and conversion rates. To this end, Ctrip developed a dynamic real-time tagging platform for its international business (hereinafter referred to as CDP).

Ctrip's data comes from a wide range of sources and in many forms, and it is processed both offline and online. The processed data must be applied immediately in business systems, EDM, PUSH, and other usage scenarios, which places higher demands on the timeliness, accuracy, stability, and flexibility of the data processing system.

To address these problems, CDP had to improve its data processing capability. The traditional solution computed tags on a T+1 basis in the data warehouse and then imported the results into an ES cluster for storage. The front end assembled ES query conditions from the conditions passed in and queried the eligible data. Ctrip has launched hundreds of tags, and more than 50% of them are used for queries. Because this solution relied on offline computing, data timeliness was poor; and because it depended on the underlying offline platform for computation and on ES for indexing, query responses were slow.

Solution
CDP aims to improve the timeliness of data processing while still meeting the business's need for flexibility. For processing and update logic, the system consumes message data (Kafka or QMQ) according to dynamically configured rules and updates tags accordingly, so the business layer only needs to care about data filtering logic and conditional queries. Based on business requirements, tag filtering falls into two main scenarios:

Real-time trigger scenarios. Dynamic rules are configured according to business needs; the system subscribes to change messages from business systems in real time, filters out the data that satisfies the rule conditions, and pushes it to downstream business parties as messages.

Tag persistence scenarios. Real-time change messages from business systems are processed into business-related feature data according to business needs and persisted in the storage engine. Businesses then assemble query conditions and query the engine as needed, mainly through two types of queries: OLAP (analytical) and OLTP (online).
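To make the real-time trigger scenario concrete, the following is a minimal sketch of dynamic rule filtering. It is not CDP's actual implementation: the rule format, the operator names, and the field names (`market`, `amount`, `user_id`) are all assumptions made for illustration, and the messages are plain JSON strings standing in for what a Kafka or QMQ consumer would deliver.

```python
import json
import operator
from typing import Any, Callable, Dict, Iterable, Iterator, List

# Hypothetical set of comparison operators a rule condition may use.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "in": lambda value, allowed: value in allowed,
}


def matches(rule: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True only when the event satisfies every condition of the rule."""
    for cond in rule["conditions"]:
        field = cond["field"]
        if field not in event:
            return False
        if not OPS[cond["op"]](event[field], cond["value"]):
            return False
    return True


def filter_events(
    rules: List[Dict[str, Any]], raw_messages: Iterable[str]
) -> Iterator[Dict[str, Any]]:
    """Decode each message and yield it together with the rules it matched."""
    for raw in raw_messages:
        event = json.loads(raw)
        hit = [rule["name"] for rule in rules if matches(rule, event)]
        if hit:
            # In a real pipeline this is where the event would be pushed downstream.
            yield {"event": event, "matched_rules": hit}


# Rules are plain data, so they can be changed at runtime without redeploying code.
RULES = [
    {
        "name": "high_value_order",  # hypothetical rule
        "conditions": [
            {"field": "market", "op": "in", "value": ["JP", "KR"]},
            {"field": "amount", "op": "gte", "value": 500},
        ],
    }
]

MESSAGES = [
    '{"user_id": 1, "market": "JP", "amount": 800}',
    '{"user_id": 2, "market": "US", "amount": 900}',
    '{"user_id": 3, "market": "KR", "amount": 100}',
]

results = list(filter_events(RULES, MESSAGES))
```

Keeping rules as data rather than code is what lets the business layer focus on filtering conditions only; the consumer loop itself never changes when a rule is added or edited.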

Based on these requirements, CDP adopts a Kappa-like architecture for streaming data and a Lambda-like architecture for tag persistence, as shown in the following figure:

[Figure: CDP architecture diagram]

The tag persistence scenario has to provide persistent storage, update, and query services for business tags. Ctrip stores persistent business tags in TiDB and, as in the real-time trigger scenario, uses dynamically configured rules to consume data change messages from business systems, which keeps the persistent tags up to date. TiDB's support for both OLTP and OLAP queries then meets the needs of different business scenarios for accessing business feature data.

The system draws on the Lambda data processing architecture. New data is routed to different channels depending on its source. Full historical data is transformed by a batch engine (such as Spark) and written in batches to TiDB, the persistent storage engine. Incremental data is sent by business applications to Kafka or QMQ message queues as messages, processed according to the tag persistence rules, and written incrementally to TiDB, which solves the data timeliness problem.
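One way the batch and incremental paths can share a single write routine is an idempotent upsert. The sketch below only builds the statement and its parameters; it assumes a hypothetical `user_tags` table keyed on (`user_id`, `tag_name`) and relies on the MySQL-compatible `INSERT ... ON DUPLICATE KEY UPDATE` syntax that TiDB supports. The source does not describe Ctrip's actual schema or write path.

```python
from typing import Any, Dict, List, Tuple

TABLE = "user_tags"                      # hypothetical table name
KEY_COLUMNS = ("user_id", "tag_name")    # hypothetical primary key columns


def build_upsert(row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized upsert so replaying the same row is harmless."""
    columns = list(row)
    value_columns = [c for c in columns if c not in KEY_COLUMNS]
    if not value_columns:
        raise ValueError("row must contain at least one non-key column")

    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(f"{c} = VALUES({c})" for c in value_columns)

    sql = (
        f"INSERT INTO {TABLE} ({column_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    return sql, [row[c] for c in columns]


# The same function serves a batch backfill row and a row derived from a queue message.
sql, params = build_upsert(
    {"user_id": 42, "tag_name": "last_order_market", "tag_value": "JP"}
)
print(sql)
print(params)
```

The resulting statement and parameters can be passed to any MySQL-protocol driver's `execute` call. A real pipeline would also need to decide which path wins when a batch load and an incremental update touch the same row; that ordering question is outside this sketch.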

TiDB offers two storage engines: the row store TiKV, which supports OLTP scenarios, and the column store TiFlash, which supports OLAP scenarios. TiDB automatically synchronizes data between the two engines, and clients can choose either one according to their query needs. TiDB also keeps the two workloads well isolated while maintaining strong data consistency, which addresses the isolation and column-store synchronization problems of HTAP scenarios.
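As an illustration of how a client might choose an engine per query, the sketch below adds TiDB's `READ_FROM_STORAGE` optimizer hint to a `SELECT` statement. The helper `with_engine` and the `user_tags` table are hypothetical, and the table is assumed to already have a TiFlash replica (for example, created with `ALTER TABLE user_tags SET TIFLASH REPLICA 1`). Whether CDP uses hints, session variables, or leaves the choice to the optimizer is not stated in the source.

```python
def with_engine(select_sql: str, table: str, engine: str) -> str:
    """Insert a READ_FROM_STORAGE hint right after the leading SELECT keyword."""
    if engine not in ("tikv", "tiflash"):
        raise ValueError(f"unknown engine: {engine}")
    prefix = "SELECT "
    if not select_sql.upper().startswith(prefix):
        raise ValueError("only plain SELECT statements are supported in this sketch")
    hint = f"/*+ READ_FROM_STORAGE({engine.upper()}[{table}]) */"
    return f"SELECT {hint} {select_sql[len(prefix):]}"


# OLTP-style point lookup: read from the row store.
point_query = with_engine(
    "SELECT tag_value FROM user_tags WHERE user_id = %s", "user_tags", "tikv"
)

# OLAP-style aggregation: read from the column store.
agg_query = with_engine(
    "SELECT tag_name, COUNT(*) FROM user_tags GROUP BY tag_name",
    "user_tags",
    "tiflash",
)

print(point_query)
print(agg_query)
```

If the query aliases the table, the hint has to name the alias rather than the table. As an alternative to per-query hints, the session variable `tidb_isolation_read_engines` restricts which engines a whole session may read from.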

CDP is now deeply integrated with Ctrip's business systems and, through its business feature tag library, provides data and service support for international business growth.

Value
HTAP mixed load: TiDB supports mixed OLTP + OLAP workloads, which simplifies the IT system architecture and greatly improves real-time query performance for the business.

Horizontal elastic scaling: TiDB removes the need for MySQL database and table sharding and lets Ctrip scale out elastically at any time as the business grows.

