Author: Chang Feng
Introduction
Change Data Capture (CDC) refers to an application scenario in which upstream data changes are monitored and the change information is synchronized to downstream services for further processing. Event-driven architecture (EDA) has grown steadily more popular in recent years and has become a common choice among project architects. EDA is a natural fit for CDC: data changes are treated as events, and each service drives its own business logic by listening for the events it is interested in. Alibaba Cloud EventBridge is a serverless event bus service from Alibaba Cloud that helps users quickly build EDA-based applications. Recently, EventBridge event streams added CDC support based on the Alibaba Cloud DTS [1] service. This article covers CDC itself, how CDC is applied on EventBridge, and several best practice scenarios, to show how to build a CDC application with EventBridge.
CDC overview
Basic principles and application scenarios
CDC captures incremental data and schema changes from the source database and synchronizes them, in order, to a target database, data lake, or other data analysis service, with high reliability and low latency. Mainstream open source CDC tools currently include Debezium [2], Canal [3], and Maxwell [4].
Image source: https://dbconvert.com
The industry currently uses four main approaches to implementing CDC:
1. Based on timestamp or version number
The timestamp-based method requires that the database table has a field representing the update timestamp. When data is inserted or updated, the corresponding timestamp field will be updated accordingly. The CDC component periodically retrieves data records with an update time greater than the last synchronization time to capture changes to data during the current period. The principles of version number-based tracking and timestamp-based tracking are basically the same, requiring developers to update the version number of the data when changing data.
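The polling loop described above can be sketched as follows. This is a minimal illustration only: the `orders` table, its `updated_at` column, and the `fetch_changes` helper are hypothetical names chosen for the example, and an in-memory SQLite database stands in for the source database.

```python
import sqlite3

def fetch_changes(conn, last_sync):
    """Return rows whose updated_at is later than the last synchronization time."""
    cur = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    )
    return cur.fetchall()

# Hypothetical source table with an update-timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "unpaid", 100), (2, "unpaid", 105)],
)

last_sync = 0
changes = fetch_changes(conn, last_sync)        # first poll sees both rows
last_sync = max(row[2] for row in changes)      # advance the watermark

conn.execute("UPDATE orders SET status = 'paid', updated_at = 110 WHERE id = 1")
later_changes = fetch_changes(conn, last_sync)  # second poll sees only the update
```

Note that a physically deleted row leaves nothing to select, so this approach cannot observe hard deletes unless the table uses a soft-delete flag.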
2. Based on snapshots
The snapshot-based CDC implementation keeps three copies of the data at the storage level: the original data, the previous snapshot, and the current snapshot. Comparing the two snapshots yields the data changes that occurred between them.
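As a rough sketch of the comparison step, the function below diffs two snapshots held as dictionaries keyed by primary key. The dictionary representation and the `diff_snapshots` name are assumptions made for illustration; a real implementation would compare stored snapshots rather than in-memory objects.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key and return the changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("INSERT", key, None, row))
        elif previous[key] != row:
            changes.append(("UPDATE", key, previous[key], row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("DELETE", key, row, None))
    return changes

previous = {1: {"status": "unpaid"}, 2: {"status": "unpaid"}}
current = {1: {"status": "paid"}, 3: {"status": "unpaid"}}
snapshot_changes = diff_snapshots(previous, current)
```

Each tuple carries the operation, the key, the old row, and the new row. Intermediate changes that happen between two snapshots are not visible to this method; only the net difference is.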
3. Trigger-based
The trigger-based implementation creates triggers on the source table that record data change operations (INSERT, UPDATE, DELETE). For example, a dedicated table is created to record the user's change operations, and then three triggers, one each for INSERT, UPDATE, and DELETE, are created to write user changes into this table.
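The following sketch shows the idea using SQLite from Python. The table names `orders` and `orders_changelog` and the trigger names are hypothetical; trigger syntax differs between database engines, so treat this as an illustration of the pattern rather than something to copy into MySQL or Oracle directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);

-- Dedicated table that records every change operation.
CREATE TABLE orders_changelog (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, order_id INTEGER, old_status TEXT, new_status TEXT
);

CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, old_status, new_status)
    VALUES ('INSERT', NEW.id, NULL, NEW.status);
END;

CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, old_status, new_status)
    VALUES ('UPDATE', NEW.id, OLD.status, NEW.status);
END;

CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
    INSERT INTO orders_changelog (op, order_id, old_status, new_status)
    VALUES ('DELETE', OLD.id, OLD.status, NULL);
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 'unpaid')")
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

trigger_log = conn.execute(
    "SELECT op, order_id, old_status, new_status FROM orders_changelog ORDER BY seq"
).fetchall()
```

A CDC component would then read `orders_changelog` in `seq` order. The extra write performed by every trigger is the cost this approach imposes on the source database.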
4. Based on logs
The three methods above are all invasive to the source database, whereas the log-based method is non-invasive. Databases use transaction logs for disaster recovery. For example, MySQL's binlog records all changes that users make to the database. Log-based CDC obtains database changes in real time by continuously listening to the transaction log.
CDC has a wide range of application scenarios, including but not limited to database synchronization across remote data centers, heterogeneous database synchronization, microservice decoupling, cache updates, and CQRS.
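To make the log-based idea concrete, the sketch below replays an ordered change log onto a replica and remembers the position it has reached. The list-based `transaction_log` and the `replay` function are stand-ins invented for this example; they do not parse a real MySQL binlog, which would require a dedicated client such as the tools listed earlier.

```python
def replay(log, replica, start_position=0):
    """Apply log entries from start_position onward; return the next position to read."""
    for entry in log[start_position:]:
        op, key, row = entry["op"], entry["key"], entry.get("row")
        if op in ("INSERT", "UPDATE"):
            replica[key] = row
        elif op == "DELETE":
            replica.pop(key, None)
    return len(log)

# Hypothetical in-memory stand-in for a database transaction log.
transaction_log = [
    {"op": "INSERT", "key": 1, "row": {"status": "unpaid"}},
    {"op": "UPDATE", "key": 1, "row": {"status": "paid"}},
]

replica = {}
position = replay(transaction_log, replica)
after_first_replay = dict(replica)

# New entries are appended by the source; the reader resumes where it left off.
transaction_log.append({"op": "DELETE", "key": 1})
position = replay(transaction_log, replica, position)
```

Because the reader only consumes the log, the source tables need no extra columns, snapshots, or triggers.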
Alibaba Cloud-based CDC solution: DTS
Data Transmission Service (DTS) is a real-time data streaming service provided by Alibaba Cloud. It supports data exchange between data sources such as relational databases (RDBMS), non-relational databases (NoSQL), and multidimensional analysis systems (OLAP), and it integrates data synchronization, migration, subscription, integration, and processing. Its data subscription [5] function lets users obtain real-time incremental data from self-built MySQL, RDS MySQL, Oracle, and other databases.
Application of CDC on EventBridge
Alibaba Cloud EventBridge provides event routing services for two different application scenarios: the event bus [6] and the event stream [7].
The event bus persists events at its underlying layer and can route them to multiple event targets as needed.
The event stream suits end-to-end streaming data processing. It extracts, transforms, and analyzes events generated at the source in real time and loads them into the target, without requiring an event bus to be created. This end-to-end transfer is more efficient and easier to use.
In order to better support users' needs in CDC scenarios, EventBridge supports the data subscription function of Alibaba Cloud DTS on the event stream source side. Users can synchronize database change information to the EventBridge event stream with simple configuration.
EventBridge implements a custom DTS Source Connector based on the DTS SDK. When a user configures an event stream whose event provider is DTS, the source connector pulls DTS records from the DTS server in real time. After the data is pulled, it is wrapped in a fixed structure that retains fields such as id, operationType, topicPartition, beforeImage, and afterImage, and adds the system properties that streaming events require.
For DTS event samples, please refer to the official EventBridge documentation.
EventBridge event streams guarantee the ordering of DTS events, but events may be delivered more than once. The event ID maps one-to-one to a DTS record, so users can make their event processing idempotent based on this field.
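One way to handle duplicate delivery is to remember which event IDs have already been processed. The sketch below assumes each event arrives as a dictionary with a top-level `id` key; the `IdempotentConsumer` class and the in-memory set are illustrative choices, not part of EventBridge or the DTS SDK.

```python
class IdempotentConsumer:
    """Wrap a handler so that each event ID is processed at most once."""

    def __init__(self, handler):
        self._handler = handler
        # In-memory for the sketch; a real service would use a durable store.
        self._seen_ids = set()

    def consume(self, event):
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = event["id"]  # assumed field name for the event ID
        if event_id in self._seen_ids:
            return False
        self._handler(event)
        # Mark as seen only after the handler succeeds, so failures can be retried.
        self._seen_ids.add(event_id)
        return True

processed = []
consumer = IdempotentConsumer(processed.append)
results = [consumer.consume(e) for e in ({"id": "a"}, {"id": "b"}, {"id": "a"})]
```

Here the third event repeats the first ID and is skipped. An alternative is to make the downstream write itself idempotent, for example an upsert keyed by primary key.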
Create an EventBridge event stream with a DTS source
The following shows how to create an event stream with a DTS source in the EventBridge console:
- Preliminary preparation
  - Activate the EventBridge service;
  - Create a DTS data subscription task;
  - Create a consumer group account for consuming the subscription data.
- Create an event stream
  - Log in to the EventBridge console, select "Event Stream" in the left navigation bar, and click "Create Event Stream" on the event stream list page;
  - Fill in "Event Stream Name" and "Description" under "Basic Information" as needed;
  - When selecting the event provider, choose "Database DTS" from the drop-down box;
  - Select the DTS data subscription task you created in the "Data Subscription Task" field. In the consumer group field, select the consumer group to use for consuming the subscription data, and fill in the consumer group password and the initial consumption time;
  - Fill in the event stream rules and target as required, then save and start the stream. This creates an event stream with DTS data subscription as the event source.
Precautions
Note the following points when using a DTS event source:
- EventBridge uses the SUBSCRIBE consumption mode [8], so make sure that no other client instance is running in the current DTS consumer group. If the configured consumer group has run before, the consumption checkpoint passed in is ignored, and consumption continues from the checkpoint that the group last reached;
- The consumption checkpoint passed in when creating a DTS event source takes effect only the first time a new consumer group runs. After a restart, the task continues from the last consumed checkpoint;
- The EventBridge event stream subscribes to DTS data whose OperationType is INSERT, DELETE, UPDATE, or DDL;
- Messages from a DTS event source may be repeated. Delivery is guaranteed not to lose messages, but exactly-once delivery is not guaranteed, so users are advised to make their processing idempotent;
- If sequential consumption is required, set the exception tolerance policy to "NONE", that is, no exceptions are tolerated. In this case, if the target of the event stream fails to consume a message, the entire event stream is suspended until the target recovers.
Best practice examples
Implement CQRS based on EventBridge
In the CQRS (Command Query Responsibility Segregation) model, the command model is used to perform write and update operations, and the query model is used to support efficient read operations. There are certain differences in the data models used by read operations and write operations, and a certain method needs to be used to ensure data synchronization. CDC based on EventBridge event streams can meet such requirements.
Using cloud services, users can build CQRS on EventBridge with the following steps:
- The command model writes changes to the database, and the query model reads data from Elasticsearch;
- Start a DTS data subscription task to capture DB changes;
- Configure an EventBridge event stream whose event provider is the DTS data subscription task and whose event target is Function Compute (FC);
- The service in FC updates the data in Elasticsearch.
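The update logic that the function performs can be sketched as below. The sketch assumes a simplified change event in which `beforeImage` and `afterImage` are flat column-to-value dictionaries (the real event layout is described in the EventBridge documentation), and it uses a plain dictionary as a stand-in for the Elasticsearch index. The `apply_change` name and the `id` key column are hypothetical.

```python
def apply_change(read_model, change, key_column="id"):
    """Project one database change event onto the query-side read model."""
    op = change["operationType"]
    if op in ("INSERT", "UPDATE"):
        row = change["afterImage"]
        read_model[row[key_column]] = row          # upsert the latest row state
    elif op == "DELETE":
        row = change["beforeImage"]
        read_model.pop(row[key_column], None)      # tolerate repeated deletes
    # Other operation types (for example DDL) are ignored in this sketch.

read_model = {}  # stand-in for the Elasticsearch index
update = {
    "operationType": "UPDATE",
    "beforeImage": {"id": 1, "status": "unpaid"},
    "afterImage": {"id": 1, "status": "paid"},
}
for change in (
    {"operationType": "INSERT", "afterImage": {"id": 1, "status": "unpaid"}},
    update,
    update,  # duplicate delivery leaves the read model unchanged
):
    apply_change(read_model, change)
state_after_updates = dict(read_model)

apply_change(read_model, {"operationType": "DELETE",
                          "beforeImage": {"id": 1, "status": "paid"}})
```

Writing the projection as an upsert keyed by primary key keeps it safe under the repeated delivery mentioned in the precautions.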
Microservice decoupling
CDC can also be used for microservice decoupling. Take the order processing system of an e-commerce platform as an example. When a new unpaid order is created, the database sees an INSERT operation, and when an order's status changes from "unpaid" to "paid", the database sees an UPDATE operation. Different backend microservices handle these order status changes.
- When the user places an order/payment, the order system processes the business and writes the data changes to the DB;
- Create a new DTS subscription task to capture DB data changes;
- Build the EventBridge event stream. The event provider is DTS data subscription task, and the event receiver is RocketMQ;
- When consuming the RocketMQ data, three consumer groups are enabled under the same topic, each representing different business logic:
a. GroupA updates the user cache with the captured DB changes so that users can query order status conveniently;
b. GroupB feeds the downstream financial system and handles only new orders, that is, it processes events whose DB operation type is INSERT and discards all other types;
c. GroupC cares only about events in which the order status changes from "unpaid" to "paid". When a qualifying event arrives, it calls the downstream logistics and warehousing systems to process the order further.
If direct interface calls were used instead, the order system would need to call the cache update interface, the new order interface, and the order payment interface separately after a user places an order, which couples the businesses too tightly. In addition, with this pattern the data consumers no longer need to care about the semantics of what the upstream order processing interface returns. As long as the storage model remains unchanged, they can decide directly at the data level whether a change needs to be processed and how. The message queue's native ability to buffer messages also helps users smooth out traffic peaks (peak shaving) when orders surge.
In fact, the current message products supported by EventBridge Streaming also include RabbitMQ, Kafka, MNS, etc. In actual operation, users can choose according to their own needs.
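The data-level filtering performed by GroupB and GroupC in the example above might look like the following. It assumes the same simplified event shape, with `operationType` plus flat `beforeImage` and `afterImage` dictionaries, and a hypothetical `status` column on the order table; the function names are illustrative.

```python
def is_new_order(change):
    """GroupB: only newly inserted orders matter to the financial system."""
    return change["operationType"] == "INSERT"

def is_payment_completed(change):
    """GroupC: only the transition from 'unpaid' to 'paid' triggers fulfilment."""
    if change["operationType"] != "UPDATE":
        return False
    before = change.get("beforeImage") or {}
    after = change.get("afterImage") or {}
    return before.get("status") == "unpaid" and after.get("status") == "paid"

order_events = [
    {"operationType": "INSERT", "afterImage": {"id": 7, "status": "unpaid"}},
    {"operationType": "UPDATE",
     "beforeImage": {"id": 7, "status": "unpaid"},
     "afterImage": {"id": 7, "status": "paid"}},
    {"operationType": "UPDATE",
     "beforeImage": {"id": 7, "status": "paid"},
     "afterImage": {"id": 7, "status": "shipped"}},
]
new_orders = [e for e in order_events if is_new_order(e)]
paid_orders = [e for e in order_events if is_payment_completed(e)]
```

Each consumer group applies its own predicate to the same topic, so adding a new downstream service does not require any change to the order system.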
Database backup and heterogeneous database synchronization
Database disaster recovery and heterogeneous database data synchronization are also important application scenarios of CDC. Using Alibaba Cloud EventBridge can also quickly build such applications.
- Create a new DTS data subscription task to capture user MySQL database changes;
- Build an EventBridge event stream, and the event provider is the DTS data subscription task;
- Use EventBridge to execute the specified SQL in the destination database to implement database backup;
- Data change events are delivered to Function Compute, and user services update the corresponding heterogeneous databases according to the data changes.
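As an illustration of turning a change event into SQL for the destination database, the sketch below builds a parameterized statement from a simplified event and applies it to an in-memory SQLite database. The event shape, the `to_sql` helper, and the `orders` table are assumptions for the example; how EventBridge itself generates or executes SQL is not shown here.

```python
import sqlite3

def to_sql(table, change, key_column="id"):
    """Translate one change event into a parameterized SQL statement.

    Table and column names are interpolated into the statement, so they must
    come from a trusted schema, never from user input.
    """
    op = change["operationType"]
    if op == "INSERT":
        row = change["afterImage"]
        cols = list(row)
        placeholders = ", ".join("?" for _ in cols)
        return (f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                [row[c] for c in cols])
    if op == "UPDATE":
        row = change["afterImage"]
        cols = [c for c in row if c != key_column]
        assignments = ", ".join(f"{c} = ?" for c in cols)
        return (f"UPDATE {table} SET {assignments} WHERE {key_column} = ?",
                [row[c] for c in cols] + [change["beforeImage"][key_column]])
    if op == "DELETE":
        return (f"DELETE FROM {table} WHERE {key_column} = ?",
                [change["beforeImage"][key_column]])
    raise ValueError(f"unsupported operation type: {op}")

backup = sqlite3.connect(":memory:")  # stand-in for the destination database
backup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

insert_change = {"operationType": "INSERT",
                 "afterImage": {"id": 1, "status": "unpaid"}}
update_change = {"operationType": "UPDATE",
                 "beforeImage": {"id": 1, "status": "unpaid"},
                 "afterImage": {"id": 1, "status": "paid"}}
for change in (insert_change, update_change):
    statement, params = to_sql("orders", change)
    backup.execute(statement, params)

backup_rows = backup.execute("SELECT id, status FROM orders").fetchall()
```

For a heterogeneous target, the same event would instead be mapped to that system's own write API inside the function.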
Self-built SQL auditing
Users who need to build their own SQL auditing can also do so easily with EventBridge.
- Create a new DTS data subscription task to capture database changes;
- Build an EventBridge event stream, the event provider is DTS, and the event receiver is the log service SLS;
- When users need to audit SQL, they can query SLS.
Summary
This article introduced the basic concepts of CDC, the application of CDC on EventBridge, and several best practice scenarios. As the number of supported products keeps growing, the ecosystem that EventBridge covers is expanding as well. From messaging to databases, and from logging to big data, EventBridge continues to broaden its applicable fields and strengthen its position as an event hub on the cloud. It will keep developing in this direction, with deeper technology and a wider ecosystem.
Reference links:
[1] DTS:
https://www.aliyun.com/product/dts
[2] Debezium:
[3] Canal:
https://github.com/alibaba/canal
[4] Maxwell:
https://github.com/zendesk/maxwell
[5] DTS data subscription:
https://help.aliyun.com/document_detail/145716.html
[6] Event bus:
https://help.aliyun.com/document_detail/163897.html
[7] Event stream:
https://help.aliyun.com/document_detail/329940.html
[8] SUBSCRIBE consumption mode:
https://help.aliyun.com/document_detail/223371.html
Interested readers can join the DingTalk discussion group (group number: 44552972).