Implementing Seata in Ant International's Banking Business

Text | Li Qiao (Ant alias: Nanqiao), Li Zongjie (Ant alias: Baiying)

Li Qiao: Senior development engineer at Ant Group, responsible for developing the payment and settlement systems of Ant's overseas bank.

Li Zongjie: Technical expert at Ant Group, responsible for the research and development of Ant's distributed transaction middleware.

This article is 11,580 words long and takes about 25 minutes to read.

PART. 1--Background

Ant International's overseas banking business is being partially migrated to Alibaba Cloud, where the SOFA technology stack used inside Ant Group is not supported. To support the rapid growth of the banking business and simplify the banking system's technology stack, we rebuilt the stack on a set of open-source technologies such as Spring and Dubbo. As a financial institution, Ant Group runs its internal applications on a microservice architecture, so data consistency across services is critical. However, Ant Group's internal distributed transaction framework cannot run on Alibaba Cloud either.

Seata is a distributed transaction solution that combines Alibaba Group's TXC (offered on Alibaba Cloud as GTS) with Ant Group's TCC/SAGA transaction frameworks. After a comprehensive comparison of the existing distributed transaction frameworks, we therefore chose Seata.

This article describes in detail how the Ant Group overseas banking technology department used open-source Seata 1.4.2 for distributed transaction management while building its international sites. It also explains how we handle transaction suspension, idempotency, empty commits, and empty rollbacks on the client side.

PART. 2--Research

After four years of development, Seata has grown into a very large technical system. Throughout its evolution, however, it has kept its overall architecture stable and its interfaces backward compatible.

2.1--Seata Architecture

Seata's official website gives the following architecture diagram:

Overall, it consists of the following roles:

●TC: Transaction Coordinator

Transaction coordinator: maintains the state of global transactions and branch transactions, and drives global transaction commits or rollbacks.

●TM: Transaction Manager

Transaction manager: defines the scope of a global transaction and begins, commits, or rolls back that global transaction.

●RM: Resource Manager

Resource manager: runs in the same application as the branch transaction; it registers branch transactions, reports their status, and drives their commit or rollback.

TC, TM, and each RM communicate over long-lived connections built on the Netty framework. The protocol is a custom binary, bidirectional protocol on top of TCP (layer 4), which makes Seata's communication highly efficient.
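To make "custom binary protocol on top of TCP" concrete, here is a minimal Python sketch of length-prefixed framing with a request ID, which is what lets one long-lived connection carry messages in both directions and match each response to its request. The header layout (magic number, version, full length, message type, request ID) and all names are simplified, hypothetical choices loosely inspired by Seata's RPC header; it is not Seata's exact wire format.

```python
import struct

# Hypothetical frame header: magic (2B), version (1B), full length (4B),
# message type (1B), request id (4B); big-endian, 12 bytes, no padding.
HEADER = struct.Struct(">HBIBI")
MAGIC = 0xDADA
VERSION = 1


def encode_frame(msg_type: int, request_id: int, body: bytes) -> bytes:
    """Prefix the body with a fixed binary header so frames can be split
    back out of a continuous TCP byte stream."""
    full_length = HEADER.size + len(body)
    return HEADER.pack(MAGIC, VERSION, full_length, msg_type, request_id) + body


def decode_frames(buffer: bytes):
    """Split a byte buffer into complete frames.

    Returns (frames, remainder): frames is a list of (msg_type, request_id,
    body) tuples; remainder holds an incomplete trailing frame that must be
    prepended to the next chunk read from the socket.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        magic, _version, full_length, msg_type, request_id = HEADER.unpack_from(buffer, offset)
        if magic != MAGIC or full_length < HEADER.size:
            raise ValueError("corrupt frame header, stream is out of sync")
        if len(buffer) - offset < full_length:
            break  # the rest of this frame has not arrived yet
        body = buffer[offset + HEADER.size: offset + full_length]
        frames.append((msg_type, request_id, body))
        offset += full_length
    return frames, buffer[offset:]
```

Because every frame carries a request ID, either side can send a request at any time and the receiver can correlate replies without holding the connection idle, which is the property the paragraph above attributes to Seata's protocol.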

2.2--Transaction Mode

On top of this architecture, Seata provides four transaction modes: TCC, AT, SAGA and XA:

TCC mode

Participants implement Prepare/Commit/Rollback interfaces: phase one pre-processes (reserves) the data resources, and phase two implements the commit and rollback logic, completing the two-phase commit. The advantage is that business logic controls data visibility and isolation, local transactions are released quickly, and concurrency on the same resource improves. The disadvantage is that pre-processing introduces intermediate data, which adds business complexity. Overall, TCC mode offers good performance and isolation and is especially suitable for concurrent transactions on the same account in banking and financial scenarios.

AT mode

In phase one, Seata parses the SQL and generates a rollback log for phase two. On phase-two commit, the rollback log is deleted; on phase-two rollback, the rollback log is used to restore the records to their state before phase one. The two phases resemble TCC, but the phase-two Commit/Rollback is generated and executed automatically by the framework, so AT is easier to use than TCC. However, AT mode achieves isolation through global locks, so transactions on the same resource can only run serially, and its performance is lower than TCC's. If the same resource is not used concurrently, AT mode delivers good performance and isolation together with higher development efficiency.

SAGA mode

SAGA is a solution for long-running transactions. Both its phase-one forward services and its phase-two compensation services are implemented by business developers, and it is widely used in microservice orchestration. However, its data isolation is weak, so business data is at risk of dirty writes.

XA mode

XA mode uses the same interface as AT mode and provides the strictest isolation, ensuring that users never see dirty reads. Its drawback is that transactions hold resources for longer, so performance is lower.

2.3--Summary

All four Seata modes are implementations of two-phase transactions. Given Seata's own architecture, its transaction modes generally have the following characteristics:

PART. 3--Practice

After this research, we concluded that Seata's technology stack meets the needs of all our financial business scenarios. During implementation, we made trade-offs according to the technical requirements of each scenario: we used AT mode for the initiator's local transaction flow records and TCC mode for the downstream account services. This chapter uses the classic account balance model as an example to describe this practice in detail, including the work we did to adapt to each transaction mode.

3.1--Using AT mode for transaction flow records

To track the status of each transaction, we must write a transaction flow record before initiating the transaction. A simplified model of the transaction status flow is as follows:

When a request arrives, we first write a flow record in the initial state and then start the distributed transaction. Once the distributed transaction succeeds, the flow record is expected to be in the success state. To retry failed transactions, a scheduled task periodically scans the flow records and retries those that are still unfinished after a given period.
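The flow-record lifecycle and the retry scan described above can be sketched as follows. The status names, the FlowRecord type, and the 5-minute threshold are hypothetical choices for illustration; the article does not specify them. The sketch assumes, as section 3.3.1 describes, that a rolled-back transaction leaves its record in the initial state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class FlowStatus(Enum):
    INIT = "INIT"        # written before the distributed transaction starts;
                         # an AT rollback also restores the record to INIT
    SUCCESS = "SUCCESS"  # set inside the distributed transaction (AT mode)


@dataclass
class FlowRecord:
    flow_id: str
    status: FlowStatus
    created_at: datetime


def records_to_retry(records, now, threshold=timedelta(minutes=5)):
    """Return the flows the scheduled task should retry: records that are
    still INIT once the threshold has elapsed since they were created."""
    return [r for r in records
            if r.status is FlowStatus.INIT and now - r.created_at >= threshold]
```

Records younger than the threshold are skipped because their distributed transaction may still be in flight.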

This scenario shows that the status change of the initiator's flow record must itself take part in the distributed transaction: if the distributed transaction succeeds, the status must be updated to success; if it fails, to failure. Moreover, the flow records are independent of each other and are never operated on concurrently; the only lock contention is between the asynchronous scan task and the in-flight transaction.

AT mode fits this scenario well. There is no need to track the execution status in business code: the initiator simply updates the flow record's status to "execution successful" through the proxy data source provided by AT mode, and the framework does the rest. AT mode's phase-one and phase-two commit and rollback keep the flow status consistent with the state of the overall distributed transaction, while keeping the code concise and development efficient.

3.2--Using TCC mode for accounts

From a business standpoint, an account balance cannot increase or decrease out of thin air; every balance change must have a corresponding detail record. The account balance model therefore has three key elements: the account (whose account it is), the balance (the value of the customer's assets), and the account details (the amount, time, and reason for each balance change). Together they form the simplest account balance model.

Based on this business analysis, we arrived at the following basic data model for account balances:

Account (ip_account)

The account model determines basic account information:

●Account: the account number, mapped to the member's information, represents the member's balance assets;

●Account type: corporate (merchant account), personal (individual account), platform account, marketing account, merchant pending-settlement account, etc.;

●Balance: records the current value of the customer's assets;

●Transaction status: controls four transaction types (inflow, outflow, recharge, and withdrawal) and acts as a fund-safety guard at the accounting layer. Each type is flagged T/F: T means the transaction is allowed, F means it is not;

●Accounting subject: indicates the accounting subject under which the account is booked.
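The account model above can be sketched as a small data type whose four T/F flags gate each kind of movement. Field names, the enum values, the sample subject code, and the check_allowed helper are hypothetical; the article does not give the actual ip_account schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class TxnType(Enum):
    INFLOW = "inflow"
    OUTFLOW = "outflow"
    RECHARGE = "recharge"
    WITHDRAWAL = "withdrawal"


@dataclass
class Account:
    account_no: str
    account_type: str          # e.g. "PERSONAL", "MERCHANT", "PLATFORM"
    balance: Decimal
    accounting_subject: str    # code of the subject the account is booked under
    txn_flags: dict            # transaction status: one "T"/"F" flag per TxnType

    def check_allowed(self, txn_type: TxnType) -> None:
        """Reject the operation at the accounting layer unless its flag is "T".
        A missing flag is treated as "F" (deny by default)."""
        if self.txn_flags.get(txn_type, "F") != "T":
            raise PermissionError(
                f"{txn_type.value} is not allowed on account {self.account_no}")
```

Denying by default keeps a misconfigured account frozen rather than open, which matches the flags' role as a fund-safety guard.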

Account details (ip_account_log, ip_trans_log)

ip_trans_log records the account's transaction details and explains why this account is receiving this accounting request. Think of it as the original business voucher, used to reconcile against the business system's data. For example, if you buy a bottle of mineral water at a convenience store with Alipay, it records the original fact that you spent 2 yuan on water.

ip_account_log records the accounting details: how much money was debited from or credited to this account, and which account is the counterparty. In the same 2-yuan example there are two accounting entries: one deducts 2 yuan from your Alipay account, and the other adds 2 yuan to the convenience store merchant's pending-settlement account.
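Using the 2-yuan mineral water example, the relationship between the two detail tables can be sketched as one business voucher (ip_trans_log) backed by two balanced accounting entries (ip_account_log). The record shapes, account names, and book_purchase helper are hypothetical illustrations; amounts are integers in fen (1 yuan = 100 fen) to avoid floating-point error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransLog:        # ip_trans_log: why the account is being charged
    trans_id: str
    description: str
    amount_fen: int


@dataclass(frozen=True)
class AccountLog:      # ip_account_log: how each account's balance moved
    trans_id: str
    account: str
    counterparty: str
    delta_fen: int     # negative = debit, positive = credit


def book_purchase(trans_id, payer, merchant_pending, amount_fen,
                  description="buy mineral water"):
    """Create one voucher plus the two accounting entries that explain it."""
    voucher = TransLog(trans_id, description, amount_fen)
    entries = [
        AccountLog(trans_id, payer, merchant_pending, -amount_fen),
        AccountLog(trans_id, merchant_pending, payer, +amount_fen),
    ]
    return voucher, entries
```

Both entries share the voucher's trans_id, so reconciliation can check that every voucher's entries sum to zero.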

Account operations in Ant International's overseas bank are strongly financial in nature, place high demands on system availability and performance, and require very strong isolation between concurrent requests.

Among the four modes, we therefore first ruled out SAGA because of its weak isolation. For accounting operations, AT mode would lock the entire account, and XA mode would suffer performance problems from long transactions. TCC mode only needs to freeze the amount being changed, so we chose it for its clear advantages in performance and isolation. The following describes how we converted the business model when using Seata for distributed transactions.

There are two roles in TCC mode: transaction coordinator and transaction participant. The interaction flow between them is as follows:

3.2.1--Phase one analysis

Phase one, also called the Prepare phase, proceeds as follows:

1. The node hosting the transaction coordinator (the initiator) sends a Prepare request to all participant nodes;

2. After receiving the Prepare request, each participant node performs the data updates of its local transaction, including storing the main transaction ID and branch transaction ID and recording transaction snapshot information;

3. If the participant's local transaction succeeds, it returns a "success" message to the coordinator node; otherwise it returns a "failure" message;

4. Once the coordinator has received replies from all participants, the distributed transaction enters phase two.

In essence, the TCC Prepare phase reserves resources, and we upgraded our account model precisely to support reserving balances in phase one.

3.2.2--Phase two analysis

Phase two of the distributed transaction, also called the Execute phase, proceeds as follows:

1. If all participants returned "success" to the coordinator node, it sends a Commit request to all participant nodes; otherwise it sends a Rollback request;

2. After receiving the Commit/Rollback request, each participant commits or rolls back its local transaction, releases the locked resources, and returns an "execution complete" message to the coordinator;

3. Once the coordinator has received "execution complete" from all participants, it updates the transaction status and the process ends.
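The two phases above amount to a simple coordinator loop. The sketch below is a deliberately simplified, single-process illustration: the Participant class, its prepare/commit/rollback methods, and run_global_transaction are hypothetical, and a real TC also persists global transaction state and keeps retrying failed phase-two calls instead of giving up.

```python
class Participant:
    """Hypothetical in-memory participant that can be told to fail Prepare."""

    def __init__(self, name, prepare_ok=True):
        self.name = name
        self.prepare_ok = prepare_ok
        self.state = "initial"

    def prepare(self, xid):
        self.state = "prepared" if self.prepare_ok else "prepare_failed"
        return self.prepare_ok            # the "success" / "failure" message

    def commit(self, xid):
        self.state = "committed"

    def rollback(self, xid):
        self.state = "rolled_back"


def run_global_transaction(xid, participants):
    # Phase 1: send Prepare to every participant and collect all replies.
    # A list (not a generator) makes every participant receive Prepare
    # even after one has already failed.
    replies = [p.prepare(xid) for p in participants]
    all_ok = all(replies)
    # Phase 2: Commit everywhere if all succeeded, otherwise Rollback everywhere.
    for p in participants:
        if all_ok:
            p.commit(xid)
        else:
            p.rollback(xid)
    return "committed" if all_ok else "rolled_back"
```

Note that on failure the coordinator also sends Rollback to the participant whose Prepare failed; that participant may have nothing to undo, which is exactly the "empty rollback" case discussed in PART. 4.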

In our model, phase two confirms or cancels the resources reserved in phase one. For example, in a transfer, phase one checks whether the balance is sufficient and, if so, reserves the transfer amount by freezing it. When the transfer is committed in phase two, that is, when the transfer completes, the amount frozen in phase one is actually deducted to confirm the transfer.

3.2.3--Balance model adapted to TCC

With this view of how distributed transactions execute, we can look at the account balance model from a technical angle. The amounts that are "locked and marked" during a distributed transaction can collectively be called the "transaction amount".

In business terms, an account has only two actions: income and expenditure. To hold intermediate data that is reserved in phase one and confirmed in phase two, we call the account's expenditure during a transaction the system amount, and its income during a transaction the unreached amount (money not yet credited).

The model is as follows:

System Amount:

It is visible outside the transaction and is recorded in the account model ip_account. This design matches the business logic: if an account takes part in several concurrent transactions, each transaction must see the system amount already occupied (the amount being spent in transactions); otherwise the account could be overdrawn;

Unreached amount:

It is visible only within the same transaction, because the money has not actually arrived until the whole transaction completes, and it is uncertain whether the transaction will succeed. If it were visible outside the transaction, other transactions could spend it, and a later rollback would cause a capital loss.

3.2.4--Calculation of available balance

According to the above model and analysis, we can know that:

Available balance = account balance - frozen amount - system amount + unreached amount (the unreached amount counts only within the same transaction; outside it, it is treated as 0)
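The formula can be written as a small function. Following the visibility rules in 3.2.3, the unreached amount counts only when the caller is inside the same transaction; the in_same_transaction flag models that and is an illustrative assumption about how visibility is passed in. The checks use the figures from the third row of the phase-one example in 3.3.2.

```python
def available_balance(account_balance, frozen, system_amount, unreached,
                      in_same_transaction):
    """Available = balance - frozen - system amount + unreached amount.

    The unreached amount has not actually arrived yet, so only the
    transaction that will receive it may count it; everyone else sees 0.
    """
    visible_unreached = unreached if in_same_transaction else 0
    return account_balance - frozen - system_amount + visible_unreached
```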

3.2.5--System Amount Calculation

An example table follows (the numbers circled in red are annotations):

Notes (keyed to the red circles in the table):

1. The system amount records amounts being paid out but not yet settled; it can be regarded as a special kind of freeze.

2. The unreached amount is money that a transaction will receive but has not yet received; it can only be used within the same transaction.

3. "In-transaction" and "out-of-transaction" available balances are defined relative to the transactions that share the same main transaction number. The example in this table shows multiple operations within the same transaction.

4. In payment preprocessing, the account balance does not decrease; instead, the system amount increases.

5. By the available balance formula, increasing the system amount is equivalent to reducing the available balance.

6. In collection preprocessing, the account balance does not increase; instead, the unreached amount increases.

7. The unreached amount can only be used within the same transaction and is invisible outside it.

8. Payments within the same transaction use (reduce) the unreached amount first.

9. When the unreached amount is insufficient, the shortfall is covered by (increasing) the system amount.

10. At commit, the account balance is updated according to the formula.
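Rules 4 through 10 can be sketched as a per-transaction view of one account. This is a minimal in-memory illustration under stated assumptions: the class and method names are hypothetical, amounts are integers, the frozen amount stays constant, and a real participant would persist these figures and the branch details in ip_account. Running the supermarket scenario from 3.3 through it reproduces the figures in the tables there.

```python
class TccAccountTxn:
    """Hypothetical view of one account within one main transaction."""

    def __init__(self, account_balance, frozen=0):
        self.account_balance = account_balance
        self.frozen = frozen
        self.system_amount = 0   # visible to everyone: money being paid out
        self.unreached = 0       # visible only inside this transaction

    def available(self, inside=True):
        unreached = self.unreached if inside else 0
        return self.account_balance - self.frozen - self.system_amount + unreached

    def prepare_debit(self, amount):
        """Payment preprocessing: the balance is untouched (rule 4). Use the
        unreached amount first (rule 8), then the system amount (rule 9)."""
        if amount > self.available(inside=True):
            raise ValueError("insufficient available balance")
        from_unreached = min(amount, self.unreached)
        self.unreached -= from_unreached
        self.system_amount += amount - from_unreached

    def prepare_credit(self, amount):
        """Collection preprocessing: the balance is untouched and the
        unreached amount grows (rule 6)."""
        self.unreached += amount

    def commit(self):
        """Apply the net effect to the balance and clear reservations (rule 10)."""
        self.account_balance = self.account_balance - self.system_amount + self.unreached
        self.system_amount = 0
        self.unreached = 0
        return self.account_balance

    def rollback(self):
        """Cancel: drop the reservations; the balance was never changed."""
        self.system_amount = 0
        self.unreached = 0
```

Because phase one never touches account_balance, a rollback only has to discard the two reservation figures.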

In short, a single Seata distributed transaction can include both TCC-mode and AT-mode participants, and the TM does not need to know which mode each participant uses. We therefore use each mode where it fits: AT mode at the initiator to keep the transaction flow records consistent, and TCC mode among the accounting participants to ensure isolation while achieving higher performance.

Our overall architecture is shown below:

3.3--Worked example of the transaction process

Suppose you go to the supermarket to buy 100 yuan of fruit and pay with Alipay at checkout. Your Alipay balance is only 20 yuan, so the remaining 80 yuan must come from your bank card. Assume the payment flow is: pay 20 yuan first, then top up 80 yuan from the bank card to Alipay, and finally pay 80 yuan.

3.3.1--Initiator phase-one process (AT)

Before the transaction starts, the asset exchange system, acting as the initiator, first inserts a record into the transaction flow table with the initial status "not executed", then starts the distributed transaction and, in its local phase one, updates the flow status serially through AT mode.

The flow record table itself has no concurrent read/write problem, so in phase one AT mode updates the status to "transfer succeeded". If the transfer fails, the status is rolled back to "not executed"; if it succeeds, the phase-two Commit of the distributed transaction releases the lock. While the transaction is running, the scheduled polling task periodically checks the record. To improve isolation, reads of the flow record use SELECT ... FOR UPDATE, and AT mode's isolation level is set to serializable. As a result, the scheduled task cannot obtain the read lock while the transaction is in progress; it keeps retrying the poll until the transaction ends or times out and the lock is released, and if it then reads the status as "not executed", it initiates the retry.
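One round of the scheduled scan described above can be sketched as follows. Here a threading.Lock is a stand-in for the row lock that SELECT ... FOR UPDATE requests under AT mode; poll_flow_record, the callbacks, and the status string are hypothetical names, not part of Seata.

```python
import threading


def poll_flow_record(row_lock, read_status, start_retry, lock_wait_s=0.1):
    """One polling round for a single flow record.

    row_lock stands in for the row lock requested by SELECT ... FOR UPDATE;
    while the in-flight distributed transaction holds it, the scan cannot
    read the record and simply tries again on its next round.
    """
    if not row_lock.acquire(timeout=lock_wait_s):
        return "locked"             # transaction still running; retry later
    try:
        status = read_status()      # read while holding the lock
    finally:
        row_lock.release()
    if status == "NOT_EXECUTED":    # the transaction was rolled back
        start_retry()
        return "retried"
    return "done"
```

Returning "locked" instead of blocking keeps one long-running transaction from stalling the whole scan.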

3.3.2--Participant phase-one process and account balance changes (TCC)

The phase-one accounting process is shown in the following figure:

Processing of distributed transaction accounting amounts:

Explanation:

Second row: pay 20. Resources are reserved by freezing 20 of the balance, so the system amount is 20 (0+20), the in-transaction available balance is 0, and the out-of-transaction available balance is 0 (20-0-20+0).

Third row: top up 80. In preprocessing, the unreached amount increases to 80 (0+80); the in-transaction available balance is 80 (20-0-20+80), while the out-of-transaction available balance is 0 (20-0-20+0), because the unreached amount is invisible outside the transaction and counts as 0.

Fourth row: pay 80. The reservation consumes the unreached amount first, so the system amount stays at 20 (20+80-80), the unreached amount drops to 0 (80-80), and the out-of-transaction available balance is 0 (20-0-20+0).

3.3.3--Initiator phase-two process (AT)

The initiator's phase two is committed automatically by the AT framework and requires no business code. After the phase-two commit, the row lock on the transaction flow record is released.

3.3.4--Participant phase-two process and account balance changes (TCC)

The phase-two flow of the participant's accounting is as follows:

Combining this flow with the phase-one accounting gives the following balance changes:

Explanation:

Fifth row: when the transaction commits, the application retrieves all amount details in the transaction. Because the unreached amount is consumed first, the top-up and the second payment exactly offset each other. The application obtains an unreached-amount detail of 0 and a system-amount detail of 20, and restores the system amount to 0. It then iterates over the branch transactions, computes the account balance as 20-20+80-80=0, persists it to the account model, and writes the accounting details.

3.4--AT+TCC programming practice

Seata clients are either transaction initiators or transaction participants. To enable Seata's AT mode on the initiator, turn on automatic data source proxying through configuration, or specify the proxy data source manually in code. A participant's Prepare/Commit/Rollback should each run in a local transaction, and its Commit/Rollback only needs to commit or roll back the current application's work, without calling the Commit or Rollback interfaces of lower-level participants.

3.4.1--Initiator access to AT mode

AT mode is a two-phase protocol that the framework manages automatically for the database. To use AT mode on the initiator, you only need to have the framework proxy the application's data source.

Note that in AT mode a table can have only a single primary key column; composite primary keys are not supported.

3.4.2--Participant access to TCC mode

To adopt TCC mode, a business operation that used to run in a single step must be redesigned as two phases. Three interfaces are therefore needed: phase-one Prepare, and phase-two Commit and Rollback. The phase-one interface is marked with Seata's TCC annotation @TwoPhaseBusinessAction, as follows:
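In Seata itself this is a Java annotation, @TwoPhaseBusinessAction, placed on the phase-one method with name, commitMethod, and rollbackMethod attributes. The Python sketch below only mirrors that shape: the two_phase_business_action decorator, RESOURCE_REGISTRY, and AccountTccAction are hypothetical stand-ins, not part of any Seata client. It also shows why a name may appear only once per application: the (application name, action name) pair is the resource key.

```python
# (application_name, action_name) -> (prepare method, commit name, rollback name)
RESOURCE_REGISTRY = {}


def two_phase_business_action(app, name, commit_method, rollback_method):
    """Hypothetical stand-in for @TwoPhaseBusinessAction: registers the
    phase-one method as a TCC resource keyed by (app, name)."""
    def decorator(prepare_fn):
        key = (app, name)
        if key in RESOURCE_REGISTRY:
            raise ValueError(f"duplicate TCC resource {key}")
        RESOURCE_REGISTRY[key] = (prepare_fn.__qualname__, commit_method, rollback_method)
        return prepare_fn
    return decorator


class AccountTccAction:
    def __init__(self):
        self.reserved = {}   # (xid, branch_id) -> amount reserved in phase one

    @two_phase_business_action("bank-account", "debitAction", "commit", "rollback")
    def prepare(self, xid, branch_id, amount):
        # Phase one: reserve (freeze) the amount inside a local transaction.
        self.reserved[(xid, branch_id)] = amount
        return True

    def commit(self, xid, branch_id):
        # Phase two: confirm only this application's own reservation.
        return self.reserved.pop((xid, branch_id), None) is not None

    def rollback(self, xid, branch_id):
        # Phase two: release this application's reservation, if any.
        self.reserved.pop((xid, branch_id), None)
        return True
```

Registration happens when the class is defined, so a second action with the same name in the same application fails immediately instead of silently shadowing the first.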

When using Seata, keep the following in mind:

1. If a participant times out or the TM never commits the transaction, a transaction timeout can be set on the initiator side (60 s by default). If the TM has not initiated the commit within that period, the TC automatically rolls the transaction back;

2. Both TCC and AT are compensating transactions: when a local transaction commits, the distributed transaction may not yet be complete, so global data consistency from the user's perspective is not guaranteed, and dirty reads and dirty writes must be prevented. To improve the initiator's isolation in phase one, add FOR UPDATE to SELECT statements so that they lock and run serially:

a. Dirty-write scenarios need @GlobalTransactional;

b. Dirty-read scenarios need @GlobalLock + @Transactional, or @GlobalTransactional;

3. The name in the @TwoPhaseBusinessAction annotation serves as the RM's resourceId, and resourceId plus applicationName uniquely identify an RM, so the same name must not appear twice in one application. If a system implements several SPIs with the same name, Seata cannot distinguish between them.

PART. 4--Exceptions

In distributed systems, network packet loss, request timeouts, request retries, processes that hang or crash, and failures of the runtime environment are common. Seata faces these problems too.

In phase one, if a participant reports failure, its local transaction did not succeed and the transaction must be rolled back: in phase two, the coordinator sends a rollback request to all participants, and each participant rolls back its local transaction. Other scenarios include failures and retried requests caused by network timeouts and jitter. All of these exceptions must be handled when developing distributed transactions.

4.1--Four kinds of exceptions

In distributed transaction scenarios, the following problems are common:

1. Idempotency

2. Empty rollback

3. Empty commit

4. Resource suspension

Handling these exceptions requires the framework and operations to work together to ensure service quality.

4.2--Solution

In 2021, when we adopted Seata v1.4.2, it did not yet provide anti-suspension support. We handled the four exceptions as follows:

Idempotency control:

In phase one, xid + branchId is used as the unique key for idempotency control. If an idempotency conflict is detected, an exception is thrown immediately and the transaction is rolled back; if a retry is needed, the business system initiates it itself.

Allow empty rollbacks:

An empty rollback occurs when the phase-one request was never received because of a network problem, or when the phase-two Rollback request arrives first. Empty rollbacks are a normal situation, and the system must be able to accept and process them.

Reject empty commits:

The system should receive a phase-two Commit request only after the phase-one request has been processed successfully. If a phase-two Commit arrives without a preceding phase-one request, something has gone wrong, and the request should be rejected outright.

Anti-suspension:

Under network jitter and similar conditions, a system may receive and process the phase-one request after it has already handled the phase-two Rollback. It will then never receive another phase-two Commit or Rollback, so the transaction never reaches a final state; this is suspension. The system must record the rollback when it processes an empty rollback, and reject any phase-one request that arrives later.

CREATE TABLE IF NOT EXISTS `tcc_fence_log`
(
    `xid`           VARCHAR(128)  NOT NULL COMMENT 'global id',    
    `branch_id`     BIGINT        NOT NULL COMMENT 'branch id',    
    `action_name`   VARCHAR(64)   NOT NULL COMMENT 'action name',    
    `status`        TINYINT       NOT NULL COMMENT 'status(tried:1;committed:2;rollbacked:3;suspended:4)',    
    `gmt_create`    DATETIME(3)   NOT NULL COMMENT 'create time',    
    `gmt_modified`  DATETIME(3)   NOT NULL COMMENT 'update time',    
    PRIMARY KEY (`xid`, `branch_id`),    
    KEY `idx_gmt_modified` (`gmt_modified`),    
    KEY `idx_status` (`status`)
) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4;

These exceptions are handled mainly through each participant's local anti-suspension record table (tcc_fence_log above). The core logic is to insert a transaction record into this table in phase one.

In phase two, check whether the record exists. If it does, phase one was executed normally. If it does not, this is an empty rollback, and a record must be inserted immediately to mark that an empty rollback has occurred. If the phase-one request then arrives later, its insert fails with a primary key conflict and it is refused, which prevents the suspension caused by the phase-two rollback running before phase one. This approach is also known as the double-insert scheme.
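The double-insert scheme can be sketched against the tcc_fence_log table above. This sketch uses SQLite so it runs standalone, so the MySQL DDL is simplified; the fence_* function names and the business callbacks are hypothetical, and it is not Seata's own fence implementation. Status codes come from the table's comment (tried 1, committed 2, rollbacked 3, suspended 4). A production version would also lock the fence row (e.g. SELECT ... FOR UPDATE) between the status check and the update.

```python
import sqlite3
from datetime import datetime

TRIED, COMMITTED, ROLLBACKED, SUSPENDED = 1, 2, 3, 4
INSERT_SQL = "INSERT INTO tcc_fence_log VALUES (?, ?, ?, ?, ?, ?)"
UPDATE_SQL = "UPDATE tcc_fence_log SET status=?, gmt_modified=? WHERE xid=? AND branch_id=?"


def create_fence_table(conn):
    conn.execute("""CREATE TABLE tcc_fence_log (
        xid TEXT NOT NULL, branch_id INTEGER NOT NULL, action_name TEXT NOT NULL,
        status INTEGER NOT NULL, gmt_create TEXT NOT NULL, gmt_modified TEXT NOT NULL,
        PRIMARY KEY (xid, branch_id))""")


def _now():
    return datetime.now().isoformat()


def _status(conn, xid, branch_id):
    row = conn.execute("SELECT status FROM tcc_fence_log WHERE xid=? AND branch_id=?",
                       (xid, branch_id)).fetchone()
    return row[0] if row else None


def fence_prepare(conn, xid, branch_id, action, business):
    """Phase one: insert TRIED in the same local transaction as the business
    change. A primary key conflict means a duplicate request or an earlier
    empty rollback, so the request is refused (idempotency + anti-suspension)."""
    try:
        with conn:
            conn.execute(INSERT_SQL, (xid, branch_id, action, TRIED, _now(), _now()))
            business()
        return True
    except sqlite3.IntegrityError:
        return False


def fence_commit(conn, xid, branch_id, business):
    status = _status(conn, xid, branch_id)
    if status is None:
        return False                  # empty commit: phase one never ran, reject
    if status == COMMITTED:
        return True                   # idempotent retry, do nothing
    if status != TRIED:
        return False                  # already rolled back or suspended
    with conn:
        conn.execute(UPDATE_SQL, (COMMITTED, _now(), xid, branch_id))
        business()
    return True


def fence_rollback(conn, xid, branch_id, action, business):
    status = _status(conn, xid, branch_id)
    if status is None:
        # Empty rollback: record it so that a late phase-one insert conflicts.
        with conn:
            conn.execute(INSERT_SQL, (xid, branch_id, action, SUSPENDED, _now(), _now()))
        return True
    if status in (ROLLBACKED, SUSPENDED):
        return True                   # idempotent retry, do nothing
    if status == COMMITTED:
        return False
    with conn:
        conn.execute(UPDATE_SQL, (ROLLBACKED, _now(), xid, branch_id))
        business()
    return True
```

The fence row and the business change are written in one local transaction, so either both take effect or neither does; that is what makes the primary key a reliable guard.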

4.3--The latest solution

Seata v1.5.1 (https://github.com/seata/seata/releases/tag/v1.5.1), released in June 2022, provides built-in handling for all four exceptions above, and it is very easy to use: just set the attribute useTCCFence=true in the @TwoPhaseBusinessAction annotation.

PART. 5--Stress testing

To provide stable and reliable online services, we ran stress tests to determine Seata's capacity.

The stress-testing method follows Seata's official approach (https://github.com/seata/seata/pull/2611). In a similar test environment, we replaced the service with our own business scenario and made the following changes:

●2 initiators and 3 downstream participants

●Request TPS set to 4 times the normal business peak: 50 * 4 = 200 TPS

●A stepped load test with a concurrency step of 100, with each step run 4 times

The result is as follows:

Note: the request time is the time taken by one complete 2B service request; the normal end-to-end link time is about 400 ms.

Throughout the stress test, memory and disk were never under pressure. At 500 concurrent users, CPU pressure began to rise and timeouts occurred. Based on these results, we set an online service guarantee plan: when a server receives more than 400 requests or the service RT exceeds 400 ms, the service scales out automatically.

PART. 6--Monitoring

Observability in distributed systems comes down to logging, metrics, and tracing. Seata natively supports exporting metrics to Prometheus and tracing data to SkyWalking. Because our system is deployed on Alibaba Cloud, we built Seata's overall monitoring and alerting system on Alibaba Cloud's APM stack:

●Logging: split logs and ship them to Alibaba Cloud SLS

●Metrics: Alibaba Cloud Prometheus

●Monitoring: Alibaba Cloud ARMS and SLS dashboards

●Alerting: SLS keyword alerts, SLS dashboard alerts, and Prometheus metrics alerts

For tracing, we enhanced Seata by propagating the TraceId between client and server, which connects the entire trace link.

The overall monitoring system architecture is as follows:

We also built our own transaction management console, which monitors the following and raises alerts when thresholds are exceeded:

●Number of online nodes that are down

●Abnormal transaction statuses that TMs/RMs register with the TC

●Number of timed-out requests

●Number of error logs over a period of time

select count(*) as error_log_num where level = 'ERROR'

●Transaction connections that were disconnected or re-established

-- Number of re-established connections in the past X period
select count(*) where context like '% to server channel inactive.'
-- Number of disconnections in the past X period
select count(*) where context like 'remove unused channel:%'

●Heartbeat timeouts

SELECT cost_time,method,logtime WHERE method = 'heartBeat' and cost_time >= 60

●Transactions and branch transactions suspended by exceptions during phase-two Commit

-- Get the transactions and branches suspended during phase-two Commit
select regexp_extract(context, '(?<=\[)(\S+)(?=\])') as xid, regexp_extract(context, '(?<=\[)(\d+)(?=\])') as branch_id where context like 'Committing global transaction[%'
-- Get the xids of transactions suspended during phase-two Commit, deduplicated
select DISTINCT regexp_extract(context, '(?<=\[)(\S+)(?=\])') as xid where context like 'Committing global transaction[%'
-- Get the branch_ids of transaction branches suspended during phase-two Commit, deduplicated
select DISTINCT regexp_extract(context, '(?<=\[)(\d+)(?=\])') as branch_id where context like 'Committing global transaction[%'
-- Get the xid and branch_id of transaction branches suspended during phase-two Commit, deduplicated
select DISTINCT regexp_extract(context, '(?<=\[)(\S+)(?=\])') as xid, regexp_extract(context, '(?<=\[)(\d+)(?=\])') as branch_id where context like 'Committing global transaction[%'
-- Count the distinct transactions (xids) suspended during phase-two Commit
select count(DISTINCT regexp_extract(context, '(?<=\[)(\S+)(?=\])')) as commit_error_times where context like 'Committing global transaction[%'

●Transactions and branch transactions suspended during phase-two Rollback

-- Count the log entries of transaction branches suspended during phase-two Rollback
select count(*) where context like 'Rollback branch transaction fail and will retry, xid =%'
-- Get the log entries of transaction branches suspended during phase-two Rollback
select * where context like 'Rollback branch transaction fail and will retry, xid =%'
-- Get the xid and branch_id of transactions suspended during phase-two Rollback
select regexp_extract(context, '(?<=xid\ =\ )(\S+)(?=\ )') as xid, regexp_extract(context, '(?<=branchId\ =\ )(\S+)(?=)') as branch_id where context like 'Rollback branch transaction fail and will retry, xid =%'
-- Get the xid and branch_id of transactions suspended during phase-two Rollback, deduplicated
select DISTINCT regexp_extract(context, '(?<=xid\ =\ )(\S+)(?=\ )') as xid, regexp_extract(context, '(?<=branchId\ =\ )(\S+)(?=)') as branch_id where context like 'Rollback branch transaction fail and will retry, xid =%'
-- Count the transactions suspended during phase-two Rollback
select count(DISTINCT regexp_extract(context, '(?<=xid\ =\ )(\S+)(?=\ )')) where context like 'Rollback branch transaction fail and will retry, xid =%'

Note:

1. The monitored object is the Seata v1.4.2 log; log formats may differ in other Seata versions;

2. The log queries above are written in SLS SQL.

PART. 7--Outlook

This article has described in detail the key technical steps and usage details of adopting Seata v1.4.2 at Ant Group's international sites, along with our solutions to the problems we encountered. Seata has recently released v1.5.2, which includes exciting new features such as a console. We plan to upgrade to the new version in the second half of the year and carry out related stability work, to take full advantage of the latest improvements in the open-source release.

As a firm supporter of open source, Ant Group is not only a large-scale user of Seata but also a co-builder of the open-source project, having contributed the TCC and SAGA mode implementations. As we actively help drive Seata 2.0, we will continue to contribute features such as an annotation-based SAGA mode, asynchronous phase-two execution, parallel phase-two commit, and distributed message transactions based on RocketMQ.

Open source benefits everyone. We will keep our internal work aligned with open source, share a single Seata open-source core, and help the project and its community develop healthily and sustainably.

Learn more…

Seata Star ✨:
https://seata.io/en-us/