
As businesses grow rapidly and become more complex, almost every company's systems move from a monolithic architecture to a distributed one, and in particular to microservices. Distributed transactions then become unavoidable.

This article first introduces the relevant basic theory, then summarizes the classic transaction schemes, and finally presents a solution to the out-of-order execution of sub-transactions (the idempotency, empty compensation, and suspension problems).

Basic theory

Before explaining the specific schemes, let's first review the basic theory behind distributed transactions.

Take a transfer as an example: A needs to transfer 100 yuan to B, so A's balance must decrease by 100 yuan and B's balance must increase by 100 yuan. The transfer as a whole must guarantee that A-100 and B+100 either both succeed or both fail. Let's see how this problem is solved in various scenarios.

Transactions

The ability to execute multiple statements as a single unit is called a database transaction. A database transaction ensures that the operations within its scope either all succeed or all fail.

Transactions have 4 properties: atomicity, consistency, isolation, and durability. These four properties are often referred to as ACID properties.

  • Atomicity: Either all operations in a transaction complete or none of them do; the transaction never stops at some intermediate step. If an error occurs during execution, the transaction is rolled back to the state before it started, as if it had never been executed.
  • Consistency: The integrity of the database is not compromised before, during, or after a transaction. Integrity constraints, including foreign key constraints and application-defined constraints, are not violated.
  • Isolation: The database allows multiple concurrent transactions to read and modify its data at the same time. Isolation prevents the data inconsistency that interleaved execution of concurrent transactions would otherwise cause.
  • Durability: After a transaction completes, its modifications to the data are permanent and are not lost even if the system fails.

If the business system is simple enough that the transfer can be completed by modifying data in a single database from a single service, a database transaction is enough to guarantee that the transfer completes correctly.

Distributed transaction

An inter-bank transfer is a typical distributed transaction scenario. Suppose A transfers money to B at a different bank: data from two banks is involved, so the ACID properties of the transfer cannot be guaranteed by a local transaction in one database and can only be achieved through a distributed transaction.

A distributed transaction is one in which the transaction initiator, the resources, the resource managers, and the transaction coordinator are located on different nodes of a distributed system. In the transfer above, the A-100 operation and the B+100 operation do not run on the same node. In essence, distributed transactions exist to ensure that data operations are executed correctly in distributed scenarios.

ACID

Distributed transactions partially follow the ACID specification:

  • Atomicity: strictly followed
  • Consistency: consistency after the transaction completes is strictly followed; consistency while the transaction is in progress may be relaxed appropriately
  • Isolation: parallel transactions must not affect each other; the visibility of a transaction's intermediate results may be relaxed in a controlled way
  • Durability: strictly followed

Because the data is not consistent while the transaction is in progress, but the transaction will eventually complete and reach consistency, we call distributed transactions "eventually consistent".

Clarifying eventual consistency

It must be emphasized that the eventual consistency here is different from the eventual consistency of CAP (C: consistency, A: availability, P: partition tolerance). Most current books and materials confuse the two, so below we focus on what consistency means in each case.

The C of CAP refers to consistency when reading data from multiple replicas in a distributed system. Simply put, suppose a piece of data is updated from v1 to v2 and is then read from any replica:

  • Strong consistency: every read is guaranteed to return v2
  • Weak consistency: a read may return v1 or v2
  • Eventual consistency: after a certain period of time, every read is guaranteed to return v2

The CAP theorem states that a distributed system cannot satisfy all three properties at the same time and can guarantee at most two. One classic response is the BASE theory, which pursues AP and relaxes the requirements on C. AWS's Dynamo is one such system; it provides eventually consistent reads. For details, see Dynamo Consistent Reads

In recent years distributed systems theory has developed further, and many systems follow a CP+HA (highly available) scheme rather than BASE. Distributed consensus protocols such as Paxos and Raft satisfy CP. As for availability, although these systems are not 100% available, hardware has become more stable in recent years and high availability can be achieved. Public data on Chubby, Google's distributed lock service, shows that a cluster can provide an average availability of 99.99958%, which corresponds to about 130 s of downtime per year and already meets very strict application requirements. Current SQL database software is also based on CP+HA; its availability is lower than Google's extreme figure but can generally reach four nines.

CP+HA is not BASE: as long as a write succeeds, the next read returns the latest result. Developers do not have to worry about reading stale data, and reading and writing with multiple replicas behaves the same as on a standalone instance.

Because distributed transaction research mainly addresses data consistency across multiple databases, and the data itself is stored mainly in databases, the storage is also CP+HA. Distributed transactions therefore satisfy the C of CAP but do not fully satisfy the C of ACID, and it is in this sense that they are called eventually consistent.

Classic solution for distributed transactions

Because no distributed transaction scheme can provide a complete ACID guarantee, there is no perfect scheme that solves every business problem. In practice, the most suitable scheme is chosen according to the characteristics of the business.

1. Two-phase commit/XA

XA is a distributed transaction specification proposed by the X/Open organization. It mainly defines the interface between the (global) transaction manager (TM) and the (local) resource manager (RM). Local databases such as MySQL play the role of RM in XA.

XA is divided into two phases:

  • Phase 1 (prepare): every participating RM prepares to execute the transaction and locks the required resources. When a participant is ready, it reports to the TM that it is prepared.
  • Phase 2 (commit/rollback): when the TM confirms that all participants (RMs) are prepared, it sends a commit command to all of them.

At present, mainstream databases basically all support XA transactions, including MySQL, Oracle, SQL Server, and PostgreSQL.

An XA transaction involves one or more resource managers (RM), a transaction manager (TM), and an application program (AP).

The three roles RM, TM, and AP are the classic division of roles, and they carry through to the later transaction modes such as Saga and TCC.

Taking the above transfer as an example, the sequence diagram of a successfully completed XA transaction is as follows:
[Figure: sequence diagram of a successfully completed XA transaction]

If any participant fails to prepare, the TM notifies all participants that have already completed prepare to roll back.
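The prepare/commit/rollback flow can be sketched in Go as follows. This is a minimal in-memory illustration, not the XA interface of any real database: the `Participant` interface, the `account` type, and `RunTwoPhase` are hypothetical names introduced here, and a real RM would hold locks and persist its prepared state.

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is a hypothetical stand-in for an XA resource manager (RM).
type Participant interface {
	Prepare() error // phase 1: check and reserve the work
	Commit()        // phase 2: make the prepared work permanent
	Rollback()      // phase 2: discard the prepared work
}

// account is an in-memory RM that applies one balance change.
type account struct {
	balance, delta int
	prepared       bool
}

func (a *account) Prepare() error {
	if a.balance+a.delta < 0 {
		return errors.New("insufficient balance")
	}
	a.prepared = true
	return nil
}

func (a *account) Commit() {
	a.balance += a.delta
	a.prepared = false
}

func (a *account) Rollback() { a.prepared = false }

// RunTwoPhase plays the TM role: it commits only if every RM prepared,
// and otherwise rolls back the RMs that had already prepared.
func RunTwoPhase(rms ...Participant) error {
	for i, rm := range rms {
		if err := rm.Prepare(); err != nil {
			for _, done := range rms[:i] {
				done.Rollback()
			}
			return fmt.Errorf("prepare failed, rolled back: %w", err)
		}
	}
	for _, rm := range rms {
		rm.Commit()
	}
	return nil
}

func main() {
	a := &account{balance: 100, delta: -100}
	b := &account{balance: 0, delta: +100}
	fmt.Println(RunTwoPhase(a, b), a.balance, b.balance)
}
```

The sketch leaves out what makes XA hard in practice: the TM must log its decision durably, and a prepared RM keeps its locks until it hears the outcome.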

The characteristics of XA transactions are:

  • Simple and easy to understand, and relatively easy to develop with
  • Resources are locked for a long time, so concurrency is low

Readers who want to study XA further can refer to DTM, which can be used from Go, PHP, Python, Java, C#, Node, and other languages.

2. SAGA

Saga is a scheme described in the database paper "Sagas". The core idea is to split a long transaction into multiple short local transactions that are coordinated by a Saga transaction coordinator. If every step succeeds, the transaction completes normally. If a step fails, the compensating operations are invoked in reverse order.

Taking the above transfer as an example, the sequence diagram of a successfully completed SAGA transaction is as follows:

[Figure: sequence diagram of a successfully completed SAGA transaction]

Once a Saga reaches the Cancel (compensation) stage, Cancel must not fail for business reasons. If it does not return success because of a network problem or another temporary failure, the TM keeps retrying until Cancel returns success.

Features of Saga Transactions:

  • High concurrency: resources are not locked for a long time as they are in XA transactions
  • Both the forward operation and the compensating operation must be defined, so there is more development work than with XA
  • Consistency is weaker. In a transfer, user A's money may already have been deducted when the transfer ultimately fails.

The paper covers much more about SAGA, including two recovery strategies and concurrent execution of branch transactions. Our discussion here covers only the simplest SAGA.

SAGA fits many scenarios: it is suitable for long transactions and for business scenarios that are not sensitive to intermediate results.

If readers want to study SAGA further, please refer to DTM, which includes examples of SAGA success and failure rollback, as well as the handling of various network exceptions.

3. TCC

The concept of TCC (Try-Confirm-Cancel) was first proposed by Pat Helland in a paper titled "Life beyond Distributed Transactions: an Apostate's Opinion" published in 2007.

TCC is divided into three phases:

  • Try phase: attempt execution, complete all business checks (consistency), and reserve the necessary business resources (quasi-isolation)
  • Confirm phase: actually execute the business operation without any further business checks, using only the resources reserved in the Try phase. Confirm must be idempotent, because it is retried if it fails.
  • Cancel phase: cancel execution and release the resources reserved in the Try phase. Exception handling in the Cancel phase is basically the same as in the Confirm phase, so Cancel must also be idempotent.

Taking the transfer above as an example, the amount is usually frozen, but not deducted, in Try. It is deducted in Confirm and unfrozen in Cancel. The sequence diagram of a successfully completed TCC transaction is as follows:

[Figure: sequence diagram of a successfully completed TCC transaction]

The Confirm/Cancel phase of TCC must not fail for business reasons. If it does not return success because of a network problem or another temporary failure, the TM keeps retrying until Confirm/Cancel returns success.
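For the paying side of the transfer, the freeze/deduct/unfreeze behavior can be sketched as below. The `Account` type and its methods are hypothetical names chosen for illustration, and the state lives in memory. Confirm and Cancel are idempotent here, but the sketch does not protect against a Try that arrives after Cancel.

```go
package main

import (
	"errors"
	"fmt"
)

// Account tracks the total balance and the amounts frozen by Try.
type Account struct {
	Balance int            // total money, including frozen amounts
	frozen  map[string]int // amount reserved per global transaction id
}

func NewAccount(balance int) *Account {
	return &Account{Balance: balance, frozen: map[string]int{}}
}

// available is the balance that is not reserved by any transaction.
func (a *Account) available() int {
	sum := 0
	for _, v := range a.frozen {
		sum += v
	}
	return a.Balance - sum
}

// Try performs the business check and freezes the amount without deducting it.
func (a *Account) Try(gid string, amount int) error {
	if _, ok := a.frozen[gid]; ok {
		return nil // repeated Try for the same transaction
	}
	if a.available() < amount {
		return errors.New("insufficient balance")
	}
	a.frozen[gid] = amount
	return nil
}

// Confirm deducts the frozen amount. A repeated call finds no reservation
// and changes nothing, which makes it idempotent.
func (a *Account) Confirm(gid string) {
	a.Balance -= a.frozen[gid]
	delete(a.frozen, gid)
}

// Cancel releases the reservation without touching the balance.
func (a *Account) Cancel(gid string) {
	delete(a.frozen, gid)
}

func main() {
	a := NewAccount(100)
	if err := a.Try("gid-1", 100); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(a.Balance, a.available()) // frozen, not yet deducted
	a.Confirm("gid-1")
	a.Confirm("gid-1") // a retry by the TM changes nothing
	fmt.Println(a.Balance, a.available())
}
```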

TCC features are as follows:

  • High concurrency, with no long-term locking of resources.
  • There is a lot of development work, because Try, Confirm, and Cancel interfaces must all be provided.
  • Consistency is better: the SAGA situation in which money has been deducted and the transfer then fails does not occur.
  • TCC suits order-type business and business that has constraints on intermediate states

If readers want to study TCC further, they can refer to DTM

4. Local message table

The local message table scheme was originally published in ACM in 2008 by eBay architect Dan Pritchett. The core of the design is to use messages to ensure that tasks requiring distributed processing are executed asynchronously.

The general process is as follows:

[Figure: flow of the local message table scheme]

Writing the local message and performing the business operation are placed in one transaction, which makes the business operation and the message sending atomic: either both succeed or both fail.

Fault tolerance mechanism:

  • If the transaction that deducts the balance fails, it is rolled back directly and no subsequent steps run
  • If polling and producing the message fails, or the transaction that increases the balance fails, the step is retried

Features of the local message table:

  • Rollback is not supported
  • Polling to produce messages is hard to implement well. Periodic polling lengthens the total transaction time, while subscribing to the binlog is hard to develop and maintain

It is suitable for business operations that can be executed asynchronously and whose subsequent operations never need to be rolled back.

5. Transaction messages

In the local message table scheme above, the producer has to create an additional message table and poll it, which is a heavy burden on the business. RocketMQ, open-sourced by Alibaba, has officially supported transaction messages since version 4.3. This essentially moves the local message table into RocketMQ, making message sending on the producer side atomic with the execution of the local transaction.

Transaction message sending and committing:

  • Send a message (a half message)
  • The server stores the message and responds with the write result
  • Execute the local transaction according to the send result (if the write failed, the half message is not visible to the business and the local logic is not executed)
  • Commit or roll back according to the local transaction status (a Commit publishes the message and makes it visible to consumers)

The flow chart of normal sending is as follows:

[Figure: flow chart of normal transaction message sending]

Compensation process:

  • For transaction messages that have received neither Commit nor Rollback (messages in the pending state), the server initiates a "checkback"
  • The producer receives the checkback and returns the status of the local transaction corresponding to the message, which is Commit or Rollback

The transaction message scheme is very similar to the local message table mechanism. The main difference is that the operations on the local table are replaced by a checkback interface.
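The half-message and checkback flow can be sketched with an in-memory broker. The types and methods below (`Broker`, `SendHalf`, `End`, `CheckBack`) are hypothetical and are not RocketMQ's client API. They only model the state transitions described above.

```go
package main

import "fmt"

// State is the status of a half message on the broker.
type State int

const (
	Pending State = iota
	Committed
	RolledBack
)

// Broker is a hypothetical in-memory message server.
type Broker struct {
	half    map[string]State // half messages by ID
	Visible []string         // messages that consumers can see
}

func NewBroker() *Broker { return &Broker{half: map[string]State{}} }

// SendHalf stores a half message that consumers cannot see yet.
func (b *Broker) SendHalf(id string) { b.half[id] = Pending }

// End applies the producer's Commit or Rollback to a pending half message.
func (b *Broker) End(id string, s State) {
	if cur, ok := b.half[id]; !ok || cur != Pending {
		return // unknown or already decided
	}
	b.half[id] = s
	if s == Committed {
		b.Visible = append(b.Visible, id)
	}
}

// CheckBack asks the producer for the local transaction status of every
// half message that is still pending.
func (b *Broker) CheckBack(query func(id string) State) {
	for id, s := range b.half {
		if s == Pending {
			b.End(id, query(id))
		}
	}
}

func main() {
	b := NewBroker()
	localDone := map[string]bool{}

	b.SendHalf("tx-1")
	localDone["tx-1"] = true // the local transaction committed...
	// ...but the producer crashed before it could send Commit.

	b.CheckBack(func(id string) State {
		if localDone[id] {
			return Committed
		}
		return RolledBack
	})
	fmt.Println(b.Visible)
}
```

The checkback answer is only as reliable as the producer's record of its local transaction, which is why the status lookup is the delicate part of this scheme.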

Transaction message characteristics are as follows:

  • A long transaction only needs to be split into multiple tasks plus a checkback interface, which is easy to use
  • The checkback for transaction messages has no good solution, and data errors may occur in extreme cases

It is suitable for business operations that can be executed asynchronously and whose subsequent operations never need to be rolled back.

If readers want to study transaction messages further, they can refer to DTM or RocketMQ.

6. Best-effort notification

The notifying party does its best, through some mechanism, to notify the receiving party of the business processing result. Specifically, this includes:

  • A repeated-notification mechanism. Because the receiver may not receive a notification, there must be a mechanism that sends it again.
  • A message reconciliation mechanism. If the notification still does not reach the receiver despite best efforts, or the receiver wants to consume a message again after consuming it, the receiver can actively query the notifying party for the message.

The local message table and transaction messages described earlier are reliable messages. How do they differ from the best-effort notification described here?

Reliable message consistency: the notifying party must ensure that the message is sent and that it reaches the receiving party. The reliability of the message is guaranteed by the notifying party.

Best-effort notification: the notifying party does its best to notify the receiving party of the business processing result, but the message may not be received. In that case the receiving party must actively call the notifying party's interface to query the result, so the reliability of the notification depends on the receiving party.

Solution-wise, best-effort notification requires:

  • An interface through which the receiving party can query the business processing result
  • A message queue ACK mechanism: the message queue repeats the notification at gradually increasing intervals of 1 min, 5 min, 10 min, 30 min, 1 h, 2 h, 5 h, and 10 h until the upper limit of the notification time window is reached, after which no further notification is sent

Best-effort notification suits business notifications. For example, WeChat notifies merchants of transaction results through best-effort notification, offering both callback notifications and a transaction query interface.

7. AT transaction mode

This is a transaction mode in Seata, an open source project from Alibaba; at Ant Financial it is also known as FMT. Its advantage is that it is used much like the XA mode: the business does not need to write compensating operations, and rollback is completed automatically by the framework. It also has many disadvantages, such as dirty rollback, which can easily lead to data inconsistency. For a comparison between AT and XA, see: XA vs AT

A New Scheme for Distributed Transactions

After studying the classic schemes and drawing on the experience of many companies that use dtm, https://github.com/dtm-labs/dtm proposes new, easier-to-use schemes that help solve data consistency problems across databases and services better and faster.

Two-phase message

dtm pioneered the two-phase message architecture, which works much better than local message tables and transaction messages and can fully replace both.

The working sequence diagram of the two-phase message is as follows:
[Figure: working sequence diagram of the two-phase message]

Compared with local message tables and transaction messages, two-phase messages have the following advantages:

  • No queue is needed, and therefore no consumer; the user simply calls the API
  • Two-phase messages also have a checkback, but it is handled automatically by the framework, and the data is guaranteed to be correct

For details about two-phase messages, see: two-phase message

Workflow Mode

The XA, Saga, and TCC modes have been introduced above. Each has its own advantages and disadvantages and suits different businesses. Is there a way to combine their advantages, using different modes for different parts of the business and then fusing them into one global transaction?

The Workflow mode pioneered by dtm supports mixing the three modes above, and also allows mixing HTTP, gRPC, and local transactions. It is very flexible and can handle a wide range of business scenarios.

For details about Workflow, see: Workflow

Exception handling

Network and business failures can occur at every step of a distributed transaction. They require the business side to provide three properties: empty rollback handling, idempotency, and anti-suspension.

Abnormal situations

These exceptions are illustrated below with TCC transactions:

Empty rollback:

The second-phase Cancel method is called even though the Try method of the TCC resource was never called. The Cancel method must recognize that this is an empty rollback and return success directly.

The cause is that when the service hosting a branch transaction is down or the network is abnormal, the branch transaction call is recorded as failed even though the Try phase was never actually executed. When the fault is recovered, the distributed transaction is rolled back and the second-phase Cancel method is called, resulting in an empty rollback.

Idempotency:

Because any request can hit a network exception and be sent again, all distributed transaction branches must be idempotent.

Suspension:

Suspension means that, for a distributed transaction, the second-phase Cancel interface is executed before the Try interface.

The cause is that when Try of a branch transaction is called over RPC, the branch transaction is registered first and the RPC call is made afterward. If the network is congested at that moment and the RPC times out, the TM notifies the RM to roll back the distributed transaction. The rollback may complete before Try's RPC request finally reaches the participant and is actually executed.

Let's look at a sequence diagram of network exceptions to better understand the above problems
[Figure: sequence diagram of network exceptions in a TCC transaction]

  • When the business processes request 4, Cancel runs before Try, so the empty rollback must be handled
  • When the business processes request 6, Cancel is executed repeatedly, so idempotency is required
  • When the business processes request 8, Try runs after Cancel, so the suspension must be handled

Faced with these complex network anomalies, the solutions suggested by various companies all have the business side use a unique key to query whether the associated operation has completed and, if it has, return success directly. The judgment logic is complex and error-prone, and it places a heavy burden on the business.

Sub-transaction barrier

The project https://github.com/dtm-labs/dtm introduced the sub-transaction barrier technique, which achieves the effect shown in the following schematic diagram:
[Figure: schematic diagram of the sub-transaction barrier]

After these requests reach the sub-transaction barrier, abnormal requests are filtered out and normal requests pass through. Once developers use the sub-transaction barrier, all the exceptions described above are handled properly, and business developers only need to focus on the actual business logic, which greatly reduces their burden.
The sub-transaction barrier provides the method CallWithDB, whose prototype is:

 func (bb *BranchBarrier) CallWithDB(db *sql.DB, busiCall BusiFunc) error

Business developers write their own logic in busiCall and pass it to this function. CallWithDB guarantees that busiCall is not called in the empty rollback and suspension scenarios, and when the business is called repeatedly, idempotency control guarantees that it is committed only once.

Sub-transaction barriers handle TCC, SAGA, and other modes, and can also be extended to other areas.

Subtransaction barrier principle

The principle of the sub-transaction barrier is to create a branch transaction status table, sub_trans_barrier, in the local database, with a unique key made up of global transaction id, sub-transaction id, and sub-transaction branch name (try|confirm|cancel).

  1. Start a local transaction
  2. For the current operation op (try|confirm|cancel), insert ignore a row gid-branchid-op. If the insert does not succeed, commit the transaction and return success (a common idempotency control technique)
  3. If the current operation is cancel, additionally insert ignore a row gid-branchid-try. If this insert succeeds (note: succeeds), commit the transaction and return success
  4. Call the business logic inside the barrier. If it returns success, commit the transaction and return success; if it returns failure, roll back the transaction and return failure

Under this mechanism, the problems caused by network anomalies are solved:

  • Empty compensation control: if Try was not executed and Cancel is executed directly, Cancel's insert of gid-branchid-try succeeds, so the logic inside the barrier is skipped
  • Idempotency control: no branch can insert the same unique key twice, so nothing is executed repeatedly
  • Anti-suspension control: if Try is executed after Cancel, its insert of gid-branchid-try fails, so Try is not executed

A similar mechanism is also used for SAGA and other modes.

Subtransaction barrier summary

The sub-transaction barrier technique was pioneered by https://github.com/dtm-labs/dtm . Its significance lies in a simple, easy-to-implement algorithm and a simple, easy-to-use interface. With these two, developers are freed from handling network exceptions.

The technique currently needs to be paired with the dtm-labs/dtm transaction manager, and SDKs are available for Go, Python, C#, and Java. SDKs for other languages are planned. Other distributed transaction frameworks can implement the technique quickly from the principles above, as long as they provide the appropriate distributed transaction information.

dtm implements sub-transaction barriers not only on SQL databases but also on Redis and Mongo, so it can combine Redis, Mongo, SQL databases, and other storage engines that support transactions into one global transaction, which provides great flexibility.

Distributed Transaction Practice

We also have many articles with practical examples that walk you quickly through developing a distributed transaction, in versions for various languages. If you are interested, visit: dtm tutorial

Summary

This article introduces the basic theory of distributed transactions and explains the commonly used schemes. The second half covers the causes and classification of transaction exceptions and an elegant solution to them; finally, a distributed transaction example demonstrates the content introduced earlier in a short program.

dtm-labs/dtm supports TCC, XA, SAGA, two-phase messages, and best-effort notification (via two-phase messages), and provides HTTP and gRPC protocol support, so it is very easy to integrate.

dtm-labs/dtm already has clients for languages such as Python, Java, PHP, C#, and Node; see: SDK for each language .

Everyone is welcome to visit the https://github.com/dtm-labs/dtm project and give it a star!


叶东富

27 comments
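阿篮

With two-phase commit, how is the transfer-in operation in the second phase guaranteed to succeed? If the transfer-in fails, does the two-phase scheme break down?

2023-06-08, from China
朱帅

@阿篮 I have the same question.

2024-10-16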
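行云流水

The sub-transaction barrier solves exactly the problem that has troubled us for a long time.

2021-07-09
一只码农

I have finally found an article on distributed transactions that is both in-depth and comprehensive.

2021-07-12
一只程序猿

The most complete collection of schemes 👍

2021-07-16
进击999

Very comprehensive, great.

2021-07-24
码农大妈

Thanks, this is really practical.

2021-07-24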
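i一切随风

The dtm transaction manager is integrated into the SDK, right? It is responsible for the management, so what happens if the application (the main program) is interrupted or crashes unexpectedly?

2021-09-08
叶东富 (author)

@i一切随风 dtm has a server and a client SDK. When the application crashes, dtm retries or rolls back according to the state of the global transaction to keep the data consistent.

2021-09-09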
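Gilmourish

Good article.

2021-09-26
youth7

Why does it feel like best-effort notification has little to do with distributed transactions?

2021-10-25
叶东富 (author)

@youth7 Take integrating WeChat Pay as an example of best-effort notification. The whole order includes:

  1. Creating the order and deducting inventory internally
  2. Invoking WeChat Pay; the user pays successfully on the WeChat platform, and WeChat Pay notifies you of the successful payment
  3. After receiving the payment-success callback, marking the order as paid, shipping the goods, and so on

The whole order spans WeChat's payment system and the internal order system, so taken as a whole the collaboration is a distributed transaction.

From the perspective of implementing WeChat's payment system, it must notify the third party after the order is paid and must support retries and similar features, which makes it a typical local message / transaction message architecture.

2021-10-25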
Java识堂

Isn't the diagram for transaction messages wrong? It doesn't match the description.

2021-10-29
叶东富 (author)

@Java识堂 Which part exactly is wrong?

2021-10-29
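woo

Thanks, a very complete summary.
I have a small question about using the RocketMQ transaction feature.
Question 1: if message 5 is sent whenever the balance deduction succeeds, couldn't a direct RPC after completing steps 1 to 4 replace step 5?
Question 2: after message (5) is sent, if the transaction that increases the balance keeps failing, can the balance deduction that has already been committed be rolled back?
Looking forward to your reply.

2021-11-30
叶东富 (author)

@woo Question 1: I don't quite understand what your alternative looks like; step 5 is not sending the message but processing it.
Question 2: if increasing the balance can fail, the transaction message scheme is not recommended. Use Saga or TCC instead, because the message mechanism has no way to handle rollback and compensation.

2021-11-30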
Leder

Very good, very impressive.

2022-01-23
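囧囧有神

In the first column of the two-phase message sequence diagram, does the local transaction commit correspond to the transOut operation in the example?

2022-04-06
囧囧有神

Looking at the latest code, for two-phase messages, if the local transaction commit runs into a problem, why is the queryprepared checkback needed?

2022-04-06
叶东富 (author)

@囧囧有神 Without the checkback, how would dtm know whether the commit succeeded? The process may crash either before or after the commit.

2022-04-06
囧囧有神

@叶东富 One more question: in this scenario, is transOut in the same database as the barrier? Is that what allows the local transaction and the barrier to be committed together?

2022-04-07
叶东富 (author)

@囧囧有神 They must be in the same database instance, so that both tables can be modified within a single transaction.

2022-04-07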
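ethan

I have two questions about two-phase messages.
1. "If the database transaction is still in progress at checkback time, the insert is blocked by the in-progress transaction, because the insert waits for the locks held by that transaction." But what if the business transaction has not even started at checkback time (for example, the business process stalls after submitting the prepare message and before starting the business transaction, and the dtm server then times out and performs the checkback)? The checkback insert succeeds normally, so the transaction is treated as already rolled back. Does this cause any problem? My analysis is that the global transaction fails, the business transaction's insert then fails, and it rolls back, so there seems to be no real impact?
2. How should the rows inserted during checkback be deleted? Through operations monitoring and periodic asynchronous deletion?

2022-05-13
叶东富 (author)

@ethan You have thought about this very deeply.

  1. If the local transaction has not started at checkback time, the checkback insert succeeds, the business insert fails, and the business goes down the rollback path. This is an extreme case, and dtm already handles it properly.
  2. The rows inserted during checkback are usually deleted periodically by a scheduled job that the DBA sets up.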
2022-06-22