go-zero Microservices in Practice series (10: How to implement distributed transactions)

Distributed transactions are unavoidable in distributed applications, especially in today's popular microservice architectures. For example, in our mall system, placing an order involves two operations: creating the order and deducting inventory. The order service and the product service are two independent microservices, and each has its own database instance, so placing an order is a distributed transaction: the whole operation must be treated as one unit that either succeeds completely or fails completely. In this article, we will learn about distributed transactions together.

Eventual Consistency Based on Messages

When we eat at a restaurant, the cashier often hands us a receipt after we order and pay, and we then take the receipt to the pickup counter and wait for the meal. Why separate paying from picking up the food? An important reason is that it lets the restaurant serve more customers at once; for a service, this corresponds to higher concurrent processing capacity. As long as we hold the receipt, we will eventually get the meal we ordered. The receipt (the message) is what guarantees eventual consistency.

Applied to our ordering operation: when the user places an order, we first create the order and then send an inventory-deduction message to the message queue. At this point the order is complete but the inventory has not actually been deducted, because inventory deduction is asynchronous with respect to order creation, so the data is temporarily inconsistent. When the inventory-deduction message is consumed and the deduction is performed, the data becomes eventually consistent.
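The flow described above can be sketched in a few lines of Go. This is only a minimal illustration under stated assumptions: a buffered channel stands in for the message queue, an in-memory map stands in for the two databases, and the names Store, DeductStockMsg, PlaceOrder, and ConsumeDeductions are hypothetical and not taken from the mall project.

```go
package main

import (
	"fmt"
	"sync"
)

// DeductStockMsg is a hypothetical message asking the product service
// to deduct inventory for an order that already exists.
type DeductStockMsg struct {
	OrderID   int64
	ProductID int64
	Num       int64
}

// Store is an in-memory stand-in for the order and product databases.
type Store struct {
	mu     sync.Mutex
	orders map[int64]int64 // orderID -> productID
	stock  map[int64]int64 // productID -> remaining stock
}

func NewStore() *Store {
	return &Store{orders: map[int64]int64{}, stock: map[int64]int64{}}
}

// PlaceOrder writes the order first and then enqueues the deduction message.
// After it returns, the order exists but the stock is not yet deducted.
func (s *Store) PlaceOrder(queue chan<- DeductStockMsg, orderID, productID int64) {
	s.mu.Lock()
	s.orders[orderID] = productID
	s.mu.Unlock()
	queue <- DeductStockMsg{OrderID: orderID, ProductID: productID, Num: 1}
}

// ConsumeDeductions drains the queue until it is closed and applies each
// deduction. Once it returns, orders and stock are consistent again.
func (s *Store) ConsumeDeductions(queue <-chan DeductStockMsg) {
	for m := range queue {
		s.mu.Lock()
		s.stock[m.ProductID] -= m.Num
		s.mu.Unlock()
	}
}

// Stock returns the remaining stock of a product.
func (s *Store) Stock(productID int64) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.stock[productID]
}

func main() {
	s := NewStore()
	s.stock[1] = 10
	queue := make(chan DeductStockMsg, 16) // buffered channel plays the message queue

	s.PlaceOrder(queue, 1001, 1)
	fmt.Println("stock right after ordering:", s.Stock(1)) // still 10: temporarily inconsistent

	close(queue)
	s.ConsumeDeductions(queue)
	fmt.Println("stock after consuming:", s.Stock(1)) // 9: eventually consistent
}
```

The window between the two printed lines is exactly the period of inconsistency the paragraph describes; a real system would run the consumer continuously in a separate process.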

Message-based eventual consistency suits scenarios with high concurrency and relaxed data-consistency requirements. Some non-core logic in our mall can use this approach to improve throughput. For example, issuing coupons after a purchase does not require strong consistency, so the coupons can be issued to users asynchronously.

What if the operation fails after the message has been consumed? First, retry. If it still fails after several retries, raise an alert or write a log entry so that someone can intervene manually.

If the data must be strongly consistent, this approach does not apply. See the two-phase commit protocol below.

XA protocol

You may not have heard of the XA protocol, but you have probably heard of 2PC (two-phase commit). This solution relies on support from the underlying database: the database layer must implement the XA protocol. MySQL's InnoDB engine, for example, supports XA. XA can be understood as a strongly consistent, centralized atomic commit protocol.

Atomicity means combining a series of operations into one unit that is either executed in full or not executed at all. 2PC splits the commit of a transaction into two steps: the first is Prepare, and the second is Commit/Rollback. A centralized Coordinator manages these two steps to guarantee the atomicity of the multi-step operation.

Step 1 (Prepare): The Coordinator issues a Prepare command to every participant in the distributed transaction. Each participant executes its SQL statements in its database without committing, and reports its ready state to the Coordinator.

Step 2 (Commit/Rollback): If all participants are ready, the Coordinator issues a Commit command and each participant commits its local transaction. If any participant cannot get ready, the Coordinator issues a Rollback command and each participant rolls back its local transaction.
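The two steps can be sketched as a small coordinator loop. This is a simplified in-memory model, not an XA implementation: the Participant interface, memParticipant, and RunTwoPhase are hypothetical names, and a real coordinator would also persist its decision and retry failed Commit or Rollback calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is a hypothetical interface for one branch of the transaction.
type Participant interface {
	Prepare() error // do the work but do not commit
	Commit() error
	Rollback() error
}

// memParticipant records its state in memory instead of using a database.
type memParticipant struct {
	name        string
	failPrepare bool
	state       string // "", "prepared", "committed" or "rolledback"
}

func (p *memParticipant) Prepare() error {
	if p.failPrepare {
		return errors.New(p.name + ": prepare failed")
	}
	p.state = "prepared"
	return nil
}

func (p *memParticipant) Commit() error {
	p.state = "committed"
	return nil
}

func (p *memParticipant) Rollback() error {
	p.state = "rolledback"
	return nil
}

// RunTwoPhase issues Prepare to every participant. If all are ready it
// issues Commit to all of them; otherwise it issues Rollback to all of them.
func RunTwoPhase(ps []Participant) error {
	for _, p := range ps {
		if err := p.Prepare(); err != nil {
			for _, q := range ps {
				_ = q.Rollback() // a real coordinator would retry failed rollbacks
			}
			return err
		}
	}
	for _, p := range ps {
		if err := p.Commit(); err != nil {
			return err // a real coordinator would keep retrying the commit
		}
	}
	return nil
}

func main() {
	order := &memParticipant{name: "order"}
	product := &memParticipant{name: "product"}
	fmt.Println(RunTwoPhase([]Participant{order, product}), order.state, product.state)

	order2 := &memParticipant{name: "order"}
	product2 := &memParticipant{name: "product", failPrepare: true}
	fmt.Println(RunTwoPhase([]Participant{order2, product2}), order2.state, product2.state)
}
```

In the second run the product branch cannot prepare, so both branches end up rolled back, which is the all-or-nothing behavior the Coordinator is there to enforce.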

In our ordering operation, we need to create an order and deduct the product's inventory. Let's see how 2PC solves this. 2PC introduces a transaction coordinator to coordinate the order service and the product service. The two phases are the prepare phase and the commit phase. In the prepare phase, the coordinator sends a prepare command to the order service and to the product service, and each service starts its preparation work on receiving it. What has to be done in the prepare phase? You can think of it as everything except committing the database transaction. For example, the order service needs to do the following in the prepare phase:

Start a database transaction in the order database;
Write the order data to the order table.

Note that the order database transaction is not committed here; the order service simply reports to the coordinator that it has prepared successfully. Once the coordinator has received ready responses from both services, it enters the second phase. The commit phase is relatively simple: the coordinator sends a commit command to both services, and each service commits its own database transaction and returns a success response to the coordinator. After receiving the responses, the coordinator returns success to the client, and the distributed transaction ends. The following is the sequence diagram of this process:

That was the normal case. The key question is what happens when something goes wrong, and we again look at the two phases in turn. In the prepare phase, if any error or timeout occurs, the coordinator sends a rollback request to both services, and each service rolls back its own database transaction on receiving it. The distributed transaction fails, the database transactions of both services are rolled back, and all related data returns to the state it had before the distributed transaction started, as if it had never run. The following is the sequence diagram of the abnormal case:

Once the prepare phase has succeeded and the commit phase has begun, the distributed transaction can only succeed; it must not fail. If a network transmission fails, the commit must be retried until it succeeds. If something goes down during this phase, whether one of the two databases or the order service or the product service, the order database may have committed while the product database rolls back automatically because of the outage, leaving the data inconsistent. However, because the commit step is very simple and executes very quickly, the probability of this is relatively low. From a practical point of view, the data consistency of 2PC distributed transactions is therefore still very good.

However, this kind of distributed transaction has an inherent drawback that makes XA particularly unsuitable for high-concurrency Internet scenarios: every local transaction holds a database connection from the Prepare phase until the second phase finishes with Commit or Rollback. What characterizes Internet workloads? High concurrency. Because concurrency is so high, each transaction must release the database connection it holds as soon as possible. The shorter a transaction runs the better, so that other transactions can be executed sooner.

Therefore, 2PC should be considered only where strong consistency is required and concurrency is low.

There are also improved versions of 2PC, such as 3PC. The general idea is similar: it solves some of 2PC's problems, but it also introduces new ones and is more complicated to implement. Due to space limitations we cannot cover each of them in detail; once you understand 2PC, you can look up the relevant material and study them on your own.

Distributed Transaction Framework

Implementing a reasonably complete, bug-free set of distributed transaction logic is still quite complicated. Fortunately, we do not need to reinvent the wheel: there are ready-made frameworks that implement distributed transactions for us. Here we introduce DTM, which integrates well with go-zero.

According to the introduction on the DTM official website, DTM is a revolutionary distributed transaction framework that is extremely simple to use, greatly lowers the barrier to using distributed transactions, changes the industry status quo of "avoid distributed transactions whenever you can", and elegantly solves the problem of data consistency between services.

The author had heard of DTM before writing this article but had never used it. After about ten minutes with the official documentation, it was possible to get going simply by following the examples, which shows how simple DTM is to use. I am sure you will pick it up just as quickly. Next, we will use DTM to implement a TCC-based distributed transaction.

First, install DTM. I am on a Mac and install it directly with the following command:

 brew install dtm

Create a configuration file dtm.yml for DTM with the following contents:

 MicroService:
  Driver: 'dtm-driver-gozero' # make dtm use go-zero's microservice protocol
  Target: 'etcd://localhost:2379/dtmservice' # register dtm at this etcd address
  EndPoint: 'localhost:36790' # dtm's local address

 # Start dtm
dtm -c /opt/homebrew/etc/dtm.yml

After the order data is consumed in seckill-rmq, the order is placed and the inventory is deducted. Here this is changed to a TCC-based distributed transaction. Note that dtmServer corresponds to the Target in the DTM configuration file:

 var dtmServer = "etcd://localhost:2379/dtmservice"

Since TCC consists of three parts, Try, Confirm, and Cancel, we provide RPC methods for these three stages in both the order service and the product service.

The Try method mainly performs data checks. Once the checks show that the order requirements are met, the Confirm method is executed; Confirm is where the real business logic is implemented. If the transaction fails and has to be rolled back, the Cancel method is executed; Cancel mainly compensates for the data changed by the Confirm method. The code is as follows:

var dtmServer = "etcd://localhost:2379/dtmservice"

// consumeDTM reads seckill messages from ch and, for each message, runs a
// TCC global transaction that deducts stock and creates the order.
func (s *Service) consumeDTM(ch chan *KafkaData) {
  defer s.waiter.Done()

  // Resolve the RPC targets that DTM calls for each branch.
  productServer, err := s.c.ProductRPC.BuildTarget()
  if err != nil {
    log.Fatalf("s.c.ProductRPC.BuildTarget error: %v", err)
  }
  orderServer, err := s.c.OrderRPC.BuildTarget()
  if err != nil {
    log.Fatalf("s.c.OrderRPC.BuildTarget error: %v", err)
  }

  for {
    m, ok := <-ch
    if !ok {
      log.Fatal("seckill rmq exit")
    }
    fmt.Printf("consume msg: %+v\n", m)

    gid := dtmgrpc.MustGenGid(dtmServer)
    err := dtmgrpc.TccGlobalTransaction(dtmServer, gid, func(tcc *dtmgrpc.TccGrpc) error {
      // Product branch. The three URLs are the Try, Confirm and Cancel methods, in that order.
      if e := tcc.CallBranch(
        &product.UpdateProductStockRequest{ProductId: m.Pid, Num: 1},
        productServer+"/product.Product/CheckProductStock",
        productServer+"/product.Product/UpdateProductStock",
        productServer+"/product.Product/RollbackProductStock",
        &product.UpdateProductStockRequest{},
      ); e != nil { // check e, the error returned by CallBranch, not the outer err
        logx.Errorf("tcc.CallBranch server: %s error: %v", productServer, e)
        return e
      }
      // Order branch, with the same Try, Confirm, Cancel ordering.
      if e := tcc.CallBranch(
        &order.CreateOrderRequest{Uid: m.Uid, Pid: m.Pid},
        orderServer+"/order.Order/CreateOrderCheck",
        orderServer+"/order.Order/CreateOrder",
        orderServer+"/order.Order/RollbackOrder",
        &order.CreateOrderResponse{},
      ); e != nil {
        logx.Errorf("tcc.CallBranch server: %s error: %v", orderServer, e)
        return e
      }
      return nil
    })
    logger.FatalIfError(err)
  }
}
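To make the roles of Try, Confirm, and Cancel more concrete on the service side, here is a small in-memory sketch of an inventory branch. Note that it uses a commonly seen TCC variant in which Try reserves (freezes) the stock rather than only checking it, so Cancel simply releases the reservation; this differs from the check-only Try described above. The Inventory type and its methods are hypothetical and are not part of DTM or of the mall project.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reservation is the stock set aside by Try for one global transaction.
type reservation struct{ productID, num int64 }

// Inventory is a hypothetical in-memory stock table with TCC-style operations.
type Inventory struct {
	mu       sync.Mutex
	stock    map[int64]int64        // productID -> available stock
	reserved map[string]reservation // gid -> stock reserved by Try
}

func NewInventory(initial map[int64]int64) *Inventory {
	inv := &Inventory{stock: map[int64]int64{}, reserved: map[string]reservation{}}
	for id, n := range initial {
		inv.stock[id] = n
	}
	return inv
}

// Try checks that enough stock is available and reserves it under gid.
// Repeating Try with the same gid does not reserve twice.
func (inv *Inventory) Try(gid string, productID, num int64) error {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if _, ok := inv.reserved[gid]; ok {
		return nil
	}
	if inv.stock[productID] < num {
		return errors.New("insufficient stock")
	}
	inv.stock[productID] -= num
	inv.reserved[gid] = reservation{productID: productID, num: num}
	return nil
}

// Confirm makes the reservation permanent. Calling it again is harmless.
func (inv *Inventory) Confirm(gid string) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	delete(inv.reserved, gid)
}

// Cancel returns reserved stock. It does nothing if Try never succeeded
// for gid or if the reservation was already released.
func (inv *Inventory) Cancel(gid string) {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	if r, ok := inv.reserved[gid]; ok {
		inv.stock[r.productID] += r.num
		delete(inv.reserved, gid)
	}
}

// Available returns the stock that is neither sold nor reserved.
func (inv *Inventory) Available(productID int64) int64 {
	inv.mu.Lock()
	defer inv.mu.Unlock()
	return inv.stock[productID]
}

func main() {
	inv := NewInventory(map[int64]int64{1: 2})

	_ = inv.Try("gid-1", 1, 1)
	inv.Confirm("gid-1")
	fmt.Println("after a confirmed order:", inv.Available(1))

	_ = inv.Try("gid-2", 1, 1)
	inv.Cancel("gid-2") // the other branch failed, so release the stock
	fmt.Println("after a cancelled order:", inv.Available(1))
}
```

Keying every operation by the global transaction ID lets Confirm and Cancel be repeated safely, which matters because a transaction manager may call them more than once when retrying.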

Concluding Remarks

This article covered the basics of distributed transactions. In scenarios with high concurrency and no strong consistency requirement, we can use a message queue to achieve eventual consistency. If strong consistency is required, 2PC can be used, but guaranteeing strong consistency inevitably costs performance, so 2PC is generally used only when concurrency is low and strong consistency is required. 3PC, TCC, and others improve on some of 2PC's shortcomings; due to space limitations they are not covered in detail here, and interested readers can look up the relevant material. Finally, we used DTM to build a TCC-based distributed transaction for the ordering flow, and the code is simple and easy to follow. With distributed transactions, I hope everyone understands the principles first; once you do, it matters little which framework you use.

I hope this article is helpful to you. Thank you.

Updated every Monday and Thursday

Code repository: https://github.com/zhoushuguang/lebron

Project Address

https://github.com/zeromicro/go-zero

You are welcome to use go-zero and star the project to support us!

WeChat exchange group

Follow the "Microservice Practice" official account and click "exchange group" to get the QR code of the community group.


kevinwan
