Workflow mode is a mode pioneered by github.com/dtm-labs/dtm. In this mode, XA, SAGA, TCC, HTTP and gRPC can be mixed freely, and users can customize most of the content of a distributed transaction, which gives great flexibility. Let's use a transfer scenario to show how to implement it in Workflow mode.

workflow example

In Workflow mode, you can use either the HTTP or the gRPC protocol. The following example uses gRPC and consists of these steps:

  • Initialize SDK
  • Register workflow
  • Execute workflow

First, initialize the workflow SDK before using the workflow:

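import "github.com/dtm-labs/dtmgrpc/workflow"

// Initialize the workflow SDK. The three parameters are:
// 1st parameter: the address of the dtm server
// 2nd parameter: the address of the business server
// 3rd parameter: the grpcServer
// The workflow receives callbacks from the dtm server on "business server address" + "grpcServer"
workflow.InitGrpc(dtmGrpcServer, busi.BusiGrpc, gsvr)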

Then register the workflow handler:

wfName := "wf_saga"
err := workflow.Register(wfName, func(wf *workflow.Workflow, data []byte) error {
  req := MustUnmarshalReqGrpc(data)
  // Branch 1: register the compensation for TransOut; it runs only if the global transaction rolls back
  wf.NewBranch().OnRollback(func(bb *dtmcli.BranchBarrier) error {
    _, err := busi.BusiCli.TransOutRevert(wf.Context, req)
    return err
  })
  // Forward operation of branch 1
  _, err := busi.BusiCli.TransOut(wf.Context, req)
  if err != nil {
    return err
  }
  // Branch 2: register the compensation for TransIn, then perform TransIn
  wf.NewBranch().OnRollback(func(bb *dtmcli.BranchBarrier) error {
    _, err := busi.BusiCli.TransInRevert(wf.Context, req)
    return err
  })
  _, err = busi.BusiCli.TransIn(wf.Context, req)
  return err
})
  • This registration must be performed after the business service has started, because if the process crashes, dtm calls back the business server to continue the unfinished tasks.
  • In the code above, NewBranch creates a transaction branch. A branch consists of a forward operation plus the callbacks invoked when the global transaction commits or rolls back.
  • OnRollback/OnCommit registers, for the current branch, the callback to run when the global transaction rolls back/commits. The code above specifies only OnRollback, which corresponds to Saga mode.
  • busi.BusiCli must be created with the workflow interceptor, which automatically records the results of RPC requests in dtm, as shown below:

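    // nossl: plaintext (non-TLS) credentials from google.golang.org/grpc/credentials/insecure
    nossl := grpc.WithTransportCredentials(insecure.NewCredentials())
    conn1, err := grpc.Dial(busi.BusiGrpc, grpc.WithUnaryInterceptor(workflow.Interceptor), nossl)
    busi.BusiCli = busi.NewBusiClient(conn1)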

You can also add workflow.Interceptor to all of your gRPC clients: the interceptor only processes requests made under wf.Context and wf.NewBranchContext(), and leaves other requests untouched.

  • When the workflow function returns nil, the global transaction enters the Commit phase; when it returns ErrFailure, the global transaction enters the Rollback phase. The callbacks registered with OnCommit (or OnRollback, respectively) are then called in the reverse order of registration.
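To make the two phases concrete, here is a toy, in-memory model written for illustration only. It is not the dtm SDK: toyWorkflow, branch, run and transferDemo are hypothetical names, ErrFailure is a local stand-in for dtm's error of the same name, and the callbacks drop the *dtmcli.BranchBarrier parameter. It mirrors the structure of the wf_saga handler and shows the OnRollback callbacks running in reverse order once the function returns ErrFailure.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFailure is a local stand-in for dtm's business-failure error.
var ErrFailure = errors.New("FAILURE")

// branch holds the second-phase callbacks registered for one transaction branch.
type branch struct {
	onCommit   func() error
	onRollback func() error
}

func (b *branch) OnCommit(f func() error) *branch   { b.onCommit = f; return b }
func (b *branch) OnRollback(f func() error) *branch { b.onRollback = f; return b }

// toyWorkflow is a hypothetical in-memory model of a workflow, not the dtm SDK.
type toyWorkflow struct {
	branches []*branch
}

// NewBranch appends a branch so that callbacks can be attached to it.
func (w *toyWorkflow) NewBranch() *branch {
	b := &branch{}
	w.branches = append(w.branches, b)
	return b
}

// run calls fn, then enters the second phase: OnCommit callbacks if fn returned nil,
// OnRollback callbacks if it returned ErrFailure, in both cases in reverse order.
func (w *toyWorkflow) run(fn func(w *toyWorkflow) error) error {
	err := fn(w)
	if err != nil && !errors.Is(err, ErrFailure) {
		return err // any other error: dtm would retry the whole workflow later
	}
	for i := len(w.branches) - 1; i >= 0; i-- {
		cb := w.branches[i].onCommit
		if err != nil {
			cb = w.branches[i].onRollback
		}
		if cb == nil {
			continue
		}
		if cbErr := cb(); cbErr != nil {
			return cbErr
		}
	}
	return err
}

// transferDemo mirrors the wf_saga handler; failTransIn makes TransIn fail with ErrFailure.
func transferDemo(failTransIn bool) ([]string, error) {
	var steps []string
	wf := &toyWorkflow{}
	err := wf.run(func(w *toyWorkflow) error {
		// Each compensation is registered before its forward call, as in wf_saga.
		w.NewBranch().OnRollback(func() error { steps = append(steps, "TransOutRevert"); return nil })
		steps = append(steps, "TransOut")
		w.NewBranch().OnRollback(func() error { steps = append(steps, "TransInRevert"); return nil })
		steps = append(steps, "TransIn")
		if failTransIn {
			return ErrFailure
		}
		return nil
	})
	return steps, err
}

func main() {
	steps, err := transferDemo(true)
	fmt.Println(steps, err) // [TransOut TransIn TransInRevert TransOutRevert] FAILURE
}
```

In this model TransInRevert runs even though TransIn itself failed, because its compensation was registered before the forward call. A real compensation therefore has to cope with an operation that never took effect, which is the kind of case dtm's BranchBarrier is designed to help with.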

Finally, execute the workflow

req := &busi.ReqGrpc{Amount: 30}
// The second argument is the gid, which must uniquely identify this global transaction
err = workflow.Execute(wfName, shortuuid.New(), dtmgimp.MustProtoMarshal(req))
  • If Execute returns nil, the global transaction has succeeded; if it returns ErrFailure, the global transaction has been rolled back.
  • If Execute returns any other error, the outcome is not yet final: the dtm server will call the workflow back later to retry it.
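The two rules above can be captured in a small helper that decides what the caller of Execute may conclude. This is a hedged sketch: describeOutcome and errTimeout are hypothetical, ErrFailure is a local stand-in for dtm's sentinel error, and errors.Is is used in case the returned error is wrapped (whether dtm wraps it is an assumption not covered by this article).

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFailure is a local stand-in for dtm's business-failure sentinel error.
var ErrFailure = errors.New("FAILURE")

// errTimeout is a hypothetical transient error, e.g. the dtm server was unreachable.
var errTimeout = errors.New("context deadline exceeded")

// describeOutcome interprets the error returned by Execute.
func describeOutcome(err error) string {
	switch {
	case err == nil:
		return "succeeded"
	case errors.Is(err, ErrFailure):
		return "rolled back"
	default:
		// Not final yet: dtm will call the workflow back and retry it.
		return "pending retry"
	}
}

func main() {
	wrapped := fmt.Errorf("branch TransIn: %w", ErrFailure)
	for _, err := range []error{nil, ErrFailure, wrapped, errTimeout} {
		fmt.Println(describeOutcome(err))
	}
}
```

The key design point is that only nil and ErrFailure are final; treating any other error as a failure on the caller's side could contradict what dtm eventually decides.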

workflow principle

How does workflow guarantee data consistency across a distributed transaction? If the business process crashes or hits a similar problem, the dtm server notices that the workflow's global transaction has timed out without completing, and retries the workflow using an exponential backoff strategy. When the retry request reaches the business service, the SDK reads the progress of the global transaction from the dtm server. For branches that have already completed, the gRPC/HTTP interceptors directly return the previously saved result instead of calling the branch again. Eventually, the workflow completes successfully.
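The replay idea can be sketched with a map standing in for the progress that the dtm server stores. In this hypothetical model (progressStore, callBranch, runWorkflow and the branch IDs "01"/"02" are illustrative, not dtm's API), the first run crashes after branch 01; on the retry, branch 01's saved result is returned without calling the service again, and only branch 02 actually executes.

```go
package main

import "fmt"

// progressStore is a hypothetical stand-in for the branch progress that the dtm server
// keeps for one global transaction: branch ID -> saved result.
type progressStore map[string]string

// callBranch models the interceptor on a retry: a completed branch returns its saved
// result without calling the service; otherwise the call runs and its result is saved.
func callBranch(store progressStore, branchID string, call func() string) string {
	if res, ok := store[branchID]; ok {
		return res
	}
	res := call()
	store[branchID] = res
	return res
}

// runWorkflow issues two branches in a fixed order; crashAfterFirst simulates a crash
// between them. calls counts how often each service was really invoked.
func runWorkflow(store progressStore, calls map[string]int, crashAfterFirst bool) []string {
	var results []string
	results = append(results, callBranch(store, "01", func() string {
		calls["TransOut"]++
		return "TransOut ok"
	}))
	if crashAfterFirst {
		return results // the process dies here; dtm would later retry the workflow
	}
	results = append(results, callBranch(store, "02", func() string {
		calls["TransIn"]++
		return "TransIn ok"
	}))
	return results
}

func main() {
	store := progressStore{}
	calls := map[string]int{}
	runWorkflow(store, calls, true) // first attempt crashes after branch 01
	results := runWorkflow(store, calls, false)
	fmt.Println(results, calls) // [TransOut ok TransIn ok] map[TransIn:1 TransOut:1]
}
```

The replay only lines up if the retried function issues the same branches in the same order as the first run, which is one reason the workflow function itself must produce the same result on every call.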

The workflow function needs to be idempotent: the first call and any subsequent retry must produce the same result.

Saga under Workflow

The Saga mode is derived from the paper SAGAS. Its core idea is to split a long transaction into multiple short transactions, which are coordinated by the Saga transaction coordinator. If every short transaction commits successfully, the global transaction completes normally; if a step fails, the compensation operations are called one at a time in reverse order.

In Workflow mode, you call the forward operation directly in the workflow function and register the compensation operation with the branch's OnRollback. If the global transaction rolls back, the compensation operation is called automatically, which achieves the effect of Saga mode.

Tcc under Workflow

The Tcc mode is derived from the paper Life beyond Distributed Transactions: an Apostate's Opinion. It divides a large transaction into multiple small transactions, each of which has three operations:

  • Try phase: attempt the execution, complete all business checks (consistency), and reserve the necessary business resources (quasi-isolation)
  • Confirm phase: if the Try of every branch succeeds, the Confirm phase follows. Confirm actually executes the business without performing any business checks, using only the business resources reserved in the Try phase
  • Cancel phase: if the Try of any branch fails, the Cancel phase follows. Cancel releases the business resources reserved in the Try phase.

For our scenario of an inter-bank transfer from A to B, suppose SAGA is used: the forward operation adjusts the balance, and the compensation operation adjusts it back in the reverse direction. The following can then happen:

  • The deduction from A succeeds
  • A sees the balance decrease and tells B
  • The transfer of the amount to B fails, and the entire transaction is rolled back
  • B never receives the funds

This causes real trouble for both A and B. SAGA cannot avoid this situation, but TCC can solve it with the following design:

  • In addition to the balance field of the account, introduce a trading_balance field
  • In the Try phase, check whether the account is frozen and whether the balance is sufficient; if both checks pass, adjust trading_balance (that is, freeze the funds at the business level)
  • In the Confirm phase, adjust balance and adjust trading_balance (that is, unfreeze the funds at the business level)
  • In the Cancel phase, adjust only trading_balance (that is, unfreeze the funds at the business level)

With this design, once end user A sees the balance deducted, B is guaranteed to receive the funds, because balance changes only in the Confirm phase, which is entered only after every Try has succeeded.
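A minimal in-memory sketch of the trading_balance design follows. The account type and its methods are hypothetical; a real implementation would perform these steps as conditional SQL updates. For brevity, only the paying side A reserves funds in trading_balance, while the receiving side's Try merely checks that B is not frozen. The point to observe is that A's visible balance changes only in Confirm, which runs only after every Try has succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

var errRejected = errors.New("try rejected")

// account is a hypothetical in-memory version of the account row described above.
type account struct {
	balance        int64 // the balance the user sees
	tradingBalance int64 // funds frozen by Try operations still in flight
	frozen         bool  // whether the account itself is frozen
}

// tryOut checks the account and freezes amount in tradingBalance; balance is untouched.
func (a *account) tryOut(amount int64) error {
	if a.frozen || a.balance-a.tradingBalance < amount {
		return errRejected
	}
	a.tradingBalance += amount
	return nil
}

// confirmOut performs the deduction using only the funds reserved by tryOut, without checks.
func (a *account) confirmOut(amount int64) {
	a.balance -= amount
	a.tradingBalance -= amount
}

// cancelOut releases the funds reserved by tryOut.
func (a *account) cancelOut(amount int64) { a.tradingBalance -= amount }

// tryIn only checks that the receiving account is not frozen (a simplification).
func (a *account) tryIn() error {
	if a.frozen {
		return errRejected
	}
	return nil
}

func (a *account) confirmIn(amount int64) { a.balance += amount }

// transfer runs Try on both sides, then Confirm on both, or Cancel on A if a Try failed.
func transfer(from, to *account, amount int64) error {
	if err := from.tryOut(amount); err != nil {
		return err
	}
	if err := to.tryIn(); err != nil {
		from.cancelOut(amount)
		return err
	}
	from.confirmOut(amount)
	to.confirmIn(amount)
	return nil
}

func main() {
	a := &account{balance: 100}
	b := &account{frozen: true}
	err := transfer(a, b, 30)
	fmt.Println(err, a.balance, a.tradingBalance) // try rejected 100 0
	b.frozen = false
	err = transfer(a, b, 30)
	fmt.Println(err, a.balance, b.balance) // <nil> 70 30
}
```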

In Workflow mode, you call the Try operation directly in the workflow function, register the Confirm operation with the branch's OnCommit, and register the Cancel operation with the branch's OnRollback. This achieves the effect of Tcc mode.

XA under Workflow

XA is a distributed transaction specification proposed by the X/Open organization. It mainly defines the interface between the (global) transaction manager (TM) and the (local) resource manager (RM). Local databases such as MySQL play the role of the RM in XA.

XA is divided into two stages:

The first stage (prepare): all participants (RMs) prepare to execute the transaction and lock the required resources. When a participant is ready, it reports to the TM that it is ready.
The second stage (commit/rollback): when the transaction manager (TM) confirms that all participants (RMs) are ready, it sends a commit command to all of them; if any participant fails to prepare, the TM sends a rollback command instead.

Most mainstream databases support XA transactions, including MySQL, Oracle, SQL Server and PostgreSQL.

In Workflow mode, you can call NewBranch().DoXa in the workflow function to open your XA transaction branch.
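The two XA stages can be illustrated with a toy coordinator. This is not how DoXa is implemented: resourceManager and twoPhaseCommit are hypothetical, and real RMs would be database connections executing statements such as XA PREPARE, XA COMMIT and XA ROLLBACK. The sketch only shows the decision rule: commit everywhere if every participant prepared, otherwise roll back everywhere.

```go
package main

import "fmt"

// resourceManager is a hypothetical XA participant (RM), e.g. one database.
type resourceManager struct {
	name       string
	canPrepare bool   // whether this RM's local work can be prepared
	state      string // "", "prepared", "committed" or "rolled back"
}

// prepare is stage one: lock resources and report readiness to the TM.
func (r *resourceManager) prepare() bool {
	if !r.canPrepare {
		return false
	}
	r.state = "prepared"
	return true
}

func (r *resourceManager) commit()   { r.state = "committed" }
func (r *resourceManager) rollback() { r.state = "rolled back" }

// twoPhaseCommit plays the TM: stage one asks every RM to prepare; stage two commits
// all of them only if all prepared, and otherwise rolls all of them back.
func twoPhaseCommit(rms []*resourceManager) bool {
	allPrepared := true
	for _, rm := range rms {
		if !rm.prepare() {
			allPrepared = false
			break
		}
	}
	for _, rm := range rms {
		if allPrepared {
			rm.commit()
		} else {
			rm.rollback()
		}
	}
	return allPrepared
}

func main() {
	rms := []*resourceManager{{name: "orders", canPrepare: true}, {name: "stock", canPrepare: false}}
	committed := twoPhaseCommit(rms)
	fmt.Println(committed, rms[0].state, rms[1].state) // false rolled back rolled back
}
```

Because the "orders" RM holds its locks from prepare until the second stage, contention on the same rows serializes concurrent transactions, which is why the recommendations below steer hot rows away from XA.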

Mixed use of multiple modes

In Workflow mode, Saga, Tcc and XA are all modes of individual branch transactions, so some branches can use one mode while other branches use another. This hybrid approach lets you choose a sub-mode according to the characteristics of each branch transaction. We recommend the following:

  • XA: use XA if the business has no row-lock contention. This mode requires little additional development, since Commit/Rollback is done automatically by the database. For example, it suits creating orders: different orders lock different order rows, so concurrency is unaffected. It is not suitable for deducting inventory, because orders for the same product all contend for that product's row lock, which leads to low concurrency.
  • Saga: suitable for ordinary businesses that are not a good fit for XA. This mode requires less additional development than Tcc: you only need to develop the forward and compensation operations.
  • Tcc: suitable for high consistency requirements, such as the transfer introduced earlier. This mode requires the most additional development, since you must implement Try, Confirm and Cancel.

idempotency requirement

In Workflow mode, operations are retried after a crash, so every operation must be idempotent: a second call must have the same effect as the first and return the same result. In business code, idempotency is usually achieved with a database unique key, specifically insert ignore "unique-key". If the insert fails, the operation has already been completed, so this call is ignored and returns; if the insert succeeds, this is the first call, and the subsequent business operations proceed.

If your business is itself idempotent, you can operate on it directly; if it is not, dtm provides the BranchBarrier helper class, based on the unique-key principle above, which makes it easy to implement idempotent operations on MySQL/Mongo/Redis.
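The unique-key principle can be sketched in memory. barrierTable and idempotentCall are hypothetical names and this is not BranchBarrier's API: a map stands in for a table with a unique key on (gid, branch ID, operation), and insertIgnore plays the role of insert ignore. In a real database, the insert and the business update should run in the same local transaction, so that they succeed or fail together.

```go
package main

import "fmt"

// barrierTable is a hypothetical in-memory stand-in for a table with a unique key
// on (gid, branch ID, operation).
type barrierTable map[string]bool

// insertIgnore mimics "insert ignore": it reports whether a new row was inserted.
func (t barrierTable) insertIgnore(key string) bool {
	if t[key] {
		return false
	}
	t[key] = true
	return true
}

// idempotentCall runs business only the first time this (gid, branchID, op) is seen.
func idempotentCall(t barrierTable, gid, branchID, op string, business func()) {
	if !t.insertIgnore(gid + "/" + branchID + "/" + op) {
		return // row already existed: the operation was done before, ignore this call
	}
	business()
}

func main() {
	table := barrierTable{}
	balance := 100
	for i := 0; i < 3; i++ { // the same branch delivered three times, e.g. by retries
		idempotentCall(table, "gid-1", "01", "action", func() { balance -= 30 })
	}
	fmt.Println(balance) // 70
}
```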

Note the following two typical non-idempotent operations:

  • Timeout rollback: your business has an operation that may take a long time, and you want the global transaction to fail and roll back once a timeout expires. This is not idempotent: in extreme cases, two processes call the operation at the same time, one returns a timeout failure and the other returns success, so the results differ
  • Rollback when the retry limit is reached: the analysis is the same as above.

Workflow mode currently supports neither timeout rollback nor rollback after the retry limit is reached. If you need either of them, please describe your specific scenario to us, and we will seriously consider adding such support.

branch operation result

A branch operation can return one of the following results:

  • Success: the branch operation returns HTTP-200/gRPC-nil
  • Business failure: the branch operation returns HTTP-409/gRPC-Aborted; it is not retried, and the global transaction must be rolled back
  • In progress: the branch operation returns HTTP-425/gRPC-FailedPrecondition. This result means the transaction is progressing normally; dtm should retry it, but at a fixed interval rather than with the exponential backoff algorithm
  • Unknown error: the branch operation returns any other result, which indicates an unknown error; dtm retries the workflow using the exponential backoff algorithm
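The mapping above, together with the two retry policies, can be sketched for the HTTP case as follows. classifyHTTP, nextRetryInterval and the base interval are hypothetical, and dtm's real retry parameters are not specified here; the gRPC side would map nil, codes.Aborted and codes.FailedPrecondition in the same way.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// branchResult enumerates the four outcomes listed above.
type branchResult int

const (
	resultSuccess branchResult = iota
	resultFailure // business failure: roll back, never retry
	resultOngoing // in progress: retry at a fixed interval
	resultUnknown // anything else: retry with exponential backoff
)

// classifyHTTP maps a branch's HTTP status code to a branchResult.
func classifyHTTP(status int) branchResult {
	switch status {
	case http.StatusOK: // 200
		return resultSuccess
	case http.StatusConflict: // 409
		return resultFailure
	case http.StatusTooEarly: // 425
		return resultOngoing
	default:
		return resultUnknown
	}
}

// nextRetryInterval returns how long to wait before retry number attempt (starting at 0).
// base is a hypothetical configuration value; dtm's actual defaults may differ.
func nextRetryInterval(r branchResult, attempt int, base time.Duration) time.Duration {
	switch r {
	case resultOngoing:
		return base // fixed interval
	case resultUnknown:
		return base << uint(attempt) // base, 2*base, 4*base, ...
	default:
		return 0 // success and business failure are final: no retry
	}
}

func main() {
	for attempt := 0; attempt < 4; attempt++ {
		fixed := nextRetryInterval(resultOngoing, attempt, 10*time.Second)
		backoff := nextRetryInterval(resultUnknown, attempt, 10*time.Second)
		fmt.Println(attempt, fixed, backoff)
	}
}
```

A fixed interval suits "in progress" because the branch is known to be healthy and will finish soon, while backoff keeps repeated unknown errors from hammering a service that may be down.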

If your existing services return results that differ from the above, you can customize the mapping through workflow.Options.HTTPResp2DtmError/GRPCError2DtmError.

By the conventions of Saga and Tcc, Saga's compensation operations and Tcc's Confirm/Cancel operations must not return business failures. In the second phase of the workflow (Commit/Rollback), an operation that neither succeeds nor is allowed to be retried leaves the global transaction unable to complete, and developers should rule this out when designing their operations.

Transaction Completion Notification

If your business needs to be notified when a transaction completes, set an OnFinish callback on the first transaction branch. When the callback is called, all business operations have been performed, so the global transaction is essentially complete. The callback can use the isCommit argument it receives to determine whether the global transaction was ultimately committed or rolled back.

Note that when the OnFinish callback is called, the state of the transaction on the dtm server has not yet been changed to its final state. Therefore, if you mix completion notifications with queries of the global transaction result, the results may be inconsistent; we recommend using only one of the two methods.

performance

In DTM, a Workflow transaction that completes normally requires two writes of the global transaction (one to save it at Prepare, the other to change its status to success), plus saves of the intermediate transaction progress (with batching, this overhead is very small). Compared with DTM's Saga mode, one fewer branch transaction needs to be saved, and the volume of branch writes is smaller (a successful transaction does not need to save extra branches for compensation), so performance should be better than Saga's. A detailed benchmark report will be published in the future.

Next steps

  • Gradually improve workflow examples and documentation
  • Support branch transaction concurrency

contact us

You are welcome to visit our project and support us with a star:

https://github.com/dtm-labs/dtm

Follow the [Distributed Transaction] public account to learn more about distributed transactions.


叶东富