Introduction to the Outbox Pattern
A microservice often needs to perform two steps together: update the database and publish an event. For example, after an article is published, the author's published-article statistics must be updated. The business requires the two operations to either both succeed or both fail; one succeeding while the other fails is not acceptable. If the article ends up published but the statistics update fails, the data becomes inconsistent.
The outbox pattern is the most common way to solve this problem. It works as follows:
1. Run the local business logic as a database transaction and write the event to a message table before committing; when the transaction commits, the business change and the event are committed together.
2. Deliver events from the message table to the message queue, either by polling the table or by listening to the binlog:
   - Polling: every 1s or 0.2s, read the events from the message table, send them to the message queue, and then delete them.
   - Binlog listening: use a database tool such as Debezium to subscribe to the database binlog, extract the events, and send them to the message queue.
3. Write consumers that handle the events.
Since step 1 commits the business change and the event in the same transaction, the two are guaranteed to be committed together.
In steps 2 and 3, every operation is retried until it succeeds: even if a crash happens in the middle, the work is retried and eventually completes.
For the scenario above of updating statistics after publishing an article, this solution ensures that the statistics are eventually updated, so the data reaches eventual consistency.
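To make step 1 concrete, here is a minimal sketch of the article-publishing example using Go's database/sql: the business update and the outbox row are written in one local transaction. The table and column names (articles, outbox) are illustrative only, not part of any specific framework.

```go
package main

import (
	"database/sql"
	"fmt"
)

// publishArticle updates the article and records the event in the outbox table
// inside one transaction, so both become visible together or not at all.
func publishArticle(db *sql.DB, articleID, authorID int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.Exec(`UPDATE articles SET status = 'published' WHERE id = ?`, articleID); err != nil {
		return err
	}
	payload := fmt.Sprintf(`{"author_id":%d,"article_id":%d}`, authorID, articleID)
	if _, err := tx.Exec(`INSERT INTO outbox (topic, payload) VALUES (?, ?)`, "article_published", payload); err != nil {
		return err
	}
	return tx.Commit() // the statistics consumer picks the event up later
}
```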
The problem with multiple databases
In today's popular microservice architectures, each microservice usually has its own database. When multiple services need the outbox pattern, the traditional outbox architecture becomes hard to maintain:
- Polling the message tables: the polling job has to poll every database separately
- Listening to the binlog: the binlog of every database has to be subscribed to
Both ways of obtaining events are hard to maintain once many databases are involved. The architecture is also inelastic: if there are many databases but only a few events over time, it carries a high fixed load and wastes resources. Ideally, the load of the architecture should be related only to the number of events sent and to nothing else.
Solution
The two-phase message in the open-source distributed transaction framework https://github.com/dtm-labs/dtm handles this problem very well. The following example uses an inter-bank transfer business:
```go
msg := dtmcli.NewMsg(DtmServer, gid).
	Add(busi.Busi+"/TransIn", &TransReq{Amount: 30})
err := msg.DoAndSubmitDB(busi.Busi+"/QueryPreparedB", db, func(tx *sql.Tx) error {
	return busi.SagaAdjustBalance(tx, busi.TransOutUID, -req.Amount, "SUCCESS")
})
```
In this code:
- First, a DTM msg global transaction is created, passing in the dtm server address and the global transaction id.
- A branch is added to the msg. The branch here is the transfer-in operation TransIn, together with the data it needs: an amount of 30 yuan.
- Then DoAndSubmitDB of the msg is called. This function guarantees that the business execution and the submission of the msg global transaction either both succeed or both fail.
  - The first parameter is the back-check URL, explained in detail later.
  - The second parameter is the sql.DB that the business accesses.
  - The third parameter is the business function; in this example the business deducts a balance of 30 yuan from A.
Success flow
How does DoAndSubmitDB ensure that the business executes successfully and the msg is submitted atomically? Please see the sequence diagram below:
Under normal circumstances, the five steps in the sequence diagram complete normally, the whole business proceeds as expected, and the global transaction completes. One new concept here needs explanation: the msg is submitted in two phases. The first phase calls Prepare and the second calls Submit. After DTM receives the Prepare call, it does not invoke the branch transaction; it waits for the subsequent Submit. Only after the Submit is received does DTM start calling the branches, and the global transaction eventually completes.
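The sketch below illustrates that ordering. It is not dtm's actual code: prepare, submit, and insertBarrier are hypothetical stand-ins for the client's internals, shown only to make the Prepare / local transaction / Submit sequence explicit.

```go
package main

import "database/sql"

// Hypothetical stand-ins for the dtm client's internals; not real dtm APIs.
func prepare() error                 { return nil } // register the msg with the dtm server
func submit() error                  { return nil } // ask dtm to deliver the branch calls
func insertBarrier(tx *sql.Tx) error { return nil } // insert the barrier/message row

// doAndSubmitSketch shows the ordering DoAndSubmitDB follows: Prepare, then the
// local transaction (barrier row + business logic), then Submit.
func doAndSubmitSketch(db *sql.DB, busiCall func(tx *sql.Tx) error) error {
	if err := prepare(); err != nil {
		return err
	}
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if err := insertBarrier(tx); err != nil {
		tx.Rollback()
		return err
	}
	if err := busiCall(tx); err != nil {
		tx.Rollback()
		return err // dtm will time out, back-check, and abort the msg
	}
	if err := tx.Commit(); err != nil {
		return err
	}
	return submit() // only now does dtm call the branch, e.g. TransIn
}
```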
Abnormal situations
In a distributed system, all kinds of crashes and network anomalies need to be considered. Let's look at the problems that may occur:
First of all, the most important goal is that the business execution and the msg transaction form an atomic operation. So what happens if, in the sequence diagram above, the process crashes after the Prepare message has been sent successfully but before the Submit message is sent? In that case dtm detects that the transaction has timed out and performs a back-check. For developers, supporting the back-check is as simple as pasting in the following code:
```go
app.GET(BusiAPI+"/QueryPreparedB", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
	return MustBarrierFromGin(c).QueryPrepared(dbGet())
}))
```
If you are not using the Go framework gin, you will need to adapt this snippet slightly to your own framework, but the code is generic and the same handler can be shared by all of your business services.
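For instance, a rough net/http version might look like the sketch below. It relies on dtmcli.BarrierFromQuery and QueryPrepared from the dtm Go client (plus the net/http and errors imports), and the status-code mapping shown (200 success, 409 failure, anything else retried) is an assumption about dtm's HTTP convention here, so verify it against the dtm documentation.

```go
// Sketch of the same back-check endpoint without gin; the status-code mapping
// is an assumption about dtm's HTTP convention, check it in the dtm docs.
http.HandleFunc(BusiAPI+"/QueryPreparedB", func(w http.ResponseWriter, r *http.Request) {
	bb, err := dtmcli.BarrierFromQuery(r.URL.Query())
	if err == nil {
		err = bb.QueryPrepared(dbGet())
	}
	switch {
	case err == nil:
		w.WriteHeader(http.StatusOK) // local transaction committed: dtm submits the msg
	case errors.Is(err, dtmcli.ErrFailure):
		w.WriteHeader(http.StatusConflict) // it will never commit: dtm aborts the msg
	default:
		w.WriteHeader(http.StatusInternalServerError) // unknown: dtm retries the back-check
	}
})
```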
The back-check works mainly through the message table, but dtm's back-check has been carefully designed and can handle all of the following situations:
- At back-check time, the local transaction has not started yet
- At back-check time, the local transaction is still in progress
- At back-check time, the local transaction has already been rolled back
- At back-check time, the local transaction has already been committed
The detailed back-check principle is somewhat involved and a patent has been applied for it, so it is not described in detail here; see https://dtm.pub/practice/msg.html for the full explanation.
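For intuition only, the core idea behind a message-table back-check can be sketched as below. It assumes a barrier table with a unique key on the transaction's gid, into which both the business transaction (before committing) and the back-check insert a row; whichever arrives second learns the outcome. This is a simplified illustration, not dtm's implementation, and it deliberately omits the trickier cases (such as a local transaction that is still in progress) that the full design covers.

```go
// Simplified back-check illustration; assumes a MySQL-style table
// barrier(gid VARCHAR PRIMARY KEY, reason VARCHAR). Not dtm's actual code.
func queryPreparedSketch(db *sql.DB, gid string) (committed bool, err error) {
	// The business transaction inserts (gid, 'committed') before it commits.
	// The back-check tries to occupy the same key with reason 'rollback'.
	res, err := db.Exec(`INSERT IGNORE INTO barrier(gid, reason) VALUES(?, 'rollback')`, gid)
	if err != nil {
		return false, err // unknown state: dtm retries the back-check later
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	if n > 0 {
		// The back-check took the key first: the business transaction has not
		// committed, and its own insert will now conflict, so the msg is aborted.
		return false, nil
	}
	// The key was already taken: the business transaction committed, so submit the msg.
	return true, nil
}
```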
Multiple database support
With this solution, if you need to handle multiple databases, then at the operations level you only need to create the message table in each corresponding database; at the code level you only need to pass the corresponding database connection at the back-check, for example as shown below.
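With two databases, for instance, the only code-level difference is which connection each back-check endpoint passes to QueryPrepared; dbGetA and dbGetB below are hypothetical connection getters, following the same pattern as the gin handler shown earlier.

```go
// One back-check endpoint per database; dbGetA/dbGetB are hypothetical getters
// returning the *sql.DB of the corresponding database.
app.GET(BusiAPI+"/QueryPreparedA", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
	return MustBarrierFromGin(c).QueryPrepared(dbGetA())
}))
app.GET(BusiAPI+"/QueryPreparedB", dtmutil.WrapHandler2(func(c *gin.Context) interface{} {
	return MustBarrierFromGin(c).QueryPrepared(dbGetB())
}))
```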
Compared with the original polling-table and binlog-listening schemes, the operations cost drops sharply. The load of this architecture is related only to the number of events and is independent of other factors such as the number of databases, so it has better elasticity.
More storage engine support
dtm's two-phase message not only provides SQL database support through DoAndSubmitDB, but also provides NoSQL support.
Mongo support
The following code ensures that the business change and the message are committed together under Mongo:
```go
err := msg.DoAndSubmit(busi.Busi+"/RedisQueryPrepared", func(bb *dtmcli.BranchBarrier) error {
	return bb.MongoCall(MongoGet(), func(sc mongo.SessionContext) error {
		return SagaMongoAdjustBalance(sc, sc.Client(), TransOutUID, -reqFrom(c).Amount, reqFrom(c).TransOutResult)
	})
})
```
Redis support
The following code ensures that the business change and the message are committed together under Redis:
```go
err := msg.DoAndSubmit(busi.Busi+"/RedisQueryPrepared", func(bb *dtmcli.BranchBarrier) error {
	return bb.RedisCheckAdjustAmount(busi.RedisGet(), busi.GetRedisAccountKey(busi.TransOutUID), -30, 86400)
})
```
dtm's back-check scheme can easily be extended to many other storage engines that support transactions.
Features of this solution
Two-phase messages have the following characteristics:
- Elegant support for multiple databases
- Support not only for SQL databases but also for NoSQL stores such as Mongo and Redis
- Concise code, far less than the usual outbox pattern requires
- The whole architecture and development process involves no message queue, only APIs, which makes it easier to get started
- The load is related only to the number of messages, not to the number of databases involved
Comparison with RocketMQ transaction messages
This style of back-check was first proposed in RocketMQ's transaction messages, but when the author searched the web for back-check examples and case studies, none of the solutions found could correctly handle the case where the local transaction is still in progress, and in extreme situations they lead to data inconsistency. For details, see https://dtm.pub/practice/msg.html .
In addition, dtm's two-phase message does not require introducing a message queue, and it can also be combined with other message queues, so it is applicable in a wider range of scenarios.
Summary
The dtm two-phase message introduced in this article handles the multiple-database case well. This architecture has many advantages and can fully replace the outbox pattern, giving developers a simpler and easier-to-use architecture.
You are welcome to visit https://github.com/dtm-labs/dtm and give the project a star to support us.