
Overview

Many real-world projects introduce a Redis cache to relieve query pressure on the database. Once the same piece of data lives in both Redis and the database, data consistency becomes a problem. So far the industry has no mature solution that guarantees eventual consistency; in particular, when scenarios such as the following occur, the cache and the database diverge outright, which can cause serious problems for the application.

(Figure: sequence diagram in which service 1 is suspended mid-update and its stale write overwrites newer data in the cache)

dtm-labs is committed to solving data-consistency problems. After analyzing the practices already in use, we propose a new solution, dtm-labs/dtm + dtm-labs/rockscache, which solves the problems above completely. As a mature solution it also prevents cache penetration, cache breakdown, and cache avalanche, and it can even serve scenarios that require strong data consistency.

This article will not rehash the existing approaches to managing caches; readers unfamiliar with them can consult the many introductory articles on the topic.

Out-of-Order Inconsistency

In the sequence diagram above, service 1 is suspended partway through (for example, by a GC pause). By the time it writes v1 to the cache, it overwrites the v2 that was written in the meantime, leaving the data inconsistent for good (v2 in the DB, v1 in the cache).

How can such problems be solved? So far no existing solution eliminates them; the common mitigation is a slightly shorter expiration time as a fallback. The cache delayed-deletion scheme we implemented solves the problem completely and keeps the cache and the database consistent. The solution works as follows:

The data in the cache is a hash with the following fields:

  • value: the data itself
  • lockUntil: the lock's expiration time; when a process queries the cache and finds no data, it locks the cache entry for a short period, queries the DB, and then updates the cache
  • owner: the UUID of the process holding the lock

When querying the cache:

  1. If the data is empty and locked, sleep for 1s and then query again
  2. If the data is empty and not locked, execute "fetch data" synchronously and return the result
  3. If the data is not empty, return the result immediately and execute "fetch data" asynchronously

The operation of "fetching data" is defined as:

  1. Determine whether the cache needs to be updated; it does if either of the following two conditions holds

    • Data is empty and not locked
    • The lock on the data has expired
  2. If an update is needed, lock the cache entry, query the DB, verify that the lock holder is unchanged, write the result to the cache, and release the lock

When the DB data is updated, dtm is used to guarantee that the cache is then marked for delayed deletion (explained in detail in the next section).

  • Delayed deletion sets the data's expiration time to 10s and marks the lock as expired, so the next cache query triggers "fetch data" (a Go sketch of this state machine follows).
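To make this concrete, here is a minimal Go sketch of the state machine above, using plain in-memory types; all names are illustrative, and the real rockscache implementation performs these steps atomically inside Redis.

    package cache

    import (
        "time"

        "github.com/google/uuid"
    )

    // cacheEntry mirrors the hash stored in Redis for each key.
    type cacheEntry struct {
        Value     string    // the data itself
        LockUntil time.Time // lock expiration time; zero means unlocked
        Owner     string    // uuid of the process holding the lock
    }

    // locked reports whether some process currently holds the lock.
    func (e *cacheEntry) locked() bool {
        return time.Now().Before(e.LockUntil)
    }

    // needsFetch reports whether "fetch data" must hit the DB: the entry
    // is empty and unlocked, or its lock has expired.
    func (e *cacheEntry) needsFetch() bool {
        return (e.Value == "" && !e.locked()) ||
            (!e.LockUntil.IsZero() && time.Now().After(e.LockUntil))
    }

    // fetch locks the entry, queries the DB, and writes the result back
    // only if this process still owns the lock.
    func fetch(e *cacheEntry, queryDB func() (string, error)) error {
        owner := uuid.NewString()
        e.Owner = owner
        e.LockUntil = time.Now().Add(3 * time.Second) // short lock while querying
        v, err := queryDB()
        if err != nil {
            return err
        }
        if e.Owner == owner { // still the lock holder (trivially true here; Redis checks this atomically)
            e.Value = v
            e.LockUntil = time.Time{} // unlock
        }
        return nil
    }

    // delayDelete keeps the old value for a short grace period and expires
    // the lock, so the next read triggers a refetch; in Redis the whole
    // key would also receive EXPIRE 10 here.
    func delayDelete(e *cacheEntry) {
        e.LockUntil = time.Now().Add(-time.Second)
    }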

Under the above strategy: if the last version written to the database is Vi, the last version written to the cache is V, and the UUID that wrote V is uuidv, then the following sequence of events must have occurred:

DB write of Vi -> cache marked for delayed deletion -> a query locks the cache entry and writes uuidv as owner -> that query reads V from the database -> the owner in the cache is still uuidv, so V is written to the cache

In this sequence, the read of V happens after the write of Vi, so V equals Vi, which guarantees eventual consistency of the cached data.

dtm-labs/rockscache has implemented the above method to ensure the eventual consistency of cached data.

  • The Fetch function implements the cache-query procedure described above
  • The DelayDelete function implements the delayed-deletion logic

Interested readers can refer to dtm-cases/cache, which contains detailed examples.
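A minimal usage sketch, following the rockscache README at the time of writing; the key name, expiry, and loader callback are illustrative, and the delayed-deletion entry point is exported as TagAsDeleted in current releases.

    package main

    import (
        "fmt"
        "time"

        "github.com/dtm-labs/rockscache"
        "github.com/go-redis/redis/v8"
    )

    func main() {
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
        rc := rockscache.NewClient(rdb, rockscache.NewDefaultOptions())

        // Fetch returns the cached value; on a miss it runs the callback
        // to load the value from the DB and caches it for 300s
        v, err := rc.Fetch("key1", 300*time.Second, func() (string, error) {
            return "value loaded from the DB", nil // stand-in for a real query
        })
        fmt.Println(v, err)

        // after a DB update, mark key1 for delayed deletion
        _ = rc.TagAsDeleted("key1")
    }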

Atomicity of DB and Cache Operations

For cache management, the industry generally deletes or updates the cached data after writing to the database. Because the cache write and the database write are not atomic, there is inevitably a window during which the two disagree. Usually this window is small and the impact limited. But a crash or a network error can occur between the two operations, so one may complete while the other does not, leaving the data inconsistent for a long time.

To illustrate: a user modifies data A to B; the application updates the database and then deletes/updates the cache. If nothing goes wrong, the database and the cache remain consistent. But in a distributed system, process crashes and machine failures do happen, and if the process crashes after the database update but before the cache is deleted/updated, the database and the cache can remain inconsistent for a long time.

Completely eliminating this long-lived inconsistency is not easy. Below we present the solutions that fit various application situations.

Option 1: Shorter cache time

This is the simplest option and suits applications with low concurrency. If the application's concurrency is low, the whole cache system just needs a short cache time, such as one minute. The database then has to regenerate every cached item that is accessed roughly once a minute, which is feasible when traffic is modest.

The strategy is simple to understand and implement. The semantics the cache system provides is: in most cases the inconsistency window between the cache and the database is very short, and even in the unlikely event of a process crash the window lasts at most one minute.

Under this constraint, reads with low consistency requirements take the cache, while reads with high consistency requirements bypass the cache and query the database directly.
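In code, this scheme amounts to nothing more than a short TTL on every cache write; a minimal sketch with go-redis, where the one-minute TTL is illustrative:

    package main

    import (
        "context"
        "time"

        "github.com/go-redis/redis/v8"
    )

    // cacheSet writes a value with a one-minute TTL: after expiry the next
    // read misses and rebuilds the entry from the database, so any
    // inconsistency window is bounded by roughly one minute.
    func cacheSet(ctx context.Context, rdb *redis.Client, key, value string) error {
        return rdb.Set(ctx, key, value, time.Minute).Err()
    }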

Option 2: Ensure consistency with a message queue

If the application's concurrency is high, the cache expiration time must exceed one minute, and many requests cannot tolerate long-lasting inconsistency, the cache can instead be updated through a message queue. The specific approach (sketched in code after the list) is:

  • When updating the database, write the cache-update message to a local message table in the same transaction, so the message commits together with the database update.
  • Run a polling task that continuously scans this table and sends the messages to the message queue.
  • Consume messages in the message queue, update/delete the cache
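A sketch of the first step, writing the message in the same local transaction as the business update; the table and column names are illustrative, and the poller and queue are omitted:

    package main

    import "database/sql"

    // updateWithOutbox updates the business row and, in the same local
    // transaction, inserts a cache-invalidation message; a separate poller
    // scans pending messages and forwards them to the message queue.
    func updateWithOutbox(db *sql.DB, key, value string) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op once Commit succeeds

        if _, err := tx.Exec("UPDATE t SET v = ? WHERE k = ?", value, key); err != nil {
            return err
        }
        // commits (or rolls back) together with the business update
        if _, err := tx.Exec("INSERT INTO cache_msg(cache_key, status) VALUES(?, 'pending')", key); err != nil {
            return err
        }
        return tx.Commit()
    }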

This approach guarantees that the cache is updated after the database is. The architecture, however, is heavyweight: maintaining a message queue and developing and maintaining an efficient polling task are both costly.

Option 3: Subscribe to binlog

This option suits scenarios much like option 2, and its principle resembles database master-slave synchronization: where replication applies the master's updates to the slaves by consuming the binlog, this scheme subscribes to the binlog and applies the database's updates to the cache. The specific approach is:

  • Deploy and configure Alibaba's open-source canal to subscribe to the database's binlog
  • Use canal (or a similar tool) to watch for data updates, and update/delete the cache accordingly (a conceptual sketch follows)
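Conceptually the consumer side reduces to the loop below; the event type and channel are stand-ins for whatever canal client library is actually used:

    package main

    // BinlogEvent is a stand-in for the row-change event a binlog client
    // such as canal delivers; real clients carry richer payloads.
    type BinlogEvent struct {
        Table string
        Key   string
    }

    // consume watches row changes and invalidates the matching cache keys.
    func consume(events <-chan BinlogEvent, delCache func(key string) error) {
        for ev := range events {
            if ev.Table == "t" { // only tables that back cached data
                _ = delCache("t:" + ev.Key) // delete, or mark for delayed deletion
            }
        }
    }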

This solution also guarantees that the cache is updated after the database, but like the message-queue option it is architecturally heavy: canal is costly to learn and maintain, and when only a small amount of data actually needs cache updates, subscribing to the full binlog wastes considerable resources.

Option 4: dtm's two-phase message

The two-phase message pattern in dtm is a very good fit for updating/deleting the cache after modifying the database. The main code is as follows:

    msg := dtmcli.NewMsg(DtmServer, gid).
        Add(busi.Busi+"/UpdateRedis", &Req{Key: key1})
    err := msg.DoAndSubmitDB(busi.Busi+"/QueryPrepared", db, func(tx *sql.Tx) error {
        // update the DB data for key1 inside the local transaction
        // (the table and column names are illustrative)
        _, err := tx.Exec("UPDATE t SET v = ? WHERE k = ?", value, key1)
        return err
    })

In this code, DoAndSubmitDB performs the local database transaction that modifies the data. After the transaction commits, it submits a two-phase message transaction, which asynchronously calls UpdateRedis. If the process crashes right after the local transaction commits, dtm calls QueryPrepared to check back, guaranteeing that whenever the local transaction committed, UpdateRedis is eventually executed successfully at least once.

The back-check logic is very simple; you only need to copy code like the following:

    app.GET(BusiAPI+"/QueryPrepared", dtmutil.WrapHandler(func(c *gin.Context) interface{} {
        return MustBarrierFromGin(c).QueryPrepared(dbGet())
    }))
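For completeness, a sketch of what the UpdateRedis endpoint could look like, assuming the Req type from the snippet above and a rockscache client rc performing the delayed deletion; all names are illustrative:

    app.POST(busi.Busi+"/UpdateRedis", dtmutil.WrapHandler(func(c *gin.Context) interface{} {
        var req Req
        if err := c.BindJSON(&req); err != nil {
            return err
        }
        // delayed deletion: keep the old value briefly and force a refetch
        return rc.TagAsDeleted(req.Key)
    }))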

Advantages of this scheme:

  • It is simple and easy to use, and the code is short and readable
  • dtm itself is a stateless, ordinary application, and the storage engines it depends on, Redis/MySQL, are common infrastructure; no message queue or canal needs to be maintained in addition
  • The related operations are modular and easy to maintain, with no need to write consumer logic elsewhere as with a message queue or canal

Slave-Library Delay

The solutions above assume that after the cache is deleted, a service querying the data can always find the latest value. A real production environment, however, may separate the master and slave databases, and the replication delay is not a controllable variable. What then?

There are two options. The first is to classify cached data by its consistency requirement: reads with high requirements go to the master database, reads with low requirements go to a slave. For applications using rockscache, highly concurrent requests are intercepted at the Redis layer, and at most one request per key reaches the database, so the database load drops sharply and reading from the master becomes a practical solution.

The second option requires the master-slave replication topology to be an unforked single chain, so the slave at the end of the chain has the longest delay. Apply the binlog-subscription scheme to that final slave's binlog, and when a data-change notification arrives, mark the cache for delayed deletion as described above.

Each option has its pros and cons; businesses can choose according to their own characteristics.

Anti-cache breakdown

rockscache also prevents cache breakdown. When data changes, existing practice either updates the cache or deletes it, each with pros and cons; delayed deletion combines the advantages of both methods while avoiding their disadvantages:

Update the cache

If the update-cache strategy is adopted, a cache entry is generated for every DB update, with no distinction between hot and cold data, which leads to the following problems:

  • Memory: data that is never read still occupies the cache, wasting precious memory;
  • Computation: data that is never read may be computed repeatedly because it is updated repeatedly, wasting precious CPU;
  • The out-of-order inconsistency described earlier becomes more likely, since a delay between two adjacent updates can trigger it.

Delete the cache

Because updating the cache has so many problems, most practice adopts the delete-cache strategy and generates the cache on demand at query time. This solves the problems of updating the cache, but introduces a new one:

  • Under high concurrency, if a hot item is deleted, a large number of requests miss the cache at once, causing cache breakdown.

To prevent breakdown, the common practice is a distributed Redis lock that lets only one request through to the database; once the cache is rebuilt, the remaining requests share it. This suits many scenarios, but not all:

  • Suppose an important hot item is expensive to compute, taking 3s to produce a result. After such an item is deleted, a large number of requests wait 3s for the result: many of them may time out, and connections held open for 3s pile up, which can destabilize the system.
  • Moreover, with a Redis lock, the requests that fail to acquire it usually poll at a fixed interval, and that sleep interval is hard to tune: set it to a generous 1s and a value that takes only 10ms to compute returns far too slowly; set it too short and the polling burns a lot of CPU and Redis capacity.

How Delayed Deletion Copes

The delayed-deletion method implemented by dtm-labs/rockscache, described above, is still a form of deletion, but it completely solves the breakdown problem of plain cache deletion and the secondary problems breakdown brings:

  1. Cache breakdown: with delayed deletion, if the data is absent from the cache, the cache entry is locked, so only one request reaches the back-end database.
  2. The 3s-wait and interval-polling problems above do not arise, because when a hot item is delay-deleted, the old version is still in the cache and is returned immediately, with no waiting.

Let's take a look at how the delayed deletion method performs under different data access frequencies:

  1. Hot data at 1K qps, cache computation takes 5ms: delayed deletion returns stale data for roughly 5~8ms. Updating the DB first and then the cache also returns stale data for roughly 0~3ms while the cache is being rewritten, so the two differ little.
  2. Hot data at 1K qps, cache computation takes 3s: delayed deletion returns stale data for about 3s. Returning old data is usually better behavior than making callers wait 3s for a result.
  3. Ordinary data at 50 qps, cache computation takes 1s: the analysis resembles case 2; no problem.
  4. Low-frequency data accessed once every 5s, cache computation takes 3s: behavior is essentially the same as the plain delete-cache strategy; no problem.
  5. Cold data accessed once every 10 minutes: essentially the same as delete-cache, except the data lives about 10s longer, a negligible amount of extra space; no problem.

There is one extreme case: the cache holds no data at all and a huge number of requests suddenly arrive. None of update-cache, delete-cache, or delayed deletion handles this well. It is a scenario developers must avoid through cache warm-up rather than throwing it directly at the cache system. That said, since delayed deletion minimizes the number of requests that hit the database, its performance is no weaker than any other scheme's.

Anti-cache penetration and cache avalanche

dtm-labs/rockscache also implements protection against cache penetration and cache avalanche.

Cache penetration means large volumes of requests for data that exists neither in the cache nor in the database; because the data does not exist, nothing is cached and every request falls through to the database. In rockscache, the EmptyExpire option sets the cache time for empty results; setting it to 0 disables caching of empty results and turns penetration protection off.

Cache avalanche means a large amount of cached data expires at the same moment or within a short window; the flood of misses then hits the database, whose load spikes and which may crash under the pressure. rockscache's RandomExpireAdjustment option adds a random component to expiration times so entries do not all expire at once.
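Both protections are plain fields on the rockscache options; a configuration sketch with illustrative values:

    opts := rockscache.NewDefaultOptions()
    opts.EmptyExpire = 60 * time.Second // cache empty results for 60s; 0 turns penetration protection off
    opts.RandomExpireAdjustment = 0.1   // randomize expiry by up to 10% so entries do not expire together
    rc := rockscache.NewClient(rdb, opts)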

Can the application achieve strong consistency?

The sections above covered the various cache-consistency scenarios and their solutions. Can an application use the cache and still provide strongly consistent reads and writes? Strong-consistency requirements arise less often than the eventual-consistency scenarios above, but in the financial field there are still plenty of them.

When we discuss strong consistency here, we need to clarify the meaning of consistency first.

To a developer, the most intuitive notion of strong consistency is probably that the database and the cache are identical at all times: during and after a write, whether you read the database directly or the cache directly, you always get the latest written value. For two independent systems, this kind of "strong consistency" is theoretically impossible: the database and the cache are updated on different machines, the updates cannot happen at the same instant, and in the interval between them the two necessarily disagree.

Strong consistency at the application layer, however, is achievable. Consider familiar examples: the CPU cache in front of memory, and memory in front of disk. Both are caching arrangements, yet neither has ever posed a consistency problem. Why? Simply because every consumer of the data is required to read only from the cache, never from the cache and the underlying storage mixed.

For DB plus Redis, if all reads are served only by the cache, strong consistency is easy to achieve and no inconsistency is ever visible. Let's work out the design from the characteristics of DB and Redis:

Update the cache first, or the DB first?

In the analogies above, CPU cache vs memory and memory vs disk, the cache is modified first and the underlying storage afterwards. In the DB-cache scenario, should we likewise modify the cache first and then the DB?

In most applications, developers treat Redis as a cache: when Redis fails, the application must degrade gracefully and still serve from the database. In that scenario, writing the cache first and the DB second fails on degradation: data can be lost, and a reader may see the new version v2 (from the cache) and later the old version v1 (from the DB). So where Redis serves as a cache, most systems write the DB first and the cache second.

DB write succeeds, cache write fails

What if the DB write succeeds but, because of a process crash, the first attempt to mark the delayed deletion fails? The retry will succeed a few seconds later, but during those seconds users reading the cache still see the old version. For example, a user tops up an account: the funds have reached the DB, but the cache update failed, so the balance read from the cache is still the old value. The handling is simple: when the DB write succeeds, do not report success to the user yet; wait until the cache update also succeeds. And when the user queries the top-up transaction, check whether both the DB and the cache succeeded (you can query whether the two-phase message's global transaction succeeded), and report success only when both have.

Under this strategy, after the user initiates the top-up and before the cache update completes, the user sees the transaction as still processing, outcome unknown, which satisfies strong consistency; once the user sees the transaction as processed successfully, the cache has been updated, so everything read from the cache is the new data, which also satisfies strong consistency.

dtm-labs/rockscache also implements strongly consistent reads. When the StrongConsistency option is enabled, the Fetch function provides strongly consistent cache reads. The principle differs little from delayed deletion, with one small change: instead of returning the old version, Fetch synchronously waits for the latest result of "fetch data".

This change costs performance, of course. Compared with an eventually consistent read, a strongly consistent read must wait for the current "fetch data" to finish, which increases latency, and the other processes waiting for the result sleep-wait, which consumes resources.
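Enabling it is a one-line option change on the client from the earlier sketches:

    opts := rockscache.NewDefaultOptions()
    opts.StrongConsistency = true // Fetch now waits for fresh data instead of returning the old version
    rc := rockscache.NewClient(rdb, opts)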

Strong Consistency Across Cache Downgrade and Upgrade

The strong-consistency scheme above rests on the premise that "all data reads come only from the cache". If Redis fails and the application must downgrade, the downgrade may last just a few seconds; but if a few seconds of unavailability is unacceptable and access must continue throughout, reads from the cache and reads from the DB will coexist, breaking the premise. Because Redis failures are infrequent, and applications that need strong consistency usually run dedicated Redis, the probability of hitting a failure downgrade is very low, and most applications impose no hard requirements here.

However, dtm-labs, as a leader in the field of data consistency, has also studied this problem in depth and provided a solution under such harsh conditions.

Upgrading and downgrading process

Now consider the downgrade and upgrade handling for an application whose Redis cache runs into problems. In general, the downgrade switch lives in the configuration center; when the configuration changes, each application process receives the change notification in turn and adjusts its behavior. During the transition, cache access and DB access are mixed, and then the scheme above can yield inconsistency. So how do we ensure the application still obtains strongly consistent results under such mixed access?

During mixed access, the following strategy keeps DB and cache data consistent (a SAGA sketch follows the list):

  • When updating data, use a distributed transaction to make the following operations atomic:

    • Mark the cache entry as "locking"
    • Update the DB
    • Remove the cache's "locking" flag and mark it for delayed deletion
  • When reading the cache: for data marked "locking", sleep and read again; for delay-deleted data, do not return the old value, but wait until the new value is filled in and return that.
  • When reading the DB, read directly, with no extra steps
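A sketch of the update side as a dtm SAGA; the three branch endpoints are hypothetical stand-ins for the steps above, and the real wiring is in dtm-cases/cache:

    saga := dtmcli.NewSaga(DtmServer, gid).
        // mark the cache entry as "locking" (the compensation unlocks it)
        Add(busi.Busi+"/LockCache", busi.Busi+"/UnlockCache", &Req{Key: key1}).
        // update the DB; this is the business step that may fail
        Add(busi.Busi+"/UpdateDB", busi.Busi+"/RollbackDB", &Req{Key: key1}).
        // remove the "locking" flag and mark the entry for delayed deletion
        Add(busi.Busi+"/DelayDelete", "", &Req{Key: key1})
    err := saga.Submit()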

This strategy differs little from the earlier strong-consistency scheme that ignored downgrades: the read path is completely unchanged, and only the update path changes. rockscache assumes that updating the DB is a business operation that may fail, so a SAGA transaction is used to guarantee atomicity. For details, see the example dtm-cases/cache.

Turning the switches on and off during upgrade and downgrade must follow a strict order. Cache reads and cache writes cannot be toggled at the same time; cache reads may be enabled only once all write operations are guaranteed to update the cache.

The detailed process of downgrade is as follows:

  1. Initial state:

    • Read: mixed read
    • Write: DB + cache
  2. Read downgrade:

    • Read: turn off cache reads; mixed read => all reads from DB
    • Write: DB + cache
  3. Write downgrade:

    • Read: all reads from DB
    • Write: turn off cache writes; DB + cache => DB only

The upgrade process runs in reverse, as follows (a sketch of the corresponding rockscache switches appears after the list):

  1. Initial state:

    • Read: all reads from DB
    • Write: DB only
  2. Write upgrade:

    • Read: all reads from DB
    • Write: turn on cache writes; DB only => DB + cache
  3. Read upgrade:

    • Read: turn on cache reads; all reads from DB => mixed read
    • Write: DB + cache
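In rockscache these switches map to two option flags; a sketch assuming the DisableCacheRead and DisableCacheDelete options, whose values would normally come from the configuration center:

    opts := rockscache.NewDefaultOptions()
    opts.DisableCacheRead = true   // read downgrade: serve all reads from the DB
    opts.DisableCacheDelete = true // write downgrade: stop touching the cache on writes
    rc := rockscache.NewClient(rdb, opts)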

dtm-labs/rockscache has implemented the above strongly consistent cache management method.

Interested readers can refer to dtm-cases/cache, which contains detailed examples.

Summary

This has been a long article, and much of the analysis is fairly dense. To close, here is a summary of the ways to use a Redis cache:

  • Simplest approach: a short cache time, tolerating a brief window of stale reads after database modifications, with no synchronized cache deletion
  • Eventual consistency plus breakdown protection: two-phase message + delayed deletion (rockscache)
  • Strong consistency: two-phase message + strong consistency (rockscache)
  • The most stringent consistency requirements: two-phase message + strong consistency (rockscache) + downgrade/upgrade compatibility

For the latter two approaches, we recommend dtm-labs/rockscache as your caching solution.

You are welcome to visit dtm-labs/rockscache and dtm-labs/dtm, and to support us with a star.


叶东富