Previously

A caching system involves many problems and knowledge points. I will discuss them mainly from the following aspects:

  • Stability
  • Correctness
  • Observability
  • Standardization and tool construction

In the previous article, we analyzed the stability of the cache system and introduced how go-zero solves the problems of cache penetration, cache breakdown, and cache avalanche. It is fairly simple, easy to understand, and of real practical value, so it is recommended reading.

This article is the second in the series and mainly discusses "cache data consistency".

Cache correctness

As mentioned in the previous article, our original intention in introducing caching was to reduce DB pressure and increase system stability, so we first focused on the stability of the caching system. Once stability is solved, we generally run into data correctness problems, such as the common complaint "The data has obviously been updated, so why is the old value still showing?" This is the "cache data consistency" problem that we often talk about. Next, we will carefully analyze its causes and countermeasures.

Common data update practices

First of all, the premise of this discussion is that the DB update and the cache deletion are not treated as one atomic operation. In a high-concurrency scenario it is impractical to introduce a distributed lock to bind the two into an atomic operation: doing so would greatly hurt concurrency performance and increase system complexity. Therefore, we only pursue eventual consistency of the data, and this article applies only to high-concurrency scenarios that do not require strong consistency. Readers working on financial payments and similar domains should judge for themselves.

There are two types of common data update methods, and the rest are basically variants of these two types:

  • Delete the cache first, then update the database

With this approach, when data is updated we delete the cache first and then update the DB, as shown on the left of the figure. Let's walk through the whole operation:

  • Request A needs to update the data. It deletes the corresponding cache entry first; the DB has not been updated yet
  • Request B reads the data
  • Request B finds nothing in the cache, reads the DB, and writes the old data to the cache (dirty data)
  • Request A updates the DB

As you can see, request B wrote dirty data into the cache. If the data is read often and written rarely, the dirty data may remain for a long time (until a later update arrives or the cache entry expires), which is unacceptable to the business.
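The four steps above can be replayed deterministically. The following is a minimal sketch in Go, assuming plain in-memory maps stand in for the DB and the cache. The store type, the read helper, and the key user:1 are hypothetical names chosen for illustration and are not part of go-zero.

```go
package main

import "fmt"

// store is a toy stand-in for a database or a cache: a plain in-memory map.
// (Hypothetical type, for illustration only.)
type store map[string]string

// read follows the usual cache-aside read path: try the cache first and,
// on a miss, load the value from the DB and write it back to the cache.
func read(cache, db store, key string) string {
	if v, ok := cache[key]; ok {
		return v
	}
	v := db[key]
	cache[key] = v
	return v
}

// deleteFirstRace replays the four steps in a fixed order and returns
// what ends up in the cache and in the DB.
func deleteFirstRace() (cached, stored string) {
	db := store{"user:1": "old"}
	cache := store{"user:1": "old"}

	delete(cache, "user:1")       // A: delete the cache; the DB is not updated yet
	_ = read(cache, db, "user:1") // B: cache miss, loads "old" and writes it back
	db["user:1"] = "new"          // A: update the DB

	return cache["user:1"], db["user:1"]
}

func main() {
	cached, stored := deleteFirstRace()
	fmt.Printf("cache=%s db=%s\n", cached, stored) // prints "cache=old db=new"
}
```

The cache keeps serving "old" until the entry expires or another update removes it, which is the long-lived dirty data described above.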

  • Update the database first, then delete the cache

On the right of the figure above, you can see that request B reads the old data between A's DB update and A's cache deletion, because A's operation has not finished at that point. The window during which old data can be read is very short, so this approach satisfies the eventual consistency requirement.

As you can see in the figure above, we always delete the cache rather than update it. The reason is as follows:

In the figure above, I use "operate" to stand for either delete or update. With delete, it does not matter whether A or B deletes first, because subsequent read requests will load the latest data from the DB. With update, the result depends on whether A or B updates the cache first: if A's update lands later, the cache holds dirty data again. For this reason go-zero only deletes the cache.
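To make the ordering argument concrete, here is a small Go sketch under the same assumption of in-memory maps standing in for the DB and the cache. It assumes A writes v1 and B writes v2 to the DB in that order, while their cache operations arrive in the opposite order. The names store, read, withCacheUpdate, and withCacheDelete are hypothetical.

```go
package main

import "fmt"

// store is a toy in-memory stand-in for a DB or a cache (hypothetical type).
type store map[string]string

// read tries the cache first; on a miss it loads from the DB and
// writes the value back to the cache.
func read(cache, db store, key string) string {
	if v, ok := cache[key]; ok {
		return v
	}
	v := db[key]
	cache[key] = v
	return v
}

// withCacheUpdate: each writer refreshes the cache with its own value,
// but A's refresh lands after B's.
func withCacheUpdate() string {
	db, cache := store{}, store{}
	db["k"] = "v1"    // A updates the DB
	db["k"] = "v2"    // B updates the DB; v2 is now the latest value
	cache["k"] = "v2" // B updates the cache
	cache["k"] = "v1" // A updates the cache late and overwrites v2
	return read(cache, db, "k")
}

// withCacheDelete: each writer deletes the cache entry instead.
func withCacheDelete() string {
	db, cache := store{}, store{}
	db["k"] = "v1"     // A updates the DB
	db["k"] = "v2"     // B updates the DB
	delete(cache, "k") // B deletes the cache entry
	delete(cache, "k") // A deletes late; the order makes no difference
	return read(cache, db, "k")
}

func main() {
	fmt.Println("update cache:", withCacheUpdate()) // prints "update cache: v1"
	fmt.Println("delete cache:", withCacheDelete()) // prints "delete cache: v2"
}
```

With updates, the reader gets the stale v1 even though the DB holds v2. With deletes, the next read misses and reloads v2 from the DB.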

Let's take a look at the complete request processing flow:

Note: Different colors represent different requests.

  • Request 1 updates the DB
  • Request 2 queries the same data and gets the old data. Returning old data for a short time is acceptable and satisfies eventual consistency
  • Request 1 deletes the cache
  • Request 3 arrives and finds nothing in the cache, so it queries the database, writes the result back to the cache, and returns it
  • Subsequent requests read the cache directly
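The flow above can be sketched end to end in Go. This is an illustrative model only, assuming in-memory maps for the DB and the cache. The repo type and its methods (Get, UpdateDB, DeleteCache) are hypothetical names and do not reflect go-zero's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// repo pairs a toy DB with a toy cache. The mutex only protects the maps;
// it does not make "update DB + delete cache" atomic, which matches the
// premise of this article.
type repo struct {
	mu      sync.Mutex
	db      map[string]string
	cache   map[string]string
	dbReads int // number of reads that had to go to the DB
}

// Get serves from the cache when possible; on a miss it loads from the DB
// and writes the value back to the cache.
func (r *repo) Get(key string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if v, ok := r.cache[key]; ok {
		return v
	}
	r.dbReads++
	v := r.db[key]
	r.cache[key] = v
	return v
}

// UpdateDB writes the new value to the DB only.
func (r *repo) UpdateDB(key, val string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.db[key] = val
}

// DeleteCache removes the cache entry so the next read reloads from the DB.
func (r *repo) DeleteCache(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.cache, key)
}

// replay runs the five steps in order and records what each reader sees.
func replay() (seen []string, dbReads int) {
	r := &repo{
		db:    map[string]string{"k": "old"},
		cache: map[string]string{"k": "old"},
	}
	r.UpdateDB("k", "new")          // request 1 updates the DB
	seen = append(seen, r.Get("k")) // request 2 still gets "old" from the cache
	r.DeleteCache("k")              // request 1 deletes the cache
	seen = append(seen, r.Get("k")) // request 3 misses, reads the DB, writes back
	seen = append(seen, r.Get("k")) // a subsequent request hits the cache
	return seen, r.dbReads
}

func main() {
	seen, dbReads := replay()
	fmt.Println(seen, dbReads) // prints "[old new new] 1"
}
```

Only one read reaches the DB: the first read after the cache deletion. The single stale read happens inside the short window between the DB update and the cache deletion.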

One more question is left for you to think about: how should we handle the scenario in the figure below?

If you have a good solution or want to know how to solve it, you are welcome to discuss it in the WeChat group of the go-zero community. It is better to teach people how to fish than to give them a fish, and working through the solution yourself will certainly teach you more.

To be continued

This article discussed the problem of cache data consistency. In the next article, I will discuss monitoring of the cache system and how to make cache code more standardized and less error-prone.

The solutions to all these problems have been included in the go-zero microservice framework. If you want to better understand the go-zero project, please go to the official website to learn specific examples.

Video playback address

ArchSummit Architect Summit-Cache Architecture Design under Massive Concurrency

Project address

https://github.com/tal-tech/go-zero

You are welcome to use go-zero and to star the project to support us!

WeChat Exchange Group

Follow the "Microservice Practice" official account and enter the exchange group entry to obtain the QR code of the community group.

For the go-zero series of articles, please refer to the official account of "Microservice Practice"

kevinwan
Author of go-zero