Previously

The first two articles (cache stability and cache correctness) discussed the "stability" and "correctness" of caching. The remaining common cache topics are "observability" and "standardized implementation and tooling".

  • Stability
  • Correctness
  • Observability
  • Standardized implementation and tooling

After last week's article was published, many readers had in-depth discussions about the question I left at the end. I believe that after thinking it through you will have a deeper understanding of cache consistency!

First of all, there was a lot of discussion in various Go groups and go-zero groups, but nobody found a fully satisfactory answer.

Let us analyze several possible solutions to this problem together:

  • Use distributed locks to make each update an atomic operation. This is the least desirable method. It amounts to crippling ourselves: giving up high concurrency in pursuit of strong consistency. Don't forget that I emphasized in the previous article that "this series is only for high-concurrency scenarios that do not require strong consistency; readers working on financial payments and the like should judge for themselves", so we rule this solution out first.
  • Have A delete the cache with a delay, for example 1 second after the update. The drawback is that, to cover this extremely low-probability case, every update leaves readers with old data for up to 1 second. This method is not ideal either, and we do not want to use it.
  • Have A set a special placeholder instead of deleting the cache, and have B set the cache with the Redis setnx command; subsequent requests that encounter the placeholder then request the data again. This is equivalent to adding a new state when deleting the cache. Let's look at the situation in the figure below.

    Isn't this the same problem again? When request A encounters the placeholder, it must either forcibly set the cache or check whether the content is a placeholder. So this does not solve the problem.
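
To make the placeholder argument concrete, here is a minimal in-memory sketch. It does not talk to Redis: the `cache` type, its `setNX` method, and the `placeholder` value are hypothetical stand-ins that only mimic the "set if absent" semantics of the setnx command.

```go
package main

import "fmt"

// placeholder is a hypothetical marker that request A writes instead of deleting the key.
const placeholder = "<deleted>"

// cache is an in-memory stand-in for Redis (an assumption for this sketch).
type cache map[string]string

// setNX mimics the semantics of Redis SETNX: store value only if key is absent.
func (c cache) setNX(key, value string) bool {
	if _, exists := c[key]; exists {
		return false
	}
	c[key] = value
	return true
}

// replayPlaceholder replays the interleaving with the placeholder scheme.
func replayPlaceholder() (staleRejected, reloadRejected bool, final string) {
	const key = "user:1"
	db := map[string]string{key: "v1"}
	c := cache{}

	stale := db[key]     // B reads the old value from the DB
	db[key] = "v2"       // A writes the new value to the DB
	c[key] = placeholder // A marks the key instead of deleting it

	// B's late write is rejected because the placeholder occupies the key.
	staleRejected = !c.setNX(key, stale)

	// A later request sees the placeholder and reloads from the DB,
	// but its SETNX is rejected for exactly the same reason.
	fresh := db[key]
	reloadRejected = !c.setNX(key, fresh)

	// The only way forward is an unconditional write, which is the same
	// unprotected "set the cache" step that the race started with.
	c[key] = fresh
	return staleRejected, reloadRejected, c[key]
}

func main() {
	staleRejected, reloadRejected, final := replayPlaceholder()
	fmt.Println(staleRejected, reloadRejected, final)
}
```

The placeholder does block B's stale write, but it blocks every later conditional write as well, so some request eventually has to overwrite it unconditionally.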

Let's see how go-zero responds to this situation. We chose not to handle it. Surprising, isn't it? Let's go back to the starting point and analyze how this situation arises:

  • The data requested by the read is not in the cache (it was never loaded, or the cache entry has expired), which triggers a DB read
  • At the same moment there is an update to that data
  • The operations must occur in this order: B reads the DB -> A writes the DB -> A deletes the cache -> B sets the cache
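
The required ordering can be replayed deterministically. The sketch below uses two plain maps as hypothetical stand-ins for the DB and the cache and runs the four steps in exactly that order; it is an illustration of the race, not go-zero code.

```go
package main

import "fmt"

// store is a toy stand-in for both the DB and the cache (hypothetical, in-memory).
type store map[string]string

// replayRace runs the four steps in the exact order that causes the
// inconsistency and returns the final DB and cache values.
func replayRace() (dbValue, cacheValue string, cached bool) {
	const key = "user:1"
	db := store{key: "v1"}
	cache := store{} // cache miss: the key is not loaded yet

	// Step 1: request B misses the cache and reads the old value from the DB.
	readByB := db[key]

	// Step 2: request A writes the new value to the DB.
	db[key] = "v2"

	// Step 3: request A deletes the cache entry (a no-op here, the key is absent).
	delete(cache, key)

	// Step 4: request B finally sets the cache with the value it read in step 1.
	cache[key] = readByB

	cacheValue, cached = cache[key]
	return db[key], cacheValue, cached
}

func main() {
	dbValue, cacheValue, _ := replayRace()
	fmt.Printf("db=%s cache=%s stale=%v\n", dbValue, cacheValue, dbValue != cacheValue)
}
```

The cache ends up holding the old value while the DB holds the new one, and it stays that way until the entry expires or the key is updated again.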

We all know that a DB write needs to lock the row, which is slow, while a read does not, so the probability of this sequence is relatively low. Moreover, we set an expiration time, so the probability of hitting this in real scenarios is extremely low. To truly solve this type of problem, we would need to guarantee consistency through 2PC or Paxos. I don't think that is what anyone wants to use: it's too complicated!

I think the hardest thing is knowing how to make trade-offs. Finding the balance with the best payoff tests one's overall ability. Of course, if you have any good ideas, you can contact me through the group or the official account. Thank you!

This is the third article in the series. It mainly discusses "cache monitoring and code automation".

Cache observability

In the previous two articles, we solved the problems of cache stability and data consistency. At this point our system fully enjoys the value of caching and has solved the zero-to-one problem. Next we have to consider how to further reduce the cost of use: which caches bring actual business value, which can be removed to reduce server costs, which need more server resources, what the QPS of each cache is, what the hit rate is, whether further tuning is needed, and so on.

The above figure is a cache monitoring log of a service. It shows 5,057 requests per minute to this cache, 99.7% of which hit the cache; only 13 fell through to the DB, and the DB returned successfully for all of them. From this monitoring we can see that the cache has reduced DB pressure by almost three orders of magnitude (a 90% hit rate is one order of magnitude, 99% is two, and 99.7% is almost three). The benefit of this cache is quite considerable.

Conversely, if the cache hit rate were only 0.3%, there would be almost no benefit, and we should remove the cache. One reason is to reduce system complexity (do not add entities unless necessary); the other is to reduce server costs.

If the QPS of the service is particularly high (enough to put significant pressure on the DB) and the cache hit rate is only 50%, the cache has halved the pressure. We should then consider increasing the expiration time, as far as the business allows, to raise the hit rate.

If the QPS of the service is particularly high (enough to put significant pressure on the cache) and the cache hit rate is also high, we can consider increasing the QPS the cache can carry, or adding an in-process cache to reduce the pressure on the cache.
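
The three cases above can be written down as a small decision function. This is only a sketch: the thresholds for "low hit rate", "high hit rate" and "high QPS" are assumptions made for illustration and have to be chosen per service.

```go
package main

import "fmt"

// Illustrative thresholds (assumptions, not values from go-zero).
const (
	lowHitRate  = 0.05  // below this the cache is barely helping
	highHitRate = 0.95  // above this the cache absorbs nearly everything
	highQPS     = 10000 // what counts as "high" depends on DB and cache capacity
)

type advice string

const (
	removeCache advice = "remove the cache"
	raiseExpiry advice = "consider a longer expiration time"
	scaleCache  advice = "scale the cache or add an in-process cache"
	keepAsIs    advice = "keep as is"
)

// advise maps the observed QPS and hit rate to one of the tuning actions
// discussed in the text.
func advise(qps, hitRate float64) advice {
	switch {
	case hitRate < lowHitRate:
		return removeCache
	case qps >= highQPS && hitRate >= highHitRate:
		return scaleCache
	case qps >= highQPS:
		return raiseExpiry
	default:
		return keepAsIs
	}
}

func main() {
	fmt.Println(advise(5000, 0.003)) // hit rate of 0.3%
	fmt.Println(advise(20000, 0.50)) // busy service, half the requests miss
	fmt.Println(advise(20000, 0.99)) // busy service, cache itself is the hot spot
}
```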

All of this is based on cache monitoring. Only with observability can we do targeted tuning and simplification. As I have always emphasized: "no measurement, no optimization".
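
As a rough illustration of how such monitoring data can be collected, here is a minimal counter sketch using atomic counters (Go 1.19 or later). The `cacheStat` type, its methods and the log line format are hypothetical and written for this article; they are not go-zero's actual implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStat is a hypothetical set of counters for one cache.
type cacheStat struct {
	total  atomic.Uint64
	hit    atomic.Uint64
	miss   atomic.Uint64
	dbFail atomic.Uint64
}

// recordHit counts a request that was served from the cache.
func (s *cacheStat) recordHit() {
	s.total.Add(1)
	s.hit.Add(1)
}

// recordMiss counts a request that fell through to the DB.
func (s *cacheStat) recordMiss(dbOK bool) {
	s.total.Add(1)
	s.miss.Add(1)
	if !dbOK {
		s.dbFail.Add(1)
	}
}

// report resets the counters and returns one log line for the elapsed window.
func (s *cacheStat) report(name string) string {
	total := s.total.Swap(0)
	hit := s.hit.Swap(0)
	miss := s.miss.Swap(0)
	dbFail := s.dbFail.Swap(0)
	if total == 0 {
		return fmt.Sprintf("cache(%s) - no requests", name)
	}
	ratio := 100 * float64(hit) / float64(total)
	return fmt.Sprintf("cache(%s) - total: %d, hit_ratio: %.1f%%, hit: %d, miss: %d, db_fails: %d",
		name, total, ratio, hit, miss, dbFail)
}

func main() {
	var stat cacheStat
	for i := 0; i < 5044; i++ {
		stat.recordHit()
	}
	for i := 0; i < 13; i++ {
		stat.recordMiss(true)
	}
	fmt.Println(stat.report("demo"))
}
```

In a real service, `report` would typically be called from a ticker once per minute so that each log line covers one window.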

How to make the cache be used in a standardized way?

Readers who know the go-zero design ideas or have watched my talks may remember the principle I often repeat: "tools over conventions and documents".

Caching involves many details. Cache code written by different people will certainly differ in style, and it is very hard to get every detail right. Even for a veteran like me who has been programming for many years, it is still very hard to get all of them right in one go. So how does go-zero solve this problem?

  • Encapsulate the abstracted, general solution in the framework as much as possible. The entire cache control flow is then no longer something users need to worry about; as long as you call the right method, there is little room for error.
  • All code, from the table-creation SQL to CRUD + Cache, is generated by the tool with one click. This saves everyone from hand-writing a pile of structs and control logic based on the table structure.

This is a CRUD + Cache generation command taken from the official bookstore example. We can give goctl the schema it needs through a table-creation SQL file or a datasource, and goctl's model command then generates the required CRUD + Cache code with one click.

This ensures that the cache code everyone writes is the same. How could tool-generated code differ? :P

To be continued

This article discussed cache observability and code automation. In the next article, I will share how we refined and abstracted the general caching solution. You can read up on clustered index design in advance and think about how to do caching on top of it. After thinking it through, your understanding will be deeper!

The solutions to all these problems are included in the go-zero microservice framework. If you want to understand the go-zero project better, please visit the official website for concrete examples.

Video playback address

ArchSummit Architects Summit-Cache Architecture Design under Massive Concurrency

Project address

https://github.com/tal-tech/go-zero

You are welcome to use go-zero and to support us with a star!

WeChat Exchange Group

Follow the "Microservice Practice" official account to get the QR code of the community group.

For the go-zero series of articles, please follow the "Microservice Practice" official account.

kevinwan

Author of go-zero