
I have heard that out of every 10 people who interview at Internet companies, 9 are asked about cache avalanche and cache penetration.

Of those 9, at least 8 give an incomplete answer.

And those 8 are all reciting interview material they found on the Internet without really understanding it.

That is understandable, of course: these two problems only become a real concern in architectures that rely on caching at large scale.

So how do you really understand the underlying logic of these two problems? Let's compare the answers of ordinary people and experts.

Ordinary people:

Ok.................

Expert:

A cache avalanche occurs when a large amount of data stored in the cache expires at the same time.

Most of the traffic that the cache component used to absorb then goes straight to the database.

The sudden increase in load can cause the database server to crash.


I think there are two main causes of a cache avalanche:

  1. The cache middleware goes down. This can be avoided by deploying the cache middleware as a high-availability cluster.
  2. Most of the keys in the cache are set with the same expiration time, so they all expire at the same moment. In this case, add a random value of 1 to 5 minutes to each key's expiration time.
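The second fix can be sketched in a few lines of Python. This is only an illustration: the 30-minute base expiration and the names `BASE_TTL_SECONDS`, `JITTER_RANGE_SECONDS`, and `ttl_with_jitter` are assumptions made for the example, not values from a real system.

```python
import random

BASE_TTL_SECONDS = 30 * 60          # hypothetical base expiration: 30 minutes
JITTER_RANGE_SECONDS = (60, 300)    # the 1 to 5 minutes of random offset


def ttl_with_jitter(base_ttl=BASE_TTL_SECONDS, jitter=JITTER_RANGE_SECONDS):
    """Return the base TTL plus a random offset so keys written together expire apart."""
    low, high = jitter
    return base_ttl + random.randint(low, high)


# Keys written in the same batch end up with spread-out expiration times.
ttls = [ttl_with_jitter() for _ in range(1000)]
```

With a Redis client such as redis-py, the returned value could be passed as the `ex` argument of `set`, so that each key gets its own slightly different expiration.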

Cache penetration means that a large number of requests for non-existent keys reach the application in a short period of time. These keys cannot be found in the cache, so every request passes through to the database and puts pressure on it.

I think the core of this scenario is an attack on the cache. In normal business traffic, even if this situation occurs, the cache keeps warming up, so the impact is small.

For the attack to stay effective over time, the requested keys must not exist in the database either. Therefore, I think there are two ways to solve it:

  1. Store the invalid key in Redis as well, with a special value such as "null", so that the next request for the same key does not query the database.
  2. If the attacker keeps using random non-existent keys, however, the first approach still has a problem, so a Bloom filter can be used. When the system starts, the keys of all target data are loaded into the Bloom filter. When a request arrives, query the Bloom filter first. If the filter reports that the key is absent, the key does not exist in the database and the request can be rejected right away.

    Another advantage of the Bloom filter is that it stores its data in a bitmap and occupies very little memory. It can report false positives, but never false negatives.
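The two approaches above can be combined in one lookup path. The following is a minimal sketch, not a production implementation: plain dicts stand in for Redis and the database, and the names `BloomFilter`, `Lookup`, and `NULL_MARKER`, as well as the filter size and hash count, are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a bytearray used as a bitmap."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


NULL_MARKER = "null"  # special value cached for keys the database does not have


class Lookup:
    def __init__(self, database):
        self.database = database      # dict standing in for the real database
        self.cache = {}               # dict standing in for Redis
        self.db_queries = 0           # counts how often the database is hit
        self.bloom = BloomFilter()
        for key in database:          # load all existing keys at startup
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None               # rejected before touching cache or database
        if key in self.cache:
            value = self.cache[key]
            return None if value == NULL_MARKER else value
        self.db_queries += 1
        value = self.database.get(key)
        # Cache misses too, so a repeated request for the same key skips the database.
        self.cache[key] = NULL_MARKER if value is None else value
        return value
```

A request for a key that was never loaded into the filter returns `None` without increasing `db_queries`. A key that passes the filter but is missing from the database costs one query, after which the cached "null" marker answers. In a real deployment the marker would normally get a short expiration time so that keys created later become visible.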


However, in my opinion, the problem you raised is somewhat exaggerated.

First, in a mature system the more important hot data is maintained by a dedicated cache system, and its expiration times are managed separately from those of other business keys. For very important scenarios, we also design a multi-level cache.

Second, even if a cache avalanche is triggered, the database itself is not that fragile. Master-slave replication, dual-master setups, and read-write separation can all absorb concurrent traffic well.

Finally, the database also limits the maximum number of connections, and requests that exceed the limit are rejected. Combined with a circuit breaker, this protects the database well. At worst, some users get a poor experience.
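To make the circuit breaker idea concrete, here is a minimal sketch. It assumes a simple policy: after a number of consecutive failures the breaker opens and rejects calls for a cooldown period, then lets one trial call through. The class names, the threshold, and the cooldown are all hypothetical choices for the example; real systems usually rely on an existing library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the database."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock            # injectable so the cooldown can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise CircuitOpenError("database calls are temporarily rejected")
            # Cooldown elapsed: allow one trial call (half-open state).
            # A single failure now re-opens the breaker.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Wrapping each database query in `breaker.call(...)` means that once the database starts refusing connections, later requests fail fast instead of piling up, which matches the "bad experience for some users" trade-off described above.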

In addition, at the application level, you can take a lock around database access so that a burst of cache misses does not send a large number of requests to the database. Performance suffers, but the system stays safe.
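A minimal sketch of this locking idea, assuming a single process where a `threading.Lock` is enough: the class name `LockedLoader` and the `load_from_db` callback are hypothetical, and a dict stands in for the cache. Across several application instances a distributed lock would be needed instead.

```python
import threading


class LockedLoader:
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db   # hypothetical function that queries the database
        self.cache = {}                    # dict standing in for the cache
        self.lock = threading.Lock()

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                   # cache hit: no lock needed
        with self.lock:
            # Re-check: another thread may have filled the cache while we waited.
            value = self.cache.get(key)
            if value is None:
                value = self.load_from_db(key)
                self.cache[key] = value
            return value
```

When many threads miss on the same key at once, only the first one queries the database and the rest read the refreshed cache entry. One global lock also serializes misses on unrelated keys, which is the performance cost mentioned above; a lock per key is a common refinement.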


All in all, I think there are many solutions, and the right choice depends on the specific business scenario.

The above is my understanding of the problem.

Summary

I have found that many interviews today are conducted just for the sake of interviewing: the interviewer either picks questions from the Internet or keeps asking irrelevant ones.

As for how the interviewer judges whether you are suitable, we don't know. Some friends would probably say it comes down to your looks and the first impression you make!

I think a qualified interviewer must have a very deep technical foundation.

That wraps up this installment of the "ordinary people vs. masters" interview series. If you enjoyed it, remember to like and bookmark it.

In addition, if you have any technical questions or professional development-related questions, you can send me a private message, and I will reply as soon as possible.


Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. Please credit the source, Mic带你学架构, when reposting.
If this article helped you, please follow and like. Your support is what keeps me writing. You are also welcome to follow the WeChat public account of the same name for more technical content.

跟着Mic学架构

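Author of 《Spring Cloud Alibaba 微服务原理与实战》 (Spring Cloud Alibaba Microservices: Principles and Practice) and 《Java并发编程深度理解及实战》 (Java Concurrent Programming: In-Depth Understanding and Practice). Co-founder of 咕泡教育 (Gupao Education), with 12 years of development and architecture experience and extensive hands-on experience in distributed microservices and high concurrency.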