Introduction | Redis, a high-performance in-memory key-value store, is widely used in daily development for caches, counters, message queues, leaderboards, and similar scenarios, and above all as the most common caching layer. It plays a vital role in speeding up data queries and shielding the database. In practice, however, a number of Redis cache exceptions can occur. This article summarizes those exceptions and how to handle them.
1. Background
Redis is a fully open-source, BSD-licensed, high-performance key-value store. It supports persistence, so in-memory data can be saved to disk, and beyond simple string keys and values it also provides data structures such as list, set, zset, and hash, which makes it very powerful. Redis also supports data backup through master-slave replication, which improves availability. Most importantly, its reads and writes are fast, which is why it is the most common caching solution in everyday development. In real applications, however, abnormal situations such as cache avalanche, cache breakdown, and cache penetration can arise, and ignoring them can have catastrophic consequences. The rest of this article analyzes these cache exceptions and summarizes common ways to handle them.
2. Cache Avalanche
(1) What it is
Over some period of time, a large number of requests that should have been served from the Redis cache are sent to the database instead, so the load on the database spikes. In severe cases the database crashes, and the whole system collapses with it in a chain reaction, like an avalanche; hence the name cache avalanche.
(2) Why it happens
There are two common reasons for the above situation:
- A large amount of cached data expires at the same time, so requests that should have hit the cache have to fetch from the database again.
- Redis itself fails and cannot serve requests, so requests naturally fall through to the database.
(3) What to do
For the case where a large amount of cached data expires at the same time:
- When setting expiration times, avoid having a large number of keys expire at the same moment; if that risk exists, add random jitter to spread the expirations out evenly (a minimal sketch follows this list).
- Add a mutex so that cache-rebuild operations do not run concurrently.
- Dual-key strategy: the primary key holds the original cache and a backup key holds a copy. When the primary key expires, the backup key can still be served. The primary key gets a short expiration time and the backup key a long one.
- Background refresh strategy: use scheduled tasks or a message queue to update or remove the Redis cache proactively.
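A minimal sketch of the random-jitter idea, assuming a redis-py client; `BASE_TTL`, `JITTER`, and `cache_with_jitter` are illustrative names rather than any standard API:

```python
import random

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

BASE_TTL = 3600  # base expiration: one hour
JITTER = 300     # spread expirations across +/- 5 minutes

def cache_with_jitter(key: str, value: str) -> None:
    """Set a key with a randomized TTL so keys written in the same
    batch do not all expire at the same instant."""
    ttl = BASE_TTL + random.randint(-JITTER, JITTER)
    r.set(key, value, ex=ttl)
```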
For the case where Redis itself fails:
- At the prevention level, build a highly available cluster with master-slave replication, so that if the master Redis instance goes down, a slave can quickly be promoted to master and continue serving.
- If the failure has already happened, use a service circuit breaker or request rate limiting to keep the flood of requests from crashing the database. A circuit breaker is relatively blunt: it stops the service entirely until Redis recovers. Rate limiting is gentler: it guarantees that some requests are still processed rather than cutting everything off. Which to choose depends on the specific business situation (a minimal in-process rate-limiter sketch follows).
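A minimal in-process token-bucket sketch for capping database load while the cache is unavailable; `run_query` is a hypothetical database call and the rate numbers are placeholders:

```python
import threading
import time

class TokenBucket:
    """Tiny in-process rate limiter: allow roughly `rate` database
    queries per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

db_limiter = TokenBucket(rate=100, capacity=100)  # ~100 DB queries/sec

def query_db_guarded(sql: str):
    if not db_limiter.allow():
        raise RuntimeError("database overloaded, request rejected")
    return run_query(sql)  # hypothetical database call
```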
3. Cache Breakdown
(1) What it is
Cache breakdown usually occurs in high-concurrency systems: a large number of users concurrently request a piece of data that is not in the cache but is in the database, all miss the cache at the same moment, and all fetch from the database at the same time, so the database load spikes instantly. Unlike a cache avalanche, cache breakdown is about many requests for the same piece of data; a cache avalanche is about many different pieces of data expiring, so that many lookups miss and hit the database.
(2) Why it happens
The usual cause is that a hot-data cache entry has expired. Because the data is hot, the concurrent request volume is high, so at the moment it expires a large number of requests still arrive simultaneously.
(3) What to do
There are two common solutions to this situation:
- Simply and crudely, set no expiration time on the hot data so it never expires, and the situation above cannot occur. If the data needs to be cleaned up later, do it in a background job.
- Add a mutex: when the entry expires, only the first request that acquires the lock may query the database and repopulate the cache. The other requests are blocked until the lock is released and the new cache value is in place, after which they read from the cache again, so no breakdown occurs (see the sketch below).
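A minimal sketch of this mutex pattern built on Redis's `SET NX EX` semantics via redis-py; `load_from_db` is a hypothetical loader and the TTL values are illustrative:

```python
import time

import redis

r = redis.Redis(decode_responses=True)

LOCK_TTL = 10  # seconds; the lock auto-expires if its holder crashes

def get_hot_data(key: str) -> str:
    value = r.get(key)
    if value is not None:
        return value
    # Cache miss: only the request that wins the lock rebuilds the cache.
    if r.set(f"lock:{key}", "1", nx=True, ex=LOCK_TTL):
        try:
            value = load_from_db(key)  # hypothetical database loader
            r.set(key, value, ex=3600)
        finally:
            r.delete(f"lock:{key}")
        return value
    # Everyone else waits briefly and retries the cache instead of the DB.
    time.sleep(0.05)
    return get_hot_data(key)
```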
4. Cache Penetration
(1) What it is
Cache penetration means the requested data is in neither Redis nor the database: every time such a request arrives, the key misses the cache, the database is queried again, and the database also has nothing, so two useless queries are made. Requests like this bypass the cache and go straight to the database. If someone wants to attack the system maliciously, they can deliberately issue frequent requests for null or otherwise nonexistent values, putting heavy pressure on the database.
(2) Why it happens
The cause is easy to understand: if the business logic has never created or processed a given piece of information, then neither the database nor the cache will hold it, and the problem above follows naturally.
(3) What to do
For cache penetration, there are generally the following three solutions:
- Block illegal requests: mainly parameter validation, authentication, and so on, so that large numbers of illegal requests are stopped at the entrance. This is a necessary measure in real business development anyway.
- Cache empty or default values: if a key misses the cache and is also not found in the database, cache the empty result anyway with a short expiration time. The next lookup for the same key is then answered from the cache instead of the database, which blunts repeated malicious requests that reuse the same key (see the first sketch after this list).
- Use a Bloom filter to quickly decide whether data exists. What is a Bloom filter? In short, it uses multiple independent hash functions to test set membership within a given space budget and false-positive rate. A single hash function would suffer noticeable collisions; the core idea of a Bloom filter is to combine several different hash functions to reduce them. Its advantages are high space efficiency and very short query time, far better than comparable approaches; its disadvantage is a certain false-positive rate: a key that passes the filter is not guaranteed to exist, since collisions remain theoretically possible however small the probability. But a key that fails the filter is guaranteed not to exist, so the filter can screen out most requests for nonexistent keys, which is enough in normal scenarios (see the second sketch after this list).
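A minimal sketch of null-value caching, assuming redis-py; `load_from_db`, the sentinel, and the TTLs are illustrative choices:

```python
import redis

r = redis.Redis(decode_responses=True)

NULL_SENTINEL = ""  # marker for "known not to exist"
NULL_TTL = 60       # keep null results only briefly

def get_with_null_caching(key: str):
    value = r.get(key)
    if value is not None:
        return None if value == NULL_SENTINEL else value
    value = load_from_db(key)  # hypothetical loader; None when absent
    if value is None:
        # Cache the miss so repeated requests for the same bogus key
        # stop reaching the database.
        r.set(key, NULL_SENTINEL, ex=NULL_TTL)
        return None
    r.set(key, value, ex=3600)
    return value
```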
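And a toy pure-Python Bloom filter to make the idea concrete; production systems would normally use a tuned library or the RedisBloom module, and the sizes and hashing scheme here are only illustrative:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k independent hashes over an m-bit array.
    May report false positives, never false negatives."""

    def __init__(self, m_bits: int = 1 << 20, k: int = 5):
        self.m = m_bits
        self.k = k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: str):
        # Derive k "independent" hashes by salting SHA-256 with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Preload every valid key once, then reject lookups that the filter
# says definitely do not exist before they touch Redis or the database.
bf = BloomFilter()
bf.add("user:1001")
assert bf.might_contain("user:1001")      # present keys always pass
# bf.might_contain("user:9999") -> almost certainly False
```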
5. Others
Besides the three common Redis cache exception problems above, you also often hear the terms cache warm-up and cache degradation, which are not so much exception problems as two optimization techniques.
(1) Cache warm-up
Cache warm-up means loading the relevant data into the cache system around the time the system goes online, instead of waiting for user requests to trigger it. This avoids the pattern where a user request first queries the database and only then populates the cache; users directly hit data that was preheated in advance, so the high traffic right after launch does not all land on the database. Depending on the data volume, you can proceed as follows (a minimal sketch follows the list):
- Small data volume: load it automatically when the project starts.
- Large data volume: refresh the cache periodically in the background.
- Very large data volume: preload only the hot data.
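A minimal warm-up sketch, assuming redis-py; `load_hot_rows_from_db` is a hypothetical helper that yields key-value pairs:

```python
import redis

r = redis.Redis(decode_responses=True)

def warm_up_cache() -> None:
    """Run once at application startup, before user traffic arrives."""
    for key, value in load_hot_rows_from_db():  # hypothetical helper
        r.set(key, value, ex=3600)

# e.g. call warm_up_cache() from the service's startup hook
```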
(2) Cache degradation
Cache degradation means that when the cache expires or the cache service fails, we do not fall back to the database, so that a failed cache service cannot trigger an avalanche against it, yet we still want the service to remain basically usable, even though the degraded responses are certainly worse. So for unimportant cached data we can adopt a service-degradation strategy. There are two general approaches (a sketch follows this list):
- Read the data directly from a local in-memory cache.
- Return a default value set by the system.
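A minimal degradation sketch, assuming redis-py; `LOCAL_DEFAULTS` is an illustrative in-process fallback table:

```python
import redis

r = redis.Redis(socket_timeout=0.2)  # fail fast when Redis is unhealthy

LOCAL_DEFAULTS = {"site:banner": "Welcome!"}  # illustrative fallbacks

def get_with_degradation(key: str):
    try:
        value = r.get(key)
        if value is not None:
            return value
    except redis.RedisError:
        pass  # cache is down: fall through to the degraded path
    # Degraded path: serve a local default instead of hitting the database.
    return LOCAL_DEFAULTS.get(key)
```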
6. Summary
This article summarized the common Redis cache exceptions of cache avalanche, cache breakdown, and cache penetration, together with their handling strategies, plus the related optimization techniques of cache warm-up and cache degradation.
About the Author:
Yin Zhehao, Tencent Operations Development Engineer.