
Raw data is stored in a database (such as MySQL, HBase, etc.), but databases have limited read/write throughput and relatively high latency.

For example, on a 4-core, 8 GB machine, MySQL reaches roughly TPS = 5,000 and QPS = 10,000, with average read/write latency around 10~100 ms.

Using Redis as a caching system makes up for these shortcomings of the database. "Code Brother" ran the following Redis benchmark on his MacBook Pro 2019:

$ redis-benchmark -t set,get -n 100000 -q
SET: 107758.62 requests per second, p50=0.239 msec
GET: 108813.92 requests per second, p50=0.239 msec

Both TPS and QPS reach about 100,000 requests per second. So we introduce a cache layer: the original data stays in the database, and a copy is stored in the cache.

When a request comes in, the data is read from the cache first; on a hit, the cached data is returned directly.

If the data is not in the cache, it is read from the database, written to the cache, and then returned.
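A minimal sketch of this cache-aside read path, in the same pseudocode style as the lock example later in this article (redis, getFromDB, and the 24-hour TTL are assumed helpers and values):

public String getData(String id) {
    // 1. Read the cache first
    String value = redis.get(id);
    if (value != null) {
        return value;                   // cache hit: return directly
    }
    // 2. Cache miss: read from the database
    value = getFromDB(id);
    // 3. Write the result to the cache, then return it
    redis.set(id, value, 60 * 60 * 24); // e.g. 24-hour expiration
    return value;
}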

Is this foolproof? Not quite: an improperly designed cache can have serious consequences. This article introduces three common problems in cache usage and their solutions:

  • Cache breakdown (failure);
  • Cache penetration;
  • Cache avalanche.

Cache breakdown (failure)

Under high concurrent traffic, the data being accessed is a hotspot key: the data exists in the DB, but the copy stored in Redis has just expired, so the backend must reload it from the DB and write it back to Redis.

Keywords: a single hotspot key, high concurrency, expiration

Under high concurrency, the flood of requests may overwhelm the DB and make the service unavailable. As shown below:

[Figure: cache breakdown]

Solution

Expiration time + random value

For hot data, we can simply set no expiration time, so that all requests are served from the cache and Redis's high throughput is fully utilized.

Or add a random value to the expiration time.

When designing cache expiration, use the formula: expiration time = base time + random time.

That is, when writing the same type of business data to the cache, add a random offset on top of the base expiration time. The keys then expire spread out over time instead of all at once, avoiding a sudden spike of pressure on the DB.
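A minimal sketch of the formula (the base TTL and jitter range are illustrative values, not from the original article):

import java.util.concurrent.ThreadLocalRandom;

// expiration time = base time + random time
int baseSeconds = 60 * 60 * 24;                                   // base: 24 hours
int jitterSeconds = ThreadLocalRandom.current().nextInt(60, 301); // jitter: 1~5 minutes
redis.set(id, desc, baseSeconds + jitterSeconds);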

Warm up

Load popular data into Redis in advance, and set its expiration time to a very large value.
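A minimal warm-up sketch in the same pseudocode style (hotKeys, getFromDB, and the 30-day TTL are assumptions for illustration):

public void warmUp(List<String> hotKeys) {
    for (String key : hotKeys) {
        String value = getFromDB(key);            // load the authoritative copy
        redis.set(key, value, 60 * 60 * 24 * 30); // very long TTL, e.g. 30 days
    }
}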

Use a lock

When a cache miss is detected, do not immediately load the data from the database.

Instead, acquire a distributed lock first; only after the lock is acquired does the thread query the database and write the result to the cache. If acquiring the lock fails, another thread is already querying the database, so the current thread sleeps for a while and then retries.

This way, only one request goes to the database to read the data.

The pseudo code is as follows:

public String getData(String id) throws InterruptedException {
    String desc = redis.get(id);
    // Cache miss: the key has expired (or was never written)
    if (desc == null) {
        // Mutex: only one request succeeds in acquiring the lock (e.g. via SETNX)
        if (redis.setnx(lockName, "1")) {
            try {
                // Load the data from the database
                desc = getFromDB(id);
                // Write it back to Redis with a 24-hour expiration
                redis.set(id, desc, 60 * 60 * 24);
            } catch (Exception ex) {
                LogHelper.error(ex);
            } finally {
                // Always release the lock at the end
                redis.del(lockName);
            }
        } else {
            // Another thread holds the lock: sleep 200 ms, then retry
            Thread.sleep(200);
            return getData(id);
        }
    }
    return desc;
}

Cache penetration

Cache penetration means a special kind of request queries data that exists neither in Redis nor in the database.

As a result, every such request penetrates through to the database. The cache becomes mere decoration, putting heavy pressure on the database and affecting normal service.

As shown below:

[Figure: cache penetration]

Solution

  • Cache an empty value: when the requested data exists neither in Redis nor in the database, cache a placeholder value (for example, a null marker) for that key. Later queries for the same key return the placeholder directly instead of hitting the database (see the sketch after this list).
  • Bloom filter: synchronize the ID into a Bloom filter whenever data is written to the database. If a requested id is not in the Bloom filter, the data definitely does not exist in the database, so skip the database query entirely.
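A minimal sketch of null-value caching in the article's pseudocode style (NULL_PLACEHOLDER, redis, and getFromDB are assumed helpers; the short 60-second TTL keeps a stale "missing" marker from masking newly inserted data):

private static final String NULL_PLACEHOLDER = "##NULL##"; // assumed sentinel value

public String getDataWithNullCaching(String id) {
    String value = redis.get(id);
    if (NULL_PLACEHOLDER.equals(value)) {
        return null;                        // known-missing key, answered from cache
    }
    if (value == null) {                    // cache miss
        value = getFromDB(id);
        if (value == null) {
            // Absent from the DB too: cache the placeholder with a short TTL
            redis.set(id, NULL_PLACEHOLDER, 60);
            return null;
        }
        redis.set(id, value, 60 * 60 * 24); // normal path: cache the real value
    }
    return value;
}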

A BloomFilter must hold the full set of keys, so the key set should not be too large; ideally fewer than 1 billion entries, since 1 billion keys occupy about 1.2 GB of memory (roughly 10 bits per key at a ~1% false-positive rate).
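As a back-of-envelope check of that figure (using the standard Bloom filter sizing formula, not from the original article: about -n·ln(p)/(ln 2)² bits for n keys at false-positive rate p):

// Bloom filter sizing: n keys at a ~1% false-positive rate
long n = 1_000_000_000L;
double p = 0.01;
double bitsPerKey = -Math.log(p) / (Math.log(2) * Math.log(2)); // ≈ 9.6 bits/key
long totalBits = (long) Math.ceil(n * bitsPerKey);
long hashes = Math.round(bitsPerKey * Math.log(2));             // optimal k ≈ 7
System.out.printf("memory ≈ %.1f GB, k = %d%n", totalBits / 8.0 / 1e9, hashes);
// prints: memory ≈ 1.2 GB, k = 7 — matching the ~1.2 GB mentioned above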

Let's look at how a Bloom filter works.

A BloomFilter first allocates a region of memory as a bit array, with every bit initialized to 0.

When adding an element, compute k independent hash functions over it and set all k mapped positions to 1.

To test whether a key exists, compute the same k hash positions. If all of them are 1, the key probably exists; if any is 0, it definitely does not.

As shown below:

[Figure: Bloom filter]
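To make this concrete, here is a minimal self-contained Java sketch of the mechanism just described (the salted-hash scheme and sizes are illustrative choices, not from the original article):

import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits; // the bit array, all zeros initially
    private final int size;    // number of bits
    private final int k;       // number of hash functions

    public SimpleBloomFilter(int size, int k) {
        this.bits = new BitSet(size);
        this.size = size;
        this.k = k;
    }

    // Derive the i-th "independent" hash by salting the key with the index
    private int position(String key, int i) {
        return Math.floorMod((key + "#" + i).hashCode(), size);
    }

    // Adding an element: set all k mapped bit positions to 1
    public void add(String key) {
        for (int i = 0; i < k; i++) {
            bits.set(position(key, i));
        }
    }

    // Query: any 0 bit means the key definitely does not exist;
    // all 1s means it *may* exist (false positives are possible)
    public boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}

In the cache-penetration flow, add(id) would be called when writing a row to the database, and mightContain(id) checked before querying it.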

Hash functions can collide, so a Bloom filter has false positives.

The false-positive rate is the probability that the BloomFilter judges a key to exist when it actually does not. This happens because the filter stores hash values of keys rather than the keys themselves.

So there is some probability of keys whose contents differ but whose k hash values are all identical.

A key that the BloomFilter judges not to exist, however, is 100% absent. By contradiction: if the key existed, every one of its k hash positions would have been set to 1, not 0. In short, "exists" according to the filter only means "may exist"; "does not exist" is certain.

Cache avalanche

A cache avalanche means that a large number of requests cannot be served by the Redis cache layer and all of them hit the database, causing a surge in database pressure and even downtime.

There are two main reasons for this:

  • A large amount of hot data expires at the same time, so a flood of requests must query the database and write the results back to the cache;
  • Redis goes down, and the cache system becomes unavailable.

A large amount of cached data expires at the same time

Data is stored in the cache with an expiration time, but a large amount of it happens to expire at the same moment.

All requests are then sent to the database to fetch the data. Under heavy concurrency, database pressure surges.

A cache avalanche occurs when a large amount of data expires at the same time, while cache breakdown (invalidation) occurs when a single hotspot key expires. This is their biggest difference.

As shown below:

[Figure: cache avalanche - a large amount of cached data expires at once]

Solution

Add random value to expiration time

Avoid setting the same expiration time for large batches of data: expiration time = base time + random time (a small random offset, such as 1~5 minutes), the same formula sketched earlier.

This way the hot data won't all expire at the same moment, and the spread of expiration times stays small, which both prevents mass simultaneous expiration and still meets business requirements.

Interface rate limiting

For accesses to non-core data, add rate limiting to the query interface, for example capping it at 10,000 req/s.

For accesses to core data, when the key is missing from the cache, allow the query to go through to the database and write the result back to the cache.

This way, only part of the traffic reaches the database, reducing the pressure.

Rate limiting means controlling, at the request entry point of the business system, how many requests per second are allowed in, so that too many requests are never sent to the database.
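A minimal sketch using Guava's RateLimiter, a token-bucket-style limiter (the 10,000 req/s figure mirrors the example above; getData is the cache-first read path sketched earlier and is assumed here):

import com.google.common.util.concurrent.RateLimiter;

public class NonCoreDataService {
    // Allow at most 10,000 permits per second for non-core queries
    private static final RateLimiter LIMITER = RateLimiter.create(10000.0);

    public String getNonCoreData(String id) {
        // Reject immediately when over the limit instead of queuing up
        if (!LIMITER.tryAcquire()) {
            return null; // or a degraded default response
        }
        return getData(id); // cache-first read path from earlier
    }
}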

As shown below:

[Figure: cache avalanche - rate limiting]

Redis downtime

A Redis instance can support 100,000 QPS, while a database instance has only 1,000 QPS.

Once Redis goes down, it will cause a large number of requests to hit the database, resulting in a cache avalanche.

Solution

There are two solutions to cache avalanches caused by cache system failures:

  • Service circuit breaking and interface rate limiting;
  • Build a highly available cache cluster.

Service circuit breaking and rate limiting

In the business system, use circuit breaking (service degradation) under high concurrency: sacrifice some functionality to keep the system as a whole available.

Circuit breaking means that when fetching data from the cache fails abnormally, a degraded response is returned to the front end directly, preventing all the traffic from hitting the database and bringing it down.

Circuit breaking and rate limiting are ways to reduce the avalanche's impact on the database once a cache avalanche has already occurred.
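A very simplified circuit-breaker sketch in the article's pseudocode style (the redis helper and failure threshold are assumptions; real implementations such as Hystrix or Resilience4j also add a half-open state and time-based recovery):

import java.util.concurrent.atomic.AtomicInteger;

public class CacheCircuitBreaker {
    private static final int FAILURE_THRESHOLD = 5; // illustrative threshold
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    public String getWithFallback(String id) {
        // Circuit open: fail fast with a degraded default, skip cache and DB
        if (consecutiveFailures.get() >= FAILURE_THRESHOLD) {
            return defaultValue();
        }
        try {
            String value = redis.get(id);  // assumed redis helper, as elsewhere
            consecutiveFailures.set(0);    // a success closes the circuit again
            return value;
        } catch (Exception ex) {
            consecutiveFailures.incrementAndGet();
            return defaultValue();         // degrade instead of hitting the DB
        }
    }

    private String defaultValue() {
        return ""; // placeholder degraded response
    }
}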

Build a highly available cache cluster

Therefore, the cache layer should be built as a Redis high-availability cluster, such as Redis Sentinel or Redis Cluster. If the Redis master node fails, a replica can be promoted to master and continue serving the cache, avoiding a cache avalanche caused by a single instance going down.
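For example, a client-side sketch using Jedis with Sentinel, so the application automatically follows a failover (the master name and sentinel addresses are illustrative):

import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelClientDemo {
    public static void main(String[] args) {
        // Sentinel nodes that monitor the master named "mymaster"
        Set<String> sentinels = new HashSet<>();
        sentinels.add("192.168.0.1:26379");
        sentinels.add("192.168.0.2:26379");
        sentinels.add("192.168.0.3:26379");

        // The pool asks the sentinels for the current master address,
        // and re-resolves it automatically after a failover
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.set("hot:key", "value"); // writes go to the current master
        }
    }
}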

Summary

  • Cache penetration: the database does not have the data at all, so requests go straight through to the database and the cache layer is useless.
  • Cache breakdown (invalidation): the database has the data and the cache should too, but the cached copy has expired; the Redis protective barrier is pierced and requests go straight to the database.
  • Cache avalanche: a large volume of requests cannot be served by the Redis cache (large-scale hotspot expiration, or Redis downtime), so all traffic hits the database, putting it under enormous pressure.


