Two days ago, a cloud vendor we depend on had its service suspended, and the outage lasted a long time. Our service was strongly dependent on theirs, so we went down with them, and there was nothing we could do but wait for them to recover. In the incident report they mentioned that Redis had a hot key. It so happens that I was responsible for a departmental Redis cluster at my previous company and dealt with plenty of Redis hotspot problems, so let's talk about it: what is a Redis hotspot? Why does a hotspot drag down the performance of the whole cluster? How do you avoid Redis hotspots, troubleshoot them, and solve them?
What is a Redis hotspot?
I have mentioned the word locality many times in my past blog posts (for locality, please see the locality principle in my earlier post). Data hotspots are a manifestation of locality in data access: concretely, the access frequency of one Key in Redis is far higher than that of all the remaining Keys.
Why does Redis have hotspot problems? This starts with how Redis works. As we all know, Redis stores KV data. In cluster mode, Redis maps every key to one of 16384 hash slots based on the CRC16 of the Key, and distributes those 16384 slots across the machines in the cluster, so that data is stored as evenly as possible on each machine. However, even storage does not mean even access. Sometimes requests for a single key account for a large share of total traffic, which concentrates the load on one Redis instance and exhausts that instance's capacity. All other data stored on that instance then becomes inaccessible as well, which means every service that depends on that data fails.
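To make the slot mapping concrete, here is a minimal sketch of how a key maps to one of the 16384 slots. The CRC16 variant below is the XModem flavor described in the Redis cluster spec; the helper names are my own and not part of any Redis client, and the real algorithm also honors `{hash tag}` syntax, which is omitted here.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem): polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots."""
    return crc16(key.encode()) % 16384

# Keys are spread evenly across slots, but the *traffic* to them may not be:
print(key_slot("user:1001"), key_slot("user:1002"), key_slot("XXX_KEY"))
```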
To be clear, this does not mean Redis crashes outright. As we all know, the core of Redis runs in a single-threaded model, which means Redis processes all requests serially. When there are too many requests, they pile up, and from the application layer's point of view the latency of Redis calls becomes very high. Because the application layer uses Redis as a cache and calls it synchronously, request handling in the application layer also becomes very slow, application-layer requests pile up in turn, and eventually the whole service becomes unavailable.
Here is a simple example. I'm sure everyone has followed gossip on Weibo. When a big story breaks, a crowd of users floods into Weibo to search for it and frantically accesses the same post (the same piece of data). In that situation, this data is hot data, and if it gets too "hot", Weibo ends up going down. Weibo has in fact gone down many times, not because its engineering is poor, but because the hotspot problem really is that nasty.
How does a Redis hotspot bring down other services?
A hotspot in Redis does not just cause a single service to fail; it affects every service that depends on the Redis cluster. In the figure above, Server1, Server2, and Server3 access XXX_KEY so frequently that the RedisServer2 instance becomes unavailable. Because Server4 depends on Key7, which lives on RedisServer2, Server4 can no longer provide normal service either, even though RedisServer1, 3, 4, and 5 are all serving normally.
Some people may ask: if RedisServer2 goes down, can't we just drop it and bring up a new machine? In Redis cluster mode, if an instance goes down, the cluster will indeed replace it automatically. But in this situation, even if a new instance takes over, the flood of requests will immediately overwhelm it too. So when facing a Redis hotspot, measures like restarting are useless; the problem can only be solved on the requesting side.
In fact, this is the classic bucket model: how much water a bucket can hold is determined by its shortest plank. When an application uses a Redis cluster, the cluster's performance ceiling is not simply the single-instance ceiling multiplied by the number of instances. Whenever any one instance in the cluster has a problem, the upper layers perceive the whole cluster as having a problem.
How to avoid Redis hotspots?
As mentioned above, hotspot problems are really locality problems, and locality is very hard to avoid; almost every distributed system is affected by it. To be honest, there is no absolute way to prevent it. We can only analyze the characteristics of our data in advance and take appropriate measures, which, frankly, comes down to experience. If you have other good ideas, feel free to discuss them in the comments.
How to troubleshoot hotspot issues?
Hotspot problems in Redis are actually easy to detect; it comes down to monitoring. Track the CPU usage and QPS of every Redis instance. If you see that the load and QPS of a few instances in the cluster are particularly high while the other instances are very low, don't even ask: there is a hotspot. The next step is to find the specific hot key and trace where the traffic is coming from.
Finding the hot key itself is quite simple: sample some access logs, run statistics on them, and it will surface quickly. Finding the source of the traffic is harder. At my previous company, the same Redis cluster was shared by many businesses, but Redis access was not covered by our full-link tracing, so the most direct way to find the caller was to ask around in the group chat. It sounds primitive, but there was no other way.
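As a rough illustration of the "statistics on access logs" step, here is a minimal sketch, assuming you can sample the keys your application requests; the sample data below is made up for the example. (If your Redis version and maxmemory-policy allow it, `redis-cli --hotkeys` is another way to surface hot keys on the server side.)

```python
from collections import Counter

def top_hot_keys(sampled_keys, n=10):
    """Count a sample of requested keys and return the most frequent ones."""
    counter = Counter(sampled_keys)
    total = sum(counter.values())
    return [(key, cnt, cnt / total) for key, cnt in counter.most_common(n)]

# Example: keys sampled from application-side access logs (hypothetical data).
sample = ["XXX_KEY"] * 9500 + ["user:1001", "user:1002"] * 250
for key, cnt, ratio in top_hot_keys(sample, n=3):
    print(f"{key}: {cnt} hits ({ratio:.1%} of sampled traffic)")
```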
How to solve hotspot issues?
The best fix for a problem is, of course, to prevent it from happening, but as I said just now, Redis hotspots are hard to avoid. Every business has hotspots; they just don't necessarily turn into disasters. Discovering hotspots also doesn't have to wait for an incident: we can run regular inspections in our daily work, and if something shows signs of becoming a hotspot, kill it right away.
As for how to resolve a hotspot once you have found it, here are two approaches I have used; feel free to discuss them:
Application Layer Cache
The common implementation is a local cache (LocalCache) inside the application, which effectively adds another layer of cache in front of the Redis data. For very hot data, the application layer has a high probability of finding the value in its local cache, and only the small fraction of requests that arrive when a LocalCache entry has expired leaks through to Redis. Requests for hot data are thus digested inside the application layer, greatly reducing the pressure on Redis.
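Below is a minimal sketch of this idea, assuming redis-py is available on the read side; the TTL-based local cache and the class name are my own illustration rather than a ready-made library, and a real implementation would bound the cache (for example with an LRU policy).

```python
import time
import redis

class LocalCachedReader:
    """Read-through wrapper: serve hot keys from an in-process cache and
    fall back to Redis only when the local copy is missing or expired."""

    def __init__(self, client: redis.Redis, ttl_seconds: float = 1.0):
        self.client = client
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (value, expire_at); unbounded here for brevity

    def get(self, key: str):
        entry = self.cache.get(key)
        if entry and entry[1] > time.time():
            return entry[0]                       # absorbed by the local cache
        value = self.client.get(key)              # only expired/missing keys reach Redis
        self.cache[key] = (value, time.time() + self.ttl)
        return value

reader = LocalCachedReader(redis.Redis(host="localhost", port=6379), ttl_seconds=1.0)
print(reader.get("XXX_KEY"))
```

Even a TTL as short as one second is enough to collapse tens of thousands of reads per second on a hot key into roughly one Redis request per second per application instance.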
With this approach, only the data-reading side needs to change; the data-writing side does not need to change at all. The disadvantages, however, are also obvious:
- Each reading end has to implement it itself, which increases the development and maintenance cost of the application layer.
- Extra storage space is wasted on each end.
- It requires case-by-case development and is not suitable for large-scale rollout.
Add Data Copies
Since the hotspot problem is caused by a huge number of accesses to one key, we can simply split the requests for that key. For example, if the original hot key is XXX_KEY, then when writing we can write the same data 10 times under different keys, such as XXX_KEY_01, XXX_KEY_02 ... XXX_KEY_10. When reading, we randomly pick one of the suffixes 1-10 and append it to the original key, so the requests are spread out. If you want the requests spread even more thinly, store more copies.
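Here is a minimal sketch of the copy scheme, again using redis-py; the 10-copy convention and suffix format simply mirror the example above and are not a standard pattern built into any client. Note that the loop of writes below is not atomic, which is exactly the write-side consistency issue discussed next.

```python
import random
import redis

COPIES = 10

def write_with_copies(client: redis.Redis, key: str, value: str):
    """Write side: duplicate the hot key into N suffixed copies."""
    for i in range(1, COPIES + 1):
        client.set(f"{key}_{i:02d}", value)    # XXX_KEY_01 ... XXX_KEY_10

def read_with_copies(client: redis.Redis, key: str):
    """Read side: pick a random copy so traffic spreads across slots/instances."""
    i = random.randint(1, COPIES)
    return client.get(f"{key}_{i:02d}")

client = redis.Redis(host="localhost", port=6379)
write_with_copies(client, "XXX_KEY", "some hot value")
print(read_with_copies(client, "XXX_KEY"))
```

Because the suffixed keys hash to different slots, the copies generally land on different instances, which is what actually dilutes the hotspot.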
The advantage of this scheme is that the cost on the read side is low (though not zero), but the demands on the write side are much higher: it must not only write multiple copies, but also consider consistency between the copies after writing. Since both ends have to change, this looks more troublesome; does that make it worse than the first solution? Not really. In general, the read ends are numerous and scattered, so changing them is expensive and frequent changes are unrealistic, which is why some of the work has to be pushed onto the relatively centralized end.
Both schemes above essentially trade storage for performance; the main difference is who does the work. The former is done on the client side, the latter on the server side, each with its own pros and cons. Is it possible to expose a pure Redis protocol to the outside while still solving the hotspot problem? Lu Xun... no, David Wheeler, once said that all problems in computer science can be solved by adding another layer, and hotspot problems are no exception. We can add an intermediate layer between the application and Redis. This middle layer can be a real service, such as a unified data access layer, or it can be a specially built Redis client.
The middle layer can add a local cache for specific keys, so that hotspots never reach Redis at all. As for which keys to cache locally, the middle layer can analyze the hot data in recent requests in real time and decide for itself; the simplest approach is just to run an LRU or LFU cache.
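A minimal sketch of what such a middle layer might do, assuming it wraps redis-py and decides by itself which keys are hot; the counting window, threshold, and class name are arbitrary illustration values of mine, not a standard component.

```python
import time
from collections import Counter
import redis

class HotKeyAwareProxy:
    """Middle-layer sketch: count recent accesses per key and locally cache
    only the keys whose access frequency crosses a threshold."""

    def __init__(self, client: redis.Redis, threshold: int = 1000,
                 window_seconds: float = 10.0, local_ttl: float = 1.0):
        self.client = client
        self.threshold = threshold
        self.window = window_seconds
        self.local_ttl = local_ttl
        self.counts = Counter()
        self.window_start = time.time()
        self.local_cache = {}  # key -> (value, expire_at)

    def get(self, key: str):
        now = time.time()
        if now - self.window_start > self.window:   # roll the counting window
            self.counts.clear()
            self.window_start = now
        self.counts[key] += 1

        entry = self.local_cache.get(key)
        if entry and entry[1] > now:
            return entry[0]                          # hot key served locally

        value = self.client.get(key)
        if self.counts[key] >= self.threshold:       # key became hot: keep a local copy
            self.local_cache[key] = (value, now + self.local_ttl)
        return value

proxy = HotKeyAwareProxy(redis.Redis(host="localhost", port=6379))
print(proxy.get("XXX_KEY"))
```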
Likewise, the second scheme, adding data copies, can also be implemented by the middle layer. When a hotspot is detected, the middle layer actively replicates the hot data, intercepts and rewrites all requests for it, and spreads them out. If the middle layer is smart enough, all of this can be automated, from hotspot detection to resolution, without any human involvement.
That's all for today's article. If you find it useful, please like it, and if you like it, please follow it. Regarding the hot issues of Redis, if you have any opinions or experiences, you can leave a message in the comment area to discuss.