Author: Ren Zhongyu

A member of the DBA team at Aikesheng, skilled in failure analysis and performance optimization. Feel free to discuss any technical issues related to this article.

Source of this article: original contribution

*This article is original content produced by the Aikesheng open source community and may not be used without authorization. To reprint it, please contact the editor and cite the source.


Background

The problem occurred on a production Redis cluster (Redis 5.0.10, more than 30 shards): the memory usage of one shard was abnormally high (over 70%), while that of the other shards was comparatively low. We reproduced the scenario in a test environment, as shown in the following monitoring diagram:

If you have read the title of this article, you already know the conclusion. What I want to share here is the method used to troubleshoot the problem.

Diagnosis

Memory usage distribution monitoring

  • The memory usage distribution shows that the abnormal Redis shard uses about 356 MB, while the maximum available memory of a single Redis instance is 512 MB
  • The other, normal shards each use less than 100 MB

Abnormal and normal instance memory usage comparison

  • The abnormal instance holds fewer keys (per info keyspace) than the normal ones
  • Yet its data occupies more than three times the memory of a normal instance
 ### Normal instance
redis-cli -p 6380 -h 10.186.62.28 info keyspace ## key count
# Keyspace
db0:keys=637147,expires=0,avg_ttl=0
redis-cli -p 6380 -h 10.186.62.28 info memory |grep -w used_memory ## memory usage
used_memory:104917416

### Abnormal instance
redis-cli -p 6382 -h 10.186.62.56 info keyspace ## key count
# Keyspace
db0:keys=191433,expires=0,avg_ttl=0
redis-cli -p 6382 -h 10.186.62.56 info memory |grep -w used_memory ## memory usage
used_memory:373672656
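The used_memory values above are in bytes; converting them confirms the ~356 MB reading from the monitoring graph. A minimal awk sketch:

```shell
# Convert the used_memory readings above (bytes) to MB (1 MB = 1024*1024 bytes)
awk 'BEGIN {
    printf "abnormal: %.0f MB\n", 373672656 / 1024 / 1024
    printf "normal:   %.0f MB\n", 104917416 / 1024 / 1024
}'
# prints:
#   abnormal: 356 MB
#   normal:   100 MB
```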

Fragmentation ratio check

  • The memory fragmentation of the abnormal instance is normal, so excessive fragmentation can be ruled out. (A ratio well above 1 would indicate fragmentation; a ratio below 1 means RSS is smaller than used_memory, typically because some memory has been swapped out.)
 redis-cli -p 6382 -h 10.186.62.56 info memory |grep mem_fragmentation_ratio
mem_fragmentation_ratio:0.89  ## fragmentation ratio below 1
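mem_fragmentation_ratio is simply used_memory_rss divided by used_memory. A small sketch that recomputes it from a captured INFO memory snippet (the used_memory_rss value below is a made-up sample consistent with a 0.89 ratio, not taken from the incident):

```shell
# Parse used_memory and used_memory_rss out of an INFO memory snapshot and
# recompute mem_fragmentation_ratio = used_memory_rss / used_memory.
info='used_memory:373672656
used_memory_rss:332568664'
used=$(printf '%s\n' "$info" | awk -F: '/^used_memory:/ {print $2}')
rss=$(printf '%s\n' "$info" | awk -F: '/^used_memory_rss:/ {print $2}')
awk -v r="$rss" -v u="$used" 'BEGIN { printf "ratio: %.2f\n", r / u }'
# prints: ratio: 0.89
```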

Bigkeys scan analysis

  • Since the previous checks found nothing, try a bigkeys scan (to avoid impacting the business, it is recommended to run it during off-peak hours)
  • The scan results are as follows (only the key parts are shown)
 # redis-cli -p 6382 -h 10.186.62.56  --bigkeys

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far '"key:{06S}:000061157249"' with 3 bytes
[00.03%] Biggest string found so far '"key3691"' with 4 bytes
[40.93%] Biggest string found so far '"bigkkkkk:0"' with 102400000 bytes
[51.33%] Biggest string found so far '"bigk:0"' with 204800000 bytes

-------- summary -------

Sampled 191433 keys in the keyspace!
Total key length in bytes is 4161149 (avg len 21.74)

Biggest string found '"bigk:0"' has 204800000 bytes

0 lists with 0 items (00.00% of keys, avg size 0.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
191433 strings with 307777256 bytes (100.00% of keys, avg size 1607.75)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
  • The results show:

    • There are 2 abnormal keys

      • The largest key, "bigk:0", occupies about 200 MB
      • The big key "bigkkkkk:0" occupies about 100 MB
    • The other string keys are small, e.g. 'key:xx', averaging less than 10 bytes each
  • Note: this case looks simple because my simulated test environment is simple; real production environments are more complicated and may contain other key types such as hashes and sets. For those types, the --bigkeys scan only reports the number of elements/members, not the memory footprint, so you need other commands to get the actual memory usage:

    • Running memory analysis on the abnormal keys gives the results below. The two keys together occupy about 336 MB, which matches the abnormally high memory usage seen in monitoring.
 10.186.62.56:6382> memory usage bigkkkkk:0
(integer) 117440568
10.186.62.56:6382> memory usage bigk:0
(integer) 234881072
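MEMORY USAGE reports the full allocation including Redis object overhead, which is why the values are somewhat larger than the raw string lengths reported by --bigkeys. Summing the two results above confirms that these two keys account for most of the shard's used_memory:

```shell
# Sum the two MEMORY USAGE results above and express the total in MB
awk 'BEGIN {
    total = 117440568 + 234881072
    printf "total: %d bytes (%.0f MB)\n", total, total / 1024 / 1024
}'
# prints: total: 352321640 bytes (336 MB)
```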

Conclusion

  • The analysis above shows that when memory is distributed unevenly across a Redis Cluster, a bigkeys scan is a fast and effective troubleshooting method, but remember to run it during off-peak hours:

    • redis-cli -p {port} -h {host} --bigkeys
  • BTW, if you need to simulate large keys, large data volumes, or blocking in Redis, some handy debug commands are shown below
 # Create 10 keys prefixed with renzy:id:, each 1024 bytes in size
127.0.0.1:9999> debug populate 10 renzy:id: 1024
OK
127.0.0.1:9999> keys renzy:id*
 1) "renzy:id::8"
 2) "renzy:id::2"
 3) "renzy:id::4"
·····

## Simulate blocking
127.0.0.1:9999> debug sleep 2 // block for 2 seconds
OK
or
# redis-cli -p 9999 keys \* > /dev/null 
// (if the data volume is large, a plain keys * will do)
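To run the bigkeys scan against every shard of a cluster, you first need each master's address. A sketch that extracts them from CLUSTER NODES output (the node lines below are a fabricated sample for illustration; in practice, pipe in the output of `redis-cli -h <host> -p <port> cluster nodes`):

```shell
# Extract host:port of every master from CLUSTER NODES output; the result
# can then be looped over for a per-shard bigkeys scan.
nodes='07c37dfeb2 10.186.62.28:6380@16380 master - 0 0 1 connected 0-5460
a1b2c3d4e5 10.186.62.56:6382@16382 master - 0 0 2 connected 5461-10922
f6e5d4c3b2 10.186.62.5:6381@16381 slave 07c37dfeb2 0 0 1 connected'
printf '%s\n' "$nodes" | awk '$3 ~ /master/ { split($2, a, "@"); print a[1] }'
# prints:
#   10.186.62.28:6380
#   10.186.62.56:6382
```

Each printed host:port can then be scanned with `redis-cli -h <host> -p <port> --bigkeys -i 0.1`, where -i throttles the SCAN rate to limit business impact.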

Aikesheng Open Source Community

Founded in 2017, the community takes open-sourcing high-quality operations tools, sharing practical technical content, and running ongoing nationwide community events as its mission. Currently open-sourced products include the SQL audit tool SQLE, the distributed middleware DBLE, and the data transfer component DTLE.