Author: Ren Zhongyu

A member of the DBA team at Aikesheng, skilled in failure analysis and performance optimization. Feel free to discuss any technical issues related to this article.

Source of this article: original contribution

*This article is original content produced by the Aikesheng open source community and may not be used without authorization. To reprint it, please contact the editor and cite the source.


Background

The problem occurred on a production Redis cluster (Redis 5.0.10, more than 30 shards): the memory usage of one shard was abnormally high (over 70%), while that of the other shards was comparatively low. We reproduced the scenario in a test environment, as shown in the following monitoring diagram:

If you have read the title of this article, you already know the conclusion. What I want to share here is the method used to troubleshoot the problem.

Diagnosis

Memory usage distribution monitoring

  • The memory usage distribution shows that the abnormal Redis shard uses about 356 MB, while the maximum available memory of a single Redis instance is 512 MB
  • The other, normal shards each use less than 100 MB

Abnormal and normal instance memory usage comparison

  • The abnormal instance holds fewer keys (per info keyspace) than the normal ones
  • Yet its data occupies more than three times the memory of a normal instance
 ### Normal instance
redis-cli -p 6380 -h 10.186.62.28 info keyspace ## key count
# Keyspace
db0:keys=637147,expires=0,avg_ttl=0
redis-cli -p 6380 -h 10.186.62.28 info memory |grep -w used_memory ## memory usage
used_memory:104917416

### Abnormal instance
redis-cli -p 6382 -h 10.186.62.56 info keyspace ## key count
# Keyspace
db0:keys=191433,expires=0,avg_ttl=0
redis-cli -p 6382 -h 10.186.62.56 info memory |grep -w used_memory ## memory usage
used_memory:373672656
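The used_memory values above are in bytes; converting them confirms the ~356 MB reading from the monitoring graph. A minimal awk sketch:

```shell
# Convert the used_memory readings above (bytes) to MB (1 MB = 1024*1024 bytes)
awk 'BEGIN {
    printf "abnormal: %.0f MB\n", 373672656 / 1024 / 1024
    printf "normal:   %.0f MB\n", 104917416 / 1024 / 1024
}'
# prints:
#   abnormal: 356 MB
#   normal:   100 MB
```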

Fragmentation ratio check

  • The memory fragmentation of the abnormal instance is normal, so excessive fragmentation can be ruled out. (A ratio well above 1 would indicate fragmentation; a ratio below 1 means RSS is smaller than used_memory, typically because some memory has been swapped out.)
 redis-cli -p 6382 -h 10.186.62.56 info memory |grep mem_fragmentation_ratio
mem_fragmentation_ratio:0.89  ## fragmentation ratio below 1
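mem_fragmentation_ratio is simply used_memory_rss divided by used_memory. A small sketch that recomputes it from a captured INFO memory snippet (the used_memory_rss value below is a made-up sample consistent with a 0.89 ratio, not taken from the incident):

```shell
# Parse used_memory and used_memory_rss out of an INFO memory snapshot and
# recompute mem_fragmentation_ratio = used_memory_rss / used_memory.
info='used_memory:373672656
used_memory_rss:332568664'
used=$(printf '%s\n' "$info" | awk -F: '/^used_memory:/ {print $2}')
rss=$(printf '%s\n' "$info" | awk -F: '/^used_memory_rss:/ {print $2}')
awk -v r="$rss" -v u="$used" 'BEGIN { printf "ratio: %.2f\n", r / u }'
# prints: ratio: 0.89
```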

Bigkeys scan analysis

  • Since the previous checks found nothing, try a bigkeys scan (to avoid impacting the business, it is recommended to run it during off-peak hours)
  • The scan results are as follows (only the key parts are shown)
 # redis-cli -p 6382 -h 10.186.62.56  --bigkeys

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.1 to sleep 0.1 sec
# per 100 SCAN commands (not usually needed).

[00.00%] Biggest string found so far '"key:{06S}:000061157249"' with 3 bytes
[00.03%] Biggest string found so far '"key3691"' with 4 bytes
[40.93%] Biggest string found so far '"bigkkkkk:0"' with 102400000 bytes
[51.33%] Biggest string found so far '"bigk:0"' with 204800000 bytes

-------- summary -------

Sampled 191433 keys in the keyspace!
Total key length in bytes is 4161149 (avg len 21.74)

Biggest string found '"bigk:0"' has 204800000 bytes

0 lists with 0 items (00.00% of keys, avg size 0.00)
0 hashs with 0 fields (00.00% of keys, avg size 0.00)
191433 strings with 307777256 bytes (100.00% of keys, avg size 1607.75)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)
  • The results show:

    • There are 2 abnormal keys

      • The largest key, "bigk:0", occupies about 200 MB
      • The big key "bigkkkkk:0" occupies about 100 MB
    • The other string keys are small, e.g. 'key:xx', averaging less than 10 bytes each
  • Note: this case looks simple because my simulated test environment is simple; real production environments are more complicated and may contain other key types such as hashes and sets. For those types, the --bigkeys scan only reports the number of elements/members, not the memory footprint, so you need other commands to get the actual memory usage:

    • Running memory analysis on the abnormal keys gives the results below. The two keys together occupy about 336 MB, which matches the abnormally high memory usage seen in monitoring.
 10.186.62.56:6382> memory usage bigkkkkk:0
(integer) 117440568
10.186.62.56:6382> memory usage bigk:0
(integer) 234881072
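MEMORY USAGE reports the full allocation including Redis object overhead, which is why the values are somewhat larger than the raw string lengths reported by --bigkeys. Summing the two results above confirms that these two keys account for most of the shard's used_memory:

```shell
# Sum the two MEMORY USAGE results above and express the total in MB
awk 'BEGIN {
    total = 117440568 + 234881072
    printf "total: %d bytes (%.0f MB)\n", total, total / 1024 / 1024
}'
# prints: total: 352321640 bytes (336 MB)
```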

Conclusion

  • The analysis above shows that when memory is distributed unevenly across a Redis Cluster, a bigkeys scan is a fast and effective troubleshooting method, but remember to run it during off-peak hours:

    • redis-cli -p {port} -h {host} --bigkeys
  • BTW, if you need to simulate large keys, large data volumes, or blocking in Redis, some handy debug commands are shown below
 # Create 10 keys prefixed with renzy:id:, each 1024 bytes in size
127.0.0.1:9999> debug populate 10 renzy:id: 1024
OK
127.0.0.1:9999> keys renzy:id*
 1) "renzy:id::8"
 2) "renzy:id::2"
 3) "renzy:id::4"
·····

## Simulate blocking
127.0.0.1:9999> debug sleep 2 // block for 2 seconds
OK
or
# redis-cli -p 9999 keys \* > /dev/null 
// (if the data volume is large, a plain keys * will do)
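To run the bigkeys scan against every shard of a cluster, you first need each master's address. A sketch that extracts them from CLUSTER NODES output (the node lines below are a fabricated sample for illustration; in practice, pipe in the output of `redis-cli -h <host> -p <port> cluster nodes`):

```shell
# Extract host:port of every master from CLUSTER NODES output; the result
# can then be looped over for a per-shard bigkeys scan.
nodes='07c37dfeb2 10.186.62.28:6380@16380 master - 0 0 1 connected 0-5460
a1b2c3d4e5 10.186.62.56:6382@16382 master - 0 0 2 connected 5461-10922
f6e5d4c3b2 10.186.62.5:6381@16381 slave 07c37dfeb2 0 0 1 connected'
printf '%s\n' "$nodes" | awk '$3 ~ /master/ { split($2, a, "@"); print a[1] }'
# prints:
#   10.186.62.28:6380
#   10.186.62.56:6382
```

Each printed host:port can then be scanned with `redis-cli -h <host> -p <port> --bigkeys -i 0.1`, where -i throttles the SCAN rate to limit business impact.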

Aikesheng Open Source Community

Founded in 2017, the community takes open-sourcing high-quality operations tools, sharing practical technical content, and running ongoing nationwide community events as its mission. Currently open-sourced products include the SQL audit tool SQLE, the distributed middleware DBLE, and the data transfer component DTLE.