Author: Vivo Internet Server Team - Tang Wenjian
1. Background
Anyone who has used Redis knows that it is an in-memory key-value database. All data is stored in memory, so memory plays a central role in Redis and every operation revolves around it.
In day-to-day maintenance we are often asked questions such as: how should data be stored in Redis to save cost and improve performance? What causes Redis memory alarms?
This article analyzes the Redis memory structure, introduces memory optimization methods, and uses production cases to help you optimize memory usage and quickly locate Redis memory anomalies.
2. Redis memory management
This chapter describes how Redis manages each part of its memory, focusing on the parts that may consume a lot of it. First, let's look at the Redis memory model.
The memory model is shown in the figure:
[used_memory] : The most important part of Redis memory usage: the total amount of memory (in bytes) allocated by the Redis allocator (the allocator is chosen at compile time; the default is jemalloc). It mainly includes Redis's own memory (dictionaries, metadata), object memory, buffers, and Lua memory.
[Self-memory] : The data dictionaries and metadata that Redis maintains for itself; this generally occupies very little memory.
[Object memory] : All objects are key-value pairs. Key objects are strings, and value objects include 5 types (String, List, Hash, Set, Zset); Redis 5.0 also supports the Stream type.
[Cache] : Client buffers (normal, master-slave replication, and pubsub) and the AOF buffer.
[Lua memory] : Mainly stores loaded Lua scripts; the memory usage depends on the number of loaded scripts.
[used_memory_rss] : The memory the Redis main process occupies from the operating system's point of view (in bytes), that is, the value reported by commands such as top and ps.
[Memory Fragmentation] : If data changes frequently, memory that Redis frees may not be returned to the operating system, yet Redis cannot reuse it effectively either; this forms memory fragmentation.
[Running memory] : Memory consumed at runtime; generally small, within 10 MB.
[Sub-process memory] : Memory consumed during persistence by the child process forked for AOF rewrite or RDB generation; generally relatively small.
2.1 Object memory
Object memory stores all of Redis's key-value data. Key objects are always strings, and value objects mainly have five data types: String, List, Hash, Set, and Zset. Each type is implemented by one of several encodings and is exposed externally as a redisObject structure. The redisObject instances are stored in a dictionary (dict), which is implemented with a hash table; each key-value pair is saved in a hash table node. The structure is as follows:
(Source: Book "Redis Design and Implementation")
To improve flexibility and efficiency, Redis chooses a different encoding for an object depending on the usage scenario, optimizing for that scenario.
The rules for selecting encoding for various objects are as follows:
string (string)
- [int]: (an integer whose length is less than 20 digits; stored directly in ptr)
- [embstr]: (contiguously allocated memory; string length less than or equal to 44 bytes)
- [raw]: (dynamic string; longer than 44 bytes and smaller than 512 MB, the size limit of a string)
list (list)
- [ziplist]: (the number of elements is less than the list-max-ziplist-entries configuration (default 512), and all values are smaller than the list-max-ziplist-value configuration (default 64 bytes))
- [linkedlist]: (When the list type cannot meet the conditions of ziplist, Redis will use linkedlist as the internal implementation of the list)
- [quicklist]: (Redis 3.2 version introduces quicklist as the underlying implementation of list, and no longer uses linkedlist and ziplist implementation)
set (collection)
- [intset]: (The elements are all integers and the number of elements is less than the set-max-intset-entries configuration (default 512))
- [hashtable]: (hashtable is used when the collection type cannot meet the conditions of intset)
hash (hash table)
- [ziplist]: (the number of elements is less than the hash-max-ziplist-entries configuration (default 512), and the length of every value is less than the hash-max-ziplist-value configuration (default 64 bytes))
- [hashtable]: (hashtable is used when the hash type cannot meet the conditions of ziplist)
zset (sorted set)
- [ziplist]: (the number of elements is less than the zset-max-ziplist-entries configuration (default 128) and the value of each element is less than the zset-max-ziplist-value configuration (default 64 bytes))
- [skiplist]: (When the ziplist condition is not met, the sorted set will use skiplist as the internal implementation)
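The string rules above can be turned into a small predictor. This is only a sketch of the thresholds listed in this section, not Redis's actual implementation; the function name `string_encoding` and the regular expression for a canonical integer are illustrative assumptions.

```python
import re

# Assumption: an "integer" here means a canonical decimal form with no
# leading zeros, plus sign, or whitespace, and within the signed 64-bit range.
_CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")
EMBSTR_MAX_BYTES = 44  # threshold quoted in the rules above


def string_encoding(value: str) -> str:
    """Guess which encoding a string value would get under the rules above."""
    if (
        len(value) < 20
        and _CANONICAL_INT.fullmatch(value)
        and -(2**63) <= int(value) < 2**63
    ):
        return "int"
    if len(value.encode("utf-8")) <= EMBSTR_MAX_BYTES:
        return "embstr"
    return "raw"


for sample in ["12345", "007", "a" * 44, "a" * 45]:
    print(repr(sample[:10]), len(sample), string_encoding(sample))
```

On a real instance, the `object encoding <key>` command reports the encoding actually chosen, which is the authoritative check.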
2.2 Buffer memory
2.2.1 Client buffer
Client buffers are the input and output buffers of every TCP connection to the Redis server. They include normal client buffers, master-slave replication buffers, and pubsub buffers, each controlled by the corresponding parameters (the input buffer has no configuration parameter; its maximum size is 1 GB). When the configured maximum is reached, the client is disconnected.
[client-output-buffer-limit]: Limits the size of the client output buffer. It is followed by the client type (normal, slave, pubsub) and the limit sizes. For normal clients the default is 0, meaning no limit. If a limit is set, the connection is closed and its memory freed once the threshold is reached.
[repl-backlog-size]: The default is 1 MB. The backlog is the master-slave replication buffer and is a ring buffer, so reaching the configured size does not cause an overflow; old data is simply overwritten. For example, if a slave is disconnected and the data it missed has not yet been overwritten when it reconnects, an incremental sync is enough. The larger the backlog, the longer a slave can stay disconnected. It is limited by the maxmemory parameter and normally should not be set too large.
2.2.2 AOF buffer
When AOF is enabled, commands from clients are first stored in the AOF buffer and then written to the AOF file on disk according to the configured policy (always, everysec, no); the flush time is recorded at the same time.
The AOF buffer has no size limit and does not need one. Each time the main thread writes the AOF, it checks when the last fsync succeeded; if more than 2 s have passed, the main thread blocks until the fsync completes, and each time this happens the aof_delayed_fsync counter is incremented. The AOF buffer therefore only ever holds a few seconds of data, and its memory consumption is relatively small.
2.3 Memory Fragmentation
Memory fragmentation is a very common problem in programs. Redis's default allocator is jemalloc, whose strategy is to divide memory into a series of fixed sizes, such as 8 bytes, 16 bytes, 32 bytes, …, 4 KB, 8 KB, and so on. When a program requests memory, jemalloc allocates the closest fixed size that is at least as large as the request, so some space is wasted. In addition, when data is deleted, the freed memory is not returned to the operating system immediately, yet Redis cannot always reuse it effectively, so it becomes fragmentation.
Memory fragmentation is not counted in used_memory. The fragmentation ratio is reported in the Redis info output as the dynamic value mem_fragmentation_ratio, which is used_memory_rss / used_memory. The closer mem_fragmentation_ratio is to 1, the lower the fragmentation. A value between 1 and 1.5 is normal; a value above 1.5 means there is a lot of fragmentation.
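As a quick illustration of the ratio, the sketch below parses a fragment of info output and classifies the result using the 1 to 1.5 range described above. The sample numbers and the helper names `parse_info` and `classify_fragmentation` are made up for the example.

```python
def parse_info(text: str) -> dict:
    """Parse 'key:value' lines of INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def classify_fragmentation(used_memory_rss: int, used_memory: int,
                           high: float = 1.5) -> tuple:
    """Return (ratio, label) where ratio = used_memory_rss / used_memory."""
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        label = "rss below used_memory"
    elif ratio <= high:
        label = "normal"
    else:
        label = "heavily fragmented"
    return ratio, label


# Hypothetical sample, not taken from a real instance.
SAMPLE_INFO = "# Memory\r\nused_memory:1000000\r\nused_memory_rss:1800000\r\n"
info = parse_info(SAMPLE_INFO)
ratio, label = classify_fragmentation(int(info["used_memory_rss"]),
                                      int(info["used_memory"]))
print(round(ratio, 2), label)
```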
2.4 Child process memory
As mentioned above, child processes are created mainly for RDB generation and AOF rewrite, and they also occupy some memory. When writes are infrequent during this process, little memory is consumed; when writes are very frequent, more is consumed.
3. Redis memory optimization
Memory optimization mainly targets object memory, client buffers, memory fragmentation, and child-process memory, because these consume a lot of memory or are sometimes unstable. Our optimization goals are to reduce memory usage, improve performance, and reduce memory anomalies.
3.1 Object memory optimization
Optimizing object memory reduces memory usage and improves performance. The main lever is choosing the right encoding for each kind of object.
Before optimizing, it helps to understand the following points:
(1) The string type has three encodings. The int encoding needs no memory beyond the object itself, and the object's pointer does not point to any other memory, so it is optimal in both performance and memory usage. embstr allocates one contiguous block of memory, but any modification to the value converts the object to raw encoding, and this is irreversible.
(2) When a ziplist stores a list, each element is one entry; when it stores a hash, the key and value are two adjacent entries; when it stores a zset, the member and score are two adjacent entries. When the ziplist conditions are no longer met, the ziplist is upgraded to linkedlist, hashtable, or skiplist encoding.
(3) Once an object has been upgraded to a larger encoding, it is never downgraded back to ziplist.
(4) linkedlist and hashtable make additions, deletions, and modifications cheap, but they use a lot of memory.
(5) ziplist uses less memory, but every modification may trigger realloc and memcpy and may even cause cascading updates (entries may need to be moved). Modifications are therefore inefficient, and the problem is more pronounced when the ziplist has many entries.
(6) Since most Redis deployments now run 3.2 or later, the List type is always encoded as quicklist: a doubly linked list in which each node is a ziplist. It was introduced to balance two dimensions, space fragmentation and read/write performance. An important quicklist parameter is list-max-ziplist-size. A positive value limits the number of entries in each node's ziplist. A negative value must be between -1 and -5 and limits the size of each ziplist: -1 to -5 correspond to 4 KB, 8 KB, 16 KB, 32 KB, and 64 KB. The default is -2, that is, at most 8 KB.
(7) [rehash] : Much of Redis's storage is built on hashtables. A client locates an object through the hash value computed from the key, but as the amount of data grows, several keys may end up with the same hash value. Those entries are stored as a linked list, and if the list grows too long, traversal becomes slow. Redis therefore defines a threshold on the load factor (load_factor = number of key-value pairs in the hash table / hash table length). When it is exceeded, a progressive rehash is triggered: a new, larger hashtable is created and the data is gradually moved into it.
(8) [bigkey] : A bigkey generally means a key whose value occupies a large amount of memory. There is no fixed standard for the size; for example, we can define anything over 10 MB as a bigkey.
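To make the progressive rehash in point (7) concrete, here is a toy chained hash table that migrates one bucket per operation once the load factor reaches a threshold. It is a teaching sketch only: the class `ToyDict`, the threshold of 1.0, and the doubling growth policy are assumptions for illustration and do not reproduce Redis's dict implementation.

```python
class ToyDict:
    """Toy chained hash table with progressive (incremental) rehash."""

    def __init__(self, size: int = 4, max_load: float = 1.0):
        self.old = [[] for _ in range(size)]  # table currently in use
        self.new = None                       # larger table, only while rehashing
        self.rehash_idx = 0                   # next bucket of `old` to migrate
        self.used = 0                         # number of key-value pairs
        self.max_load = max_load              # assumed load factor threshold

    def load_factor(self) -> float:
        return self.used / len(self.old)

    def _tables(self):
        return [t for t in (self.old, self.new) if t is not None]

    def _step(self) -> None:
        """Migrate one bucket; swap tables when the old one is drained."""
        if self.new is None:
            return
        for key, value in self.old[self.rehash_idx]:
            self.new[hash(key) % len(self.new)].append((key, value))
        self.old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.old):
            self.old, self.new, self.rehash_idx = self.new, None, 0

    def get(self, key):
        self._step()
        for table in self._tables():  # a key may live in either table
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return None

    def set(self, key, value) -> None:
        self._step()
        for table in self._tables():
            bucket = table[hash(key) % len(table)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
        if self.new is None and self.load_factor() >= self.max_load:
            self.new = [[] for _ in range(len(self.old) * 2)]
        target = self.new if self.new is not None else self.old
        target[hash(key) % len(target)].append((key, value))
        self.used += 1


d = ToyDict()
for i in range(100):
    d.set(i, i * i)
print(d.used, d.get(7), d.get(99))
```

While a rehash is in progress both tables exist at once, which is why a hash with very many elements temporarily needs extra memory.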
Optimization suggestions:
- Keep keys within 44 bytes where possible so that embstr encoding is used. Compared with raw, embstr needs one fewer memory allocation, and because it is stored in contiguous memory, performance is better.
- Multiple string values can be combined into a small hash. A small hash can use ziplist encoding, which is compact and saves memory.
- Try not to put too many elements in a non-string value, to avoid creating big keys.
- When a value has many elements and changes frequently, do not use ziplist encoding: a ziplist is one contiguous allocation, which is unfriendly to frequently updated objects and costs a lot of performance.
- A hash object should not contain too many elements, to avoid consuming too much memory during rehash.
- Try not to change the ziplist limit parameters. Although ziplist encoding is very compact, a ziplist holding too many elements may degrade performance.
3.2 Client Buffer Optimization
The client buffer is the culprit behind many cases of abnormal memory growth, most of which are caused by abnormal growth of the normal client output buffer. First, consider how a command is executed: the client sends one request command, or a batch of commands through a pipeline, to the server and then waits for the response, usually in blocking mode. Until the client reads the data, the data stays in the client buffer. A simplified flowchart of command execution is as follows:
The abnormal growth may be caused by the following:
- A client accesses a big key, so the client output buffer grows abnormally.
- A client uses the monitor command. monitor continuously writes every command that reaches Redis into the output buffer, so the output buffer grows abnormally.
- To speed up access, a client uses a pipeline to batch a large number of commands, so the returned result set is abnormally large (a pipeline returns only after all commands have executed, and until then the results are held in the output buffer).
- A slave node applies data slowly, so a large amount of data backs up in the master-slave replication output buffer and the buffer grows abnormally.
Symptoms:
- In the output of the Redis info command, client_recent_max_output_buffer in the clients section is very large.
- In the output of the client list command, omem is non-zero and very large. omem is the number of bytes used by the client's output buffer.
- In a cluster, monitoring may show abnormal used_memory growth on only a few instances, because both monitor and pipeline commands are issued against a single instance.
Optimization suggestions:
- Applications should not design big keys; split big keys wherever possible.
- Set a limit on the server's normal client output buffer through the configuration parameter. Because most memory alarm thresholds start at 80% usage, we recommend a limit of about 5% to 15% of instance memory, and preferably no more than 20%, to avoid OOM.
- Avoid the monitor command except in special circumstances, or rename the command.
- When using a pipeline, do not batch too many commands, especially commands that return large result sets.
- Reference for sizing the master-slave replication output buffer: buffer size = (master write command rate * operation size - command transfer rate between master and slave * operation size) * 2.
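The sizing rule for the replication output buffer is simple arithmetic, worked through below. The rates and the average operation size are hypothetical numbers chosen for the example, and the function name is an assumption.

```python
def repl_output_buffer_bytes(write_cmds_per_sec: float,
                             transfer_cmds_per_sec: float,
                             avg_op_bytes: float) -> float:
    """(write rate * op size - transfer rate * op size) * 2, floored at 0."""
    backlog_per_sec = (write_cmds_per_sec * avg_op_bytes
                       - transfer_cmds_per_sec * avg_op_bytes)
    return max(backlog_per_sec, 0) * 2


# Hypothetical workload: the master executes 50,000 writes/s, the link to the
# slave carries 40,000 commands/s, and an average command is 512 bytes.
size = repl_output_buffer_bytes(50_000, 40_000, 512)
print(f"{size / 1024 / 1024:.1f} MiB")
```

When the link keeps up with the write rate, the rule yields 0, so in practice it serves as a lower bound for sizing rather than a complete answer.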
3.3 Fragmentation optimization
Fragmentation optimization reduces memory usage and improves access efficiency. In versions below 4.0, the only way to reduce fragmentation is to reload the data: restart the instance and load the RDB, or perform a high-availability master-slave switchover. From 4.0 onward, Redis provides automatic and manual defragmentation. Roughly, it copies data to a new memory region and then frees the old one, which has some performance cost.
[a. Redis manual defragmentation] : Execute the memory purge command.
[b. Redis automatic defragmentation] : Controlled by the following parameters
- [activedefrag yes]: Enables automatic defragmentation
- [active-defrag-ignore-bytes 100mb]: Minimum amount of fragmented memory before defragmentation starts
- [active-defrag-threshold-lower 10]: Minimum fragmentation percentage before defragmentation starts
- [active-defrag-threshold-upper 100]: Fragmentation percentage above which Redis defragments with maximum effort (using the most resources)
- [active-defrag-cycle-min 25]: Minimum percentage of CPU resources used by automatic defragmentation
- [active-defrag-cycle-max 75]: Maximum percentage of CPU resources used by automatic defragmentation
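The way these parameters interact can be sketched as a small function: no defragmentation until both the byte and percentage thresholds are met, then an effort that rises from cycle-min to cycle-max as fragmentation approaches the upper threshold. The linear interpolation and the function name `defrag_effort` are assumptions based on the parameter descriptions, not a copy of the Redis source.

```python
MB = 1024 * 1024


def defrag_effort(frag_bytes: int, frag_pct: float,
                  ignore_bytes: int = 100 * MB,
                  threshold_lower: float = 10, threshold_upper: float = 100,
                  cycle_min: float = 25, cycle_max: float = 75) -> float:
    """Return the assumed CPU effort (percent) spent on defragmentation."""
    # Both conditions must hold before any defragmentation starts.
    if frag_bytes < ignore_bytes or frag_pct < threshold_lower:
        return 0.0
    if frag_pct >= threshold_upper:
        return float(cycle_max)
    # Assumed: effort grows linearly between the two thresholds.
    position = (frag_pct - threshold_lower) / (threshold_upper - threshold_lower)
    return cycle_min + position * (cycle_max - cycle_min)


for frag_bytes, frag_pct in [(50 * MB, 30), (200 * MB, 5), (200 * MB, 55)]:
    print(frag_bytes // MB, frag_pct, defrag_effort(frag_bytes, frag_pct))
```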
3.4 Subprocess memory optimization
As mentioned earlier, AOF rewrite and RDB generation fork child processes. Normally, when Redis writes are not very frequent, the forked child does not consume much memory during either operation, mainly because the child relies on the Linux copy-on-write mechanism, or COW for short.
The core of COW is that the child shares the parent's memory after the fork. Memory is actually allocated and copied only when the parent process modifies data.
One thing to watch out for: do not enable the operating system's Transparent Huge Pages (THP). With THP enabled, the page size changes from 4 KB to 2 MB. Although this speeds up fork (because there are fewer pages to copy), it raises the unit of a copy-on-write copy from 4 KB to 2 MB. If the parent process receives many write commands, the amount of memory copied grows and memory consumption becomes excessive.
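The effect of page size on copy-on-write cost can be estimated with a little arithmetic. The sketch below counts how many distinct pages a set of writes touches and multiplies by the page size; the write pattern (1,000 small writes spaced 1 MiB apart) is an invented example, and real behaviour also depends on the kernel and the allocator.

```python
KB = 1024
MB = 1024 * KB


def cow_copied_bytes(write_addresses, page_size: int) -> int:
    """Bytes copied if every touched page is duplicated once after a fork."""
    touched_pages = {addr // page_size for addr in write_addresses}
    return len(touched_pages) * page_size


# Hypothetical pattern: 1,000 small writes, each 1 MiB apart.
writes = [i * MB for i in range(1000)]

small = cow_copied_bytes(writes, 4 * KB)   # regular 4 KB pages
huge = cow_copied_bytes(writes, 2 * MB)    # 2 MB huge pages
print(small // KB, "KB copied with 4 KB pages")
print(huge // MB, "MB copied with 2 MB pages")
```

For this pattern the copied volume rises from about 4 MB to 1,000 MB, which is the kind of amplification the THP warning above refers to.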
4. Memory optimization cases
4.1 Buffer exception optimization case
A memory alarm fired on an online business Redis cluster, and memory usage quickly rose to 100%. The on-call engineer first performed an emergency expansion and asked the business team whether a large amount of new data was being written. The business team reported that it was not. Meanwhile, memory kept rising after the expansion and would soon trigger the alarm again, so the business DBA checked the monitoring to find the specific cause.
First, used_memory had increased on only a few instances of the cluster, and the number of keys on those instances had not grown abnormally, so no large batch of data was being written.
Next, we suspected that client memory usage was abnormally large. Checking the client-related metrics in the instance info, we saw that the growth curve of output_list matched that of used_memory, which confirmed that the cause was an abnormal client output buffer.
We then used client list to see which client's output buffer was growing, what command it was executing, and whether it was accessing a big key.
Execute client list | grep -v "omem=0" to filter out clients whose output buffer is empty, and find the following:
id=12593807 addr=192.168.101.1:52086 fd=10767 name= age=15301 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=16173 oll=341101 omem=5259227504 events=rw cmd=get
The key fields are explained below:
[id]: The client's unique identifier, which we often use when killing a client;
[addr]: The client's address and port;
[obl]: Fixed buffer size (bytes); the default is 16 KB;
[oll]: Dynamic buffer size (number of objects); responses go to the dynamic buffer when a command's response exceeds 16 KB or the fixed buffer is full;
[omem]: Total number of bytes used by the output buffer;
[cmd]: The last command executed.
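When there are many connections, it is easier to parse client list output than to read it by eye. The sketch below splits one line into fields and reports clients whose omem exceeds a threshold; the helper names and the 1 GiB threshold are illustrative assumptions, and the sample line is the one shown above.

```python
def parse_client_line(line: str) -> dict:
    """Split one line of client list output into a field dictionary."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore stray tokens that have no '='
            fields[key] = value
    return fields


def clients_over(lines, min_omem_bytes: int) -> list:
    """Return (id, addr, cmd, omem) for clients above the omem threshold."""
    result = []
    for line in lines:
        f = parse_client_line(line)
        omem = int(f.get("omem", 0))
        if omem >= min_omem_bytes:
            result.append((f.get("id"), f.get("addr"), f.get("cmd"), omem))
    return result


SAMPLE = ("id=12593807 addr=192.168.101.1:52086 fd=10767 name= age=15301 "
          "idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 "
          "obl=16173 oll=341101 omem=5259227504 events=rw cmd=get")

for cid, addr, cmd, omem in clients_over([SAMPLE], 1024 ** 3):
    print(cid, addr, cmd, f"{omem / 1024 ** 3:.2f} GiB")
```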
The buffer occupies a lot of memory, and the most recent command is get, so we first checked whether a big key was responsible. We analyzed the RDB directly and found no big key. Moreover, get operates on the string type, whose maximum size is 512 MB, so a single key is unlikely to produce a buffer this large. We concluded that the client had buffered the results of multiple keys.
To recover as quickly as possible, we agreed with the business team to kill the connection temporarily and release the memory, and then to set a limit on the normal client buffer to prevent a recurrence. Because the maximum memory is 25 GB, we set the limit to 2 GB to 4 GB. The parameter can be set dynamically as follows:
config set client-output-buffer-limit "normal 4096mb 2048mb 120"
Because this parameter limits only the output buffer of a single client, we also needed to check whether the client was using a pipeline or a similar mechanism that batches many commands and holds the results until they are returned together. The business layer needs to optimize this step by step; otherwise, now that the output buffer is limited, the session is killed when the limit is reached, and the business will still see errors if nothing changes.
The business team reported that they were using the Redis client that ships with brpc, a C++ framework. A first keyword search of their code found no pipeline, but the symptoms pointed to one, so we read the code more carefully and found that the client implements a pipeline-like feature internally: it batches multiple commands in one request to Redis and returns the results together. The client's documentation on GitHub is here:
https://github.com/apache/incubator-brpc/blob/master/docs/cn/redis_client.md
Summary:
Pipelines are widely used in Redis clients because they really do improve access efficiency, but improper use affects the whole instance. Usage should be controlled, and these memory limits should be set in production wherever possible so that a few misbehaving clients do not affect everyone else.
4.2 Case of abnormal growth of slave node memory
An online Redis cluster raised a critical alarm for memory usage above 95%, but only 3 of the cluster's 190 nodes triggered it. Checking the cluster information and monitoring metrics turned up the following:
- The memory of the master nodes corresponding to the three slave nodes did not change, while the memory of the slave nodes grew gradually.
- Overall cluster ops was relatively low, indicating that business traffic had not changed much; no sudden increase in commands was found.
- The maximum memory of the master and slave nodes was inconsistent: 6 GB on the master and 5 GB on the slave. This was an important reason for the critical alarm.
- Before the problem, the master node used about 1.3 GB more memory than the slave node. The slave's used_memory then grew gradually until it exceeded the master's, while its rss memory stayed the same.
- Master-slave replication lag occurred during the same period in which memory grew.
Process:
The first thought was to make the maximum memory of the master and slave nodes consistent, but host memory usage was already high, so expansion was temporarily impossible. Suspecting that the slave nodes might be blocked for some reason, we agreed with the business team to restart 2 of the slave nodes to relieve the problem. After the restart, the slave nodes released the memory and dropped back to the level before the problem occurred.
One week after the memory adjustment, the three slave nodes raised memory alarms again. Because master and slave maximum memory were now the same, this time a serious alarm (>85%) was triggered. The monitoring showed the same pattern as before, so we guessed that some specific operation was the trigger and asked the business team what they had been doing in the two time windows. They reported that they had been writing data in both windows. Reading the code that writes to Redis, we found a relatively rare command, append, which appends to a string value.
Here we need to mention how Redis allocates memory for strings. Every string is stored as an sds. When the space currently allocated to an sds is insufficient and the string is smaller than 1 MB, Redis reallocates a space about twice as large.
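The growth rule can be sketched as follows. This models the preallocation policy as I understand it from the sds source (double the required length below 1 MB, add 1 MB above it) and ignores the sds header and terminator; the function name and the 600-byte example are assumptions for illustration.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1 MB threshold mentioned above


def sds_alloc_after_append(cur_len: int, cur_alloc: int, add_len: int) -> int:
    """Return the allocated size (bytes) after appending add_len bytes."""
    if cur_alloc - cur_len >= add_len:
        return cur_alloc  # enough free space, no reallocation
    new_len = cur_len + add_len
    if new_len < SDS_MAX_PREALLOC:
        return new_len * 2  # small strings: double what is needed
    return new_len + SDS_MAX_PREALLOC  # large strings: add 1 MB of slack


# A value freshly loaded from an RDB is stored compactly (alloc == len).
alloc = sds_alloc_after_append(cur_len=600, cur_alloc=600, add_len=10)
print(alloc)  # first append roughly doubles the footprint
print(sds_alloc_after_append(cur_len=610, cur_alloc=alloc, add_len=10))
```

This is why a node that has just loaded an RDB can grow sharply the first time append touches many of its keys, while a node that already holds slack from earlier appends does not.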
With this in mind, the problems above can be roughly explained. When the append operations ran, the slave nodes had to allocate more space, which expanded their memory, while the master nodes did not. Because reallocation involves malloc and free operations, it is normal for the slaves to lag during that time.
Redis master-slave replication is logical replication. Loading an RDB is really just reading key-value pairs and writing them to the slave one by one, so master and slave memory sizes often differ. Particularly in scenarios where value sizes change frequently, the same key-value data may occupy different amounts of space on master and slave.
To verify this guess, we compared the space a key with a fairly large value occupies on the master and on the slave. Because the version is above 4.0, we can use memory usage to get the size. We checked a few slightly larger keys at random: for some keys the slave used nearly twice the space of the master, for some the two were almost the same, and for some the slave used more than double. The key sizes parsed from the RDB were smaller still. This shows that storage is at its most compact right after a slave restarts and loads the RDB; later, after a burst of operations on many keys, the space allocated for those keys on the slave is insufficient and has to be doubled, which increases memory.
At this point the analysis was essentially complete. Given how append behaves, and to keep the memory alarm from recurring, we decided to expand the cluster's memory and keep usage below 70% (to leave room for the case where many keys double their memory).
Finally, one question remains: why was used_memory above larger than used_memory_rss? (Swap was off.)
This is because jemalloc initially allocates only virtual memory; physical memory is allocated only when data is written to an allocated page. used_memory_rss is the actual physical memory usage, whereas used_memory is really a counter that Redis increments and decrements whenever it calls malloc or free.
The author of Redis has also answered the question of used_memory being greater than used_memory_rss:
https://github.com/redis/redis/issues/946#issuecomment-13599772
Summary:
Knowing how Redis allocates memory makes it possible to locate abnormal memory problems quickly. In addition, even when a problem does not appear related to the business, we should still talk to the business team to get clues for troubleshooting. Finally, master and slave maximum memory must be kept consistent, as the specification requires.
5. Summary
Redis is very cleverly designed and optimized for data storage and caching. Once we understand its internal structures and storage methods, we can optimize key design in advance. When we encounter memory anomalies or need to tune performance, we no longer have to stop at surface-level analysis such as resource consumption, command complexity, and key size; we can also draw on Redis's internal mechanisms and memory management to dig deeper and see whether something else is causing the anomaly or the performance degradation.
References
- Book "Redis Design and Implementation"