Huawei Cloud Database GaussDB (for Cassandra) Revealed, Part 2: Troubleshooting Abnormal Memory Growth

Abstract: Huawei Cloud Database GaussDB (for Cassandra) is a cloud-native NoSQL database built on an architecture that separates compute from storage, and it is compatible with the Cassandra ecosystem. It relies on a shared storage pool to achieve strong consistency and to keep data secure and reliable.

This article is shared from the Huawei Cloud Community post "Huawei Cloud Database GaussDB (for Cassandra) Revealing the Second Issue: Experience of Abnormal Memory Growth"; original author: Cassandra official.

Background introduction

Huawei Cloud Database GaussDB (for Cassandra) is a cloud-native NoSQL database built on an architecture that separates compute from storage, and it is compatible with the Cassandra ecosystem. It relies on a shared storage pool to achieve strong consistency and to keep data secure and reliable. Its core features are separation of storage and compute, low cost, and high performance.

Problem Description

Under its self-developed architecture, GaussDB (for Cassandra) has run into some challenging problems, such as high CPU usage, memory leaks, abnormal memory growth, and high latency. These are typical problems in the development process, and analyzing abnormal memory growth is one of the harder ones. Abnormal memory growth can be fatal for a program because it may trigger OOM, crash the process, and interrupt the business, so planning and controlling memory use sensibly is particularly important. Tuning the cache capacity, bloom filter size, memtable size, and similar settings can improve performance and reduce read and write latency.

During offline testing, the memory of the database kernel was found to only ever increase after it had been running for a long time, and the growth was abnormal. A memory leak was suspected.

Analysis & verification

First, memory is divided by usage into two parts, on-heap and off-heap, and each part is analyzed separately. The problematic memory was determined to be off-heap memory, which was then analyzed further. A more efficient memory manager, tcmalloc, was introduced to address the abnormal memory growth. The specific analysis and verification process follows.

Determine the abnormal memory region

Use the JDK's jmap command, Cassandra's monitoring (with the jvm.memory.* monitoring items configured), and similar methods to sample the JVM heap memory and the overall process memory once every minute.

Start the test case and run it until the overall memory of the database kernel reaches its upper limit. The collected curves of on-heap memory and process memory showed that on-heap memory stayed relatively stable, with no sustained rise, while the overall memory of the kernel kept rising over the same period. The two growth curves did not match, so the problem had to be in off-heap memory.
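The comparison of the two curves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the `locate_growth` name, and the `tolerance_mb` noise threshold are hypothetical and do not come from the original test. Comparing only the first and last samples is also a deliberate simplification, since real heap usage fluctuates with garbage collection.

```python
from typing import List, Tuple

# One sample per minute: (jvm_heap_mb, process_memory_mb). Values are hypothetical.
Sample = Tuple[float, float]


def growth(values: List[float]) -> float:
    """Net growth between the first and the last sample."""
    return values[-1] - values[0]


def locate_growth(samples: List[Sample], tolerance_mb: float = 100.0) -> str:
    """Return 'stable', 'heap' or 'off-heap' depending on where memory grew.

    tolerance_mb is an assumed threshold below which growth counts as noise.
    """
    heap = [h for h, _ in samples]
    process = [p for _, p in samples]
    off_heap = [p - h for h, p in samples]  # everything outside the JVM heap

    if growth(process) <= tolerance_mb:
        return "stable"
    return "off-heap" if growth(off_heap) > growth(heap) else "heap"


samples = [(8000, 12000), (8100, 14000), (7950, 16500), (8050, 19000)]
print(locate_growth(samples))
```

With these made-up samples the heap grows by 50 MB while the process grows by 7000 MB, which is the kind of mismatch described above.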

Off-heap memory analysis and verification

glibc memory management

Printing the memory address space layout of the process with the pmap command showed a large number of 64MB memory blocks and many memory fragments. This pattern is related to the way glibc allocates memory. Off-heap memory usage followed a trend similar to the overall memory growth of the process, so off-heap memory was the initial suspect. In addition, glibc returns memory only under strict conditions, which means memory is not released promptly and becomes fragmented, so the problem was suspected to be related to glibc. When fragmentation is excessive and a lot of free memory is wasted, the peak memory usage of the process eventually exceeds the planned maximum, and OOM may even occur.
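As a rough illustration of this check, the sketch below counts 64MB anonymous regions in pmap output. It assumes the default `pmap <pid>` line layout (address, size in K, mode, mapping) and treats a region as 64MB if it is either a single anonymous mapping of that size or two adjacent anonymous mappings that add up to it. The sample text and the function names are hypothetical, not output from the original investigation.

```python
import re
from typing import List, Tuple

# Matches lines such as "00007f1004000000  65536K rw---   [ anon ]".
PMAP_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")
REGION_KB = 64 * 1024  # 64MB expressed in KB

Row = Tuple[int, int, str, str]  # (address, size_kb, mode, mapping)


def parse_pmap(text: str) -> List[Row]:
    """Parse mapping lines and skip the header and total lines."""
    rows = []
    for line in text.splitlines():
        m = PMAP_LINE.match(line.strip())
        if m:
            rows.append((int(m.group(1), 16), int(m.group(2)), m.group(3), m.group(4).strip()))
    return rows


def count_64mb_regions(rows: List[Row]) -> int:
    """Count anonymous regions that span exactly 64MB."""
    count, i = 0, 0
    while i < len(rows):
        addr, size, _, name = rows[i]
        if "anon" in name and size == REGION_KB:
            count, i = count + 1, i + 1
            continue
        if "anon" in name and i + 1 < len(rows):
            n_addr, n_size, _, n_name = rows[i + 1]
            contiguous = addr + size * 1024 == n_addr
            if "anon" in n_name and contiguous and size + n_size == REGION_KB:
                count, i = count + 1, i + 2
                continue
        i += 1
    return count


SAMPLE = """4321:   java
0000000000400000      4K r-x-- java
00007f1000000000    132K rw---   [ anon ]
00007f1000021000  65404K -----   [ anon ]
00007f1004000000  65536K rw---   [ anon ]
00007f1008000000   2048K rw---   [ anon ]
 total           133124K
"""
print(count_64mb_regions(parse_pmap(SAMPLE)))
```

Running the same count before and after a change to the allocator gives a simple number to compare instead of reading the pmap listing by eye.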

tcmalloc memory management

Introduce the tcmalloc memory manager to replace glibc's ptmalloc memory management. To reduce excessive memory fragmentation and improve memory efficiency, this analysis and verification compiled tcmalloc from the gperftools-2.7 source code. Running the same test case showed that memory was still rising, but by less than before. The memory address layout printed by pmap showed that the small memory blocks and memory fragments seen earlier were significantly reduced, which indicates that the tool has a certain optimization effect and confirms the earlier suspicion of excessive memory fragmentation.

However, the abnormal memory growth persisted, as if tcmalloc were reclaiming memory too late or not at all. In fact, tcmalloc is deliberately "reluctant" to return memory: it keeps freed memory so that later allocation requests can reuse it directly, which reduces the number of system calls and improves performance. For this reason, its memory release interface, releasefreememory, was called manually. The effect was not obvious, and the reason was unknown at the time (there may genuinely have been free memory that had not yet been released).

Manually trigger the releasefreememory interface of tcmalloc

To verify this, an experiment was run that changes the cache capacity.

Set the cache capacity to 6GB first, and then drive read requests until the 6GB cache is full.
Change the cache capacity to 2GB. To release the memory quickly, call tcmalloc's releasefreememory interface manually. This had no effect. It was therefore suspected that the reason memory kept rising and never fell after switching to tcmalloc might be related to this interface.
Add logging at several points inside the releasefreememory interface, restart the process, and test again. The logs showed that the madvise system call was failing with an error.

Code location:

Error log information:

Starting from the point where the call fails, analyze the code. tcmalloc's memory release logic is "round-robin": if releasing one span fails partway through, the remaining spans are not released and the releasefreememory call ends. This matches the earlier observation that executing the releasefreememory interface had almost no effect: each time, only a few tens of MB were released before the release logic was terminated by the failed call.
Next, analyze why the madvise system call fails. Patching this call in the kernel showed that it fails because the memory behind the address range passed in is in the LOCKED state. This causes the system call to fail, and the error is reported as an invalid argument.
Two things are related to the LOCKED state: code that calls the mlock system call, and the system's ulimit configuration. A review of the relevant code found nothing abnormal. Querying the system's ulimit configuration showed that max locked memory was unlimited. After changing it to 16MB and restarting the Cassandra process, the test was repeated and memory was released effectively.
Continuing the test showed that the sustained memory growth had disappeared. Under continuous load, memory rises to its peak and then stops rising; it stays stable and within the planned memory usage. After the load was reduced or stopped, memory showed a slow downward trend.
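The effect of a release pass that stops at the first failed span can be modelled with a small Python sketch. This is a toy model of the behaviour described in the steps above, not tcmalloc's actual code: the span sizes, the set of locked spans, and the `release_spans` helper are all hypothetical.

```python
from typing import Callable, List


def release_spans(span_sizes_mb: List[int], try_release: Callable[[int], bool]) -> int:
    """Release spans in order and stop at the first failure.

    Returns the number of MB released before the pass ended.
    """
    released = 0
    for index, size in enumerate(span_sizes_mb):
        if not try_release(index):
            break  # one failed span ends the whole pass
        released += size
    return released


spans = [16, 16, 32, 64, 128, 256]  # hypothetical free spans, in MB
locked = {2}                        # hypothetical: span 2 lies in locked memory

# With a locked span in the way, only the spans before it are returned.
released_with_lock = release_spans(spans, lambda i: i not in locked)
# Once nothing is locked, the same pass returns every span.
released_without_lock = release_spans(spans, lambda i: True)

print(released_with_lock, released_without_lock)
```

In this model a single locked span near the front of the sequence caps the pass at a few tens of MB, even though far more free memory sits behind it.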

Solution & Summary

Introduce tcmalloc to optimize memory management. Well-regarded memory managers include Google's tcmalloc and Facebook's jemalloc.
Modify the system's max locked memory parameter.
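One way to inspect the max locked memory limit from inside a process is Python's standard `resource` module on Linux, which exposes the same limit that `ulimit -l` reports. The sketch below only reads the limit; the `format_memlock` helper is a hypothetical name, and how the limit is actually changed in a deployment (shell ulimit, limits.conf, service configuration) is not specified in this article.

```python
import resource


def format_memlock(limit: int) -> str:
    """Render an RLIMIT_MEMLOCK value (bytes) the way `ulimit -l` does (KB)."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KB"


# Soft and hard limits for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"max locked memory: soft={format_memlock(soft)} hard={format_memlock(hard)}")
```

A 16MB limit shows up as 16384 KB, while the configuration found during the investigation would be printed as unlimited.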

Plan the maximum memory a process may use sensibly, and reserve some headroom. Memory growth that does not match expectations needs further analysis. Memory problems are closely tied to the program itself. Key system configuration must be changed with caution, and its impact must be evaluated. All similar configurations were reviewed at the same time.

Add a releasefreememory command that the backend can invoke, to address the problem of tcmalloc holding memory without releasing it. However, executing the releasefreememory command locks the entire pageHeap, which may cause memory allocation requests to hang, so it must be executed with care.

The backend adds a dynamically configurable tcmalloc_release_rate parameter to adjust how often tcmalloc returns memory to the operating system. The reasonable range of the value is [0, 10]: 0 means memory is never returned, and the larger the value, the more frequently memory is returned. The default value is 1.
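A dynamically configurable parameter like this usually needs its input checked before it is applied. The sketch below shows one possible check in Python: it falls back to the default of 1 for unparsable input and clamps everything else into [0, 10]. The `parse_release_rate` name and the clamp-rather-than-reject policy are assumptions for illustration; the article does not say how the backend validates the value.

```python
def parse_release_rate(raw: str, default: float = 1.0) -> float:
    """Parse a release-rate setting and keep it inside [0, 10].

    0 means memory is never returned; larger values return memory more often.
    """
    try:
        value = float(raw)
    except ValueError:
        return default  # unparsable input: keep the documented default
    return max(0.0, min(10.0, value))


for raw in ("3", "25", "-1", "abc"):
    print(raw, "->", parse_release_rate(raw))
```

Rejecting out-of-range values with an error would be an equally valid choice; clamping simply keeps a typo from disabling or over-driving memory release.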

Concluding remarks

This article analyzed a memory growth problem encountered during development. With a better memory management tool and finer-grained memory monitoring, the memory state of the running database can be observed more directly, which helps keep the database stable and performant.
