Text | Wei Xikai (Technical Expert at Ant Group)
This article is about 6,320 words, a 10-minute read.
Although memory-safe Rust is largely free of memory leaks, allocating memory sensibly is still a problem every complex application has to face. The same code can show very different memory usage under different workloads, so it is quite likely that at some point memory usage becomes excessive, or memory grows steadily and is never released.
In this article, I want to share the problems of excessive memory usage that we have encountered in practice. We will roughly classify these memory problems and present the troubleshooting and diagnosis methods we use in our production environment, for your reference.
This article was first published in Rust Magazine Chinese Monthly (https://rustmagazine.github.io/rust_magazine_2021/chapter_5/rust-memory-troubleshootting.html).
Memory allocator
First of all, in production we usually do not use the default memory allocator (malloc), but jemalloc instead, which offers better multi-core performance and suffers less from memory fragmentation (see [1] for a detailed analysis). The Rust ecosystem has several good wrappers around jemalloc; we will not argue here about which wrapper is best. What we care about is how to use the profiling capabilities jemalloc provides to diagnose memory problems.
Reading the jemalloc usage documentation, you will find that it provides a sampling-based memory profiling capability: via mallctl you can set the two options prof.active and prof.dump to dynamically toggle memory profiling and dump the profiling data.
Memory grows rapidly until OOM
This situation typically arises when the same code meets a new business scenario: some input (usually a large volume of data) causes the program's memory to grow rapidly.
With the memory profiling capability mentioned above, rapid memory growth is actually an easy case to handle: we can turn on the profiling switch while the growth is happening, dump the results after a while, and visualize them with the corresponding tools. This shows clearly which call paths allocated the memory and which structures it went to.
There are, however, two sub-cases: the problem is reproducible, or it is hard to reproduce. They are handled differently; an actionable approach for each is given below.
Reproducible
A reproducible scenario is actually the easiest to solve, because we can dynamically enable profiling while reproducing the problem and collect a large amount of memory allocation information in a short time.
Here is a complete demo showing how to perform a dynamic memory profile in a Rust application.
In this article, I will use the jemalloc-sys, jemallocator, and jemalloc-ctl crates to profile the memory. The main roles of these three libraries are:
- jemalloc-sys: wraps jemalloc itself.
- jemallocator: implements Rust's GlobalAlloc trait, to replace the default memory allocator.
- jemalloc-ctl: wraps mallctl; it can be used for tuning, dynamically configuring the allocator, and reading the allocator's statistics.
The following are the dependencies of the demo project:
[dependencies]
jemallocator = "0.3.2"
jemalloc-ctl = "0.3.2"
[dependencies.jemalloc-sys]
version = "0.3.2"
features = ["stats", "profiling", "unprefixed_malloc_on_supported_platforms"]
[profile.release]
debug = true
Most importantly, the profiling feature of jemalloc-sys must be enabled, otherwise the subsequent profiling steps will fail. Note also that the demo is meant to run on Linux.
The demo's src/main.rs is as follows:
use jemalloc_ctl::{AsName, Access};
use std::collections::HashMap;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// mallctl keys must be NUL-terminated byte strings.
const PROF_ACTIVE: &'static [u8] = b"prof.active\0";
const PROF_DUMP: &'static [u8] = b"prof.dump\0";
const PROFILE_OUTPUT: &'static [u8] = b"profile.out\0";

// Toggle the sampling-based profiler on or off.
fn set_prof_active(active: bool) {
    let name = PROF_ACTIVE.name();
    name.write(active).expect("Should succeed to set prof");
}

// Dump the profiling data collected so far to PROFILE_OUTPUT.
fn dump_profile() {
    let name = PROF_DUMP.name();
    name.write(PROFILE_OUTPUT).expect("Should succeed to dump profile");
}

fn main() {
    set_prof_active(true);

    let mut buffers: Vec<HashMap<i32, i32>> = Vec::new();
    for _ in 0..100 {
        buffers.push(HashMap::with_capacity(1024));
    }

    set_prof_active(false);
    dump_profile();
}
The demo is already a minimal test case; the main points to explain are:
set_prof_active and dump_profile both call the mallctl function that jemalloc exposes, through jemalloc-ctl, by writing a value for the corresponding key: a Boolean for prof.active, and the dump file path for prof.dump.
After compiling, you cannot simply run the program directly; you first need to set an environment variable to enable the memory profiling feature:
export MALLOC_CONF=prof:true
Then run the program, and it will output a memory profile file; in the demo the file name is profile.out. It is a text file, which is not convenient to inspect directly (it contains no readable symbols).
With a tool such as jeprof, it can be converted directly into a visual graph:
jeprof --show_bytes --pdf <path_to_binary> ./profile.out > ./profile.pdf
This gives a visualization from which all the memory allocation sources can be clearly seen.
That completes the end-to-end flow of the demo. To apply it in production, only some routine engineering work is missing. Our production practice is as follows:
- Encapsulate it as an HTTP service, so that a profile can be triggered directly with a curl command and the result returned in the HTTP response (a sketch follows this list).
- Support setting the profile duration.
- Handle concurrently triggered profiles.
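For illustration only, here is a minimal sketch of such an HTTP trigger built on std alone; in production we use a real HTTP framework, and the port, fixed duration, and lack of request parsing below are made-up simplifications. set_prof_active and dump_profile are the functions from the demo above, and the single-threaded accept loop incidentally serializes concurrent trigger requests:
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread::sleep;
use std::time::Duration;

fn serve_profile_trigger() {
    let listener = TcpListener::bind("127.0.0.1:9090").expect("bind failed");
    for stream in listener.incoming() {
        let mut stream = match stream {
            Ok(s) => s,
            Err(_) => continue,
        };
        let mut buf = [0u8; 512];
        let _ = stream.read(&mut buf); // ignore the request contents in this sketch

        set_prof_active(true);
        sleep(Duration::from_secs(10)); // fixed duration; make it a parameter in practice
        set_prof_active(false);
        dump_profile();

        // Return the dump file as the HTTP response body.
        let body = std::fs::read("profile.out").unwrap_or_default();
        let header = format!("HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n", body.len());
        let _ = stream.write_all(header.as_bytes());
        let _ = stream.write_all(&body);
    }
}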
One advantage of this scheme has not been mentioned yet: it is dynamic. Enabling memory profiling inevitably has some performance impact (although not a particularly large one), so naturally we want profiling to stay off while nothing is wrong; this dynamic switch is therefore very practical.
Difficult to reproduce
Problems that can be reproduced stably are, in fact, not really problems. In production, the most troublesome ones are those that are hard to reproduce. They are like time bombs: the conditions for triggering them are obscure, which makes them hard to pin down accurately, yet they go off suddenly, which is very painful.
For hard-to-reproduce problems, the general idea is to prepare the crime scene in advance: when the problem occurs, even though the service is in trouble, we preserve a snapshot of the moment it happened. For excessive memory usage there is a well-known example of this idea: generate a coredump on OOM.
However, we did not adopt the coredump approach in our production practice. The main reason is that production nodes often have a lot of memory, so the coredump is also very large; generating it takes considerable time and delays an immediate restart, and such a dump is inconvenient to analyze, transfer, and store.
Instead, here is the solution we adopted in production. It is in fact very simple: using functionality jemalloc already provides, memory profile results can be emitted automatically.
When starting a long-running program that uses jemalloc, set the jemalloc parameters through the environment variable:
export MALLOC_CONF=prof:true,lg_prof_interval:30
Compared with before, the parameter adds lg_prof_interval:30, which means a memory profile is dumped for roughly every 2^30 bytes (1 GiB) of allocation activity (adjust as needed; this is just an example). As time goes by, if memory suddenly grows past the threshold, the corresponding profile files will have been generated, and when the problem surfaces we can use the files' creation times to see what allocations were happening at that moment.
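As a purely hypothetical illustration of this setting, a toy long-running program like the following, started with the MALLOC_CONF above, will produce a dump file roughly every 1 GiB of cumulative allocation (the allocation size and sleep interval are arbitrary):
// Toy workload for exercising lg_prof_interval. Run it with:
//   MALLOC_CONF=prof:true,lg_prof_interval:30 ./demo
// and watch profile dump files appear as allocation accumulates.
use std::{thread::sleep, time::Duration};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {
    let mut retained: Vec<Vec<u8>> = Vec::new();
    loop {
        // 16 MiB per iteration; about 64 iterations per 1 GiB of allocation.
        retained.push(vec![0u8; 16 * 1024 * 1024]);
        sleep(Duration::from_millis(100));
    }
}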
The memory grows slowly and is not released
Unlike rapid memory growth, here overall memory usage is stable, but over time it grows steadily and slowly. With the methods described above, it is hard to find out where the memory is going.
This was also one of the hardest problems we encountered in production. Unlike the drastic-change case, we no longer care about past allocation events; we care about the current memory distribution. But in Rust, which has no GC, observing a running program's current memory distribution is not a simple matter (especially doing it without affecting production).
In this situation, our practice in the production environment is as follows:
- Manually release the memory of certain structures (typically caches).
- Then observe the memory change before and after (how much was released) to determine the memory size of each module.
Using jemalloc's statistics capability, we can obtain the current memory usage; by repeatedly releasing a given module's memory and computing the released size, we can work out the distribution of the memory.
The shortcoming of this scheme is obvious: the modules checked for memory usage must be known a priori (you cannot find a memory hog outside of what you already suspect). Still, this flaw is acceptable, because we usually do know which places in a program may hold a lot of memory.
A demo project is given below; a production version can be built on top of it.
The following are the dependencies of the demo project:
[dependencies]
jemallocator = "0.3.2"
jemalloc-ctl = "0.3.2"
[dependencies.jemalloc-sys]
version = "0.3.2"
features = ["stats", "profiling", "unprefixed_malloc_on_supported_platforms"]
[profile.release]
debug = true
The demo's src/main.rs:
use jemalloc_ctl::{epoch, stats};

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

// Allocate a "cache": 1 MiB of capacity, mostly unused.
fn alloc_cache() -> Vec<i8> {
    let mut v = Vec::with_capacity(1024 * 1024);
    v.push(0i8);
    v
}

fn main() {
    let cache_0 = alloc_cache();
    let cache_1 = alloc_cache();

    let e = epoch::mib().unwrap();
    let allocated_stats = stats::allocated::mib().unwrap();

    // Refresh the statistics before the first read.
    e.advance().unwrap();
    let mut heap_size = allocated_stats.read().unwrap();

    drop(cache_0);
    e.advance().unwrap();
    let new_heap_size = allocated_stats.read().unwrap();
    println!("cache_0 size:{}B", heap_size - new_heap_size);
    heap_size = new_heap_size;

    drop(cache_1);
    e.advance().unwrap();
    let new_heap_size = allocated_stats.read().unwrap();
    println!("cache_1 size:{}B", heap_size - new_heap_size);
    heap_size = new_heap_size;

    println!("current heap size:{}B", heap_size);
}
Compared with the previous demo this one is a bit longer, but the idea is simple. The only note on using jemalloc-ctl: before reading fresh statistics, you must first call epoch.advance().
The following is the output of my run after compiling:
cache_0 size:1048576B
cache_1 size:1038336B
current heap size:80488B
You can see that the reported size of cache_1 is not exactly 1MB. That is normal; generally speaking (not specific to this demo) there are two main reasons:
- While the statistics are being taken, other memory changes are happening.
- The stats jemalloc provides are not necessarily exact: maintaining globally consistent counters would hurt multi-core performance, so jemalloc trades some statistical consistency for performance.
This inaccuracy does not hinder locating excessive memory usage, though, since the memory being released is usually huge and a small disturbance does not affect the final conclusion.
There is also an even simpler approach: release the cache and watch the machine's memory usage directly. But be aware that freed memory is not necessarily returned to the OS immediately, and eyeballing it is tiring. A better way is to integrate this memory distribution check into your Rust application.
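As a sketch of that integration, the drop-and-measure pattern from the demo can be wrapped into a reusable helper; the name measure_released is ours, not a library API, and error handling is simplified to unwrap as in the demo:
use jemalloc_ctl::{epoch, stats};

// Run a release action and return how many bytes jemalloc reports as freed.
// Usage: let freed = measure_released(|| my_cache.clear());
fn measure_released<F: FnOnce()>(release: F) -> usize {
    let e = epoch::mib().unwrap();
    let allocated = stats::allocated::mib().unwrap();
    e.advance().unwrap();
    let before = allocated.read().unwrap();
    release(); // e.g. clear one module's cache
    e.advance().unwrap();
    let after = allocated.read().unwrap();
    before.saturating_sub(after)
}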
Other general solutions
metrics
There is another very effective approach that we have been using all along: wherever a large amount of memory is allocated, record the allocated size as a metric, to be collected and observed continuously.
The overall plan is as follows (a minimal sketch follows the list):
- Use a Prometheus client to record the allocated memory (application-layer statistics).
- Expose a metrics endpoint.
- Configure the Prometheus server to pull the metrics.
- Configure Grafana with the Prometheus server as a data source for visualization.
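Here is a sketch of the first two steps; the crate choice (prometheus) and the metric name cache_allocated_bytes are illustrative assumptions, not necessarily what we run in production:
use prometheus::{Encoder, IntGauge, Registry, TextEncoder};

fn main() {
    let registry = Registry::new();
    let cache_bytes =
        IntGauge::new("cache_allocated_bytes", "Bytes held by the cache module").unwrap();
    registry.register(Box::new(cache_bytes.clone())).unwrap();

    // Wherever a large allocation happens, record its size as a metric.
    let cache: Vec<u8> = Vec::with_capacity(64 * 1024 * 1024);
    cache_bytes.set(cache.capacity() as i64);

    // Encode the registry in the Prometheus text format; in a real service this
    // string would be served on an HTTP /metrics endpoint for the server to scrape.
    let mut buf = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buf).unwrap();
    println!("{}", String::from_utf8(buf).unwrap());
}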
Memory troubleshooting tools
While investigating excessive memory usage, we also tried other powerful tools, such as heaptrack and valgrind, but they share one huge drawback: very large overhead.
Generally speaking, it is basically impossible to run a production application under this type of tool, so in production we rarely use them to troubleshoot memory problems.
Summary
Although Rust helps us avoid memory leaks, excessive memory usage is, I believe, still very likely for many long-running production programs. This article shared several high-memory-usage scenarios we encountered in production, along with common troubleshooting approaches that can locate problems quickly without disturbing normal production service. I hope it brings you some inspiration and help.
Of course, there are surely memory problems we have not yet encountered, and better, more convenient ways to locate and troubleshoot them; I hope readers who know of them will share.
Reference
[1] Experimental Study of Memory Allocation for High-Performance Query Processing
[2] jemalloc usage documentation
[3] jemallocator
We are the time-series storage team of Ant Group's intelligent monitoring platform. We are using Rust to build a new-generation time-series database with high performance, low cost, and real-time analysis capabilities.
Welcome to join us or recommend candidates. Contact: jiachun.fjc@antgroup.com