Author: Hu Chengqing

A member of the Aikesheng DBA team who specializes in fault analysis and performance optimization. Personal blog: https://www.jianshu.com/u/a95ec11f67a8 (discussion is welcome).

Source of this article: original contribution

*This original content is produced by the Aikesheng open source community and may not be used without authorization. To reprint it, please contact the editor and credit the source.


Background

I encountered a small problem when installing OceanBase two days ago:

The OB installer requires each server to have at least 8G of available memory and refuses to install otherwise. I had gone to some trouble to put together three servers with 10G of memory each, and the output of free -m showed more than 9G free, so why did the installation still report an error?

A closer look at the picture above shows that available is only 6.3G. The "Free" value in the OB installer's error message actually refers to available.

So why does free -m report 9.3G as free but only 6.3G as available?

We usually think of MemAvailable as the sum of buffer/cache and free, but that is not how it is calculated. It is in fact closely tied to min_free_kbytes.
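To see the gap between the common guess and the kernel's figure, the fields of /proc/meminfo can be compared directly. The following is a minimal sketch: parse_meminfo and naive_available_kb are hypothetical helper names, and the sample values are made up for illustration (they are not taken from the servers in this article). On a real Linux host, the text would come from reading /proc/meminfo.

```python
# Hypothetical sample in /proc/meminfo format; values are illustrative only.
SAMPLE = """\
MemTotal:       10485760 kB
MemFree:         9752576 kB
MemAvailable:    6606028 kB
Buffers:            2048 kB
Cached:           204800 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def naive_available_kb(fields):
    """The common (inaccurate) guess: free + buffers + page cache."""
    return fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)


info = parse_meminfo(SAMPLE)
print("naive guess :", naive_available_kb(info), "kB")
print("MemAvailable:", info["MemAvailable"], "kB")
```

With this sample, the naive guess is far larger than MemAvailable, which is the same pattern the installer ran into.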

min_free_kbytes

kswapd is a kernel thread that reclaims memory in the background. To decide when to reclaim, the kernel defines three memory thresholds (watermarks, also called water levels), namely watermark[min/low/high]:

The figure above shows what the water levels mean. When MemFree falls below watermark[low], kswapd starts reclaiming memory and keeps going until free memory reaches watermark[high]. If memory is requested so quickly that free memory drops to watermark[min], the kernel performs direct reclaim: the allocating process itself reclaims pages to satisfy its request, which blocks the application. watermark[min] equals the value of the kernel parameter min_free_kbytes, and the other two water levels are derived from it:

  • watermark[low] = watermark[min]*5/4
  • watermark[high] = watermark[min]*3/2
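The two ratios above can be worked through in a few lines. This is a minimal sketch that treats the whole system as a single memory zone (the kernel actually keeps watermarks per zone). The function names watermarks_kb and reclaim_action are hypothetical, and the classification ignores the fact that kswapd, once started, continues until watermark[high] is reached.

```python
def watermarks_kb(min_free_kbytes):
    """Derive the three water levels (in kB) from min_free_kbytes,
    using the ratios given in the article."""
    return {
        "min": min_free_kbytes,
        "low": min_free_kbytes * 5 // 4,
        "high": min_free_kbytes * 3 // 2,
    }


def reclaim_action(free_kb, wm):
    """Classify which kind of reclaim a given amount of free memory triggers."""
    if free_kb <= wm["min"]:
        return "direct reclaim"   # the allocating process is blocked
    if free_kb < wm["low"]:
        return "kswapd reclaim"   # background reclaim
    return "none"


wm = watermarks_kb(2097152)       # min_free_kbytes = 2G
print(wm)
print(reclaim_action(2500000, wm))
```

For min_free_kbytes = 2097152 (2G), this gives watermark[low] = 2621440 kB (2.5G) and watermark[high] = 3145728 kB (3G).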

MemAvailable

Memory below watermark[min] is reserved for the system and is not handed out to ordinary processes. MemAvailable is meant to be memory that can actually be allocated and used, so it should not include this reserve. Its calculation formula is:

MemAvailable = MemFree - watermark[low] + (PageCache - min(PageCache / 2, watermark[low]))
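The formula can be tried out with a small function. This is a sketch of the formula exactly as written above, not of the kernel's full calculation: the kernel also counts part of the reclaimable slab, and details vary between kernel versions, so the result is only an estimate. The name mem_available_kb and the input figures are hypothetical.

```python
def mem_available_kb(mem_free_kb, page_cache_kb, watermark_low_kb):
    """Estimate MemAvailable (kB) from the formula in the article."""
    # At most half of the page cache, capped at watermark[low], is
    # assumed to be unreclaimable.
    usable_cache = page_cache_kb - min(page_cache_kb // 2, watermark_low_kb)
    available = mem_free_kb - watermark_low_kb + usable_cache
    return max(available, 0)  # never report a negative amount


# Hypothetical inputs: 9,500,000 kB free and 400,000 kB of page cache.
free_kb, cache_kb = 9_500_000, 400_000

low_2g = 2097152 * 5 // 4    # watermark[low] for min_free_kbytes = 2G
low_64m = 65536 * 5 // 4     # watermark[low] for min_free_kbytes = 64M

print(mem_available_kb(free_kb, cache_kb, low_2g))
print(mem_available_kb(free_kb, cache_kb, low_64m))
```

With the same free memory and page cache, lowering min_free_kbytes from 2G to 64M raises the estimate by more than 2.5G in this example, because watermark[low] is subtracted directly from MemFree.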

Knowing how MemAvailable is calculated, the next step is simple. First check the setting of min_free_kbytes:

[root@observer2 ~]# cat /proc/sys/vm/min_free_kbytes
2097152

2G is the value required by the OB deployment specification. Since this is a test environment, I changed it to 64M, after which MemAvailable met the requirement:

min_free_kbytes setting suggestion

The OB deployment specification stipulates min_free_kbytes=2G. I have to say this is a thoughtful detail, because:

  1. The system automatically calculates min_free_kbytes from the memory size, but the relationship is not linear. For example, on a server with 256G of memory, min_free_kbytes is only 132M:
root@idrc-110:~# cat /proc/sys/vm/min_free_kbytes
135168
root@idrc-110:~# free -m
              total        used        free      shared  buff/cache   available
Mem:         257897       60060        2068       18161      195768      178009
Swap:           616           6         610
  2. If min_free_kbytes is set too small, the system's free memory can easily bottom out, and direct reclaim then causes serious performance degradation. Conversely, if it is set very large, all three watermarks (watermark[min/low/high]) become very large, which triggers memory reclaim frequently and lowers memory utilization.
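To put the two points above into numbers, the share of RAM that sits below watermark[high] can be compared for the two machines mentioned in this article. This is a rough sketch: reserved_share is a hypothetical helper, watermark[high] is taken as min_free_kbytes*3/2 as stated earlier, and the 10G test server is assumed to have exactly 10240 MB of memory.

```python
def reserved_share(min_free_kbytes, mem_total_mb):
    """Fraction of total memory lying below watermark[high]."""
    high_kb = min_free_kbytes * 3 // 2
    return high_kb / (mem_total_mb * 1024)


# 256G server with the automatically calculated value (from the free -m output).
auto_share = reserved_share(135168, 257897)

# 10G test server with the OB-specified 2G (total size is an assumption).
ob_share = reserved_share(2097152, 10240)

print(f"256G server, auto value: {auto_share:.3%}")
print(f"10G server, 2G setting : {ob_share:.3%}")
```

On the large server the automatic value reserves well under 0.1% of memory, while a 2G setting on a 10G machine keeps about 30% of memory below watermark[high]. That is appropriate for a production-sized server but too much for a small test box.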

Therefore, it is very reasonable to reserve 2G memory for the system, which is an optimization point that is easily overlooked.

References

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773

