I recently ran into a memory-related problem whose debugging turned out to be quite interesting, so I'm writing it down here in the hope that it helps someone.
Problem
The memory usage of one container in a pod kept growing. It started out at about 60Mi, and after two days of running, kubectl top pod --containers reported that it had reached 400Mi. Looking at the same container with docker stats, however, the memory usage was a normal value. For example:
- Via kubectl, memory usage is 19Mi:
[@ ~]$kubectl top pod nginx-deployment-66979f666d-wd24b --containers
POD NAME CPU(cores) MEMORY(bytes)
nginx-deployment-66979f666d-wd24b nginx 500m 19Mi
- Via docker, however, memory usage is a normal 2.461MiB:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
5d14f804062d k8s_nginx_nginx-deployment-66979f666d-wd24b_default_65cad64e-9696-4a3a-9bc6-08f93c6d263b_0 49.94% 2.461MiB / 20MiB 12.30% 0B / 0B 0B / 0B 4
Debugging
1. Why do kubectl top and docker stats disagree?
Because the two tools count memory differently. The following command dumps the container's detailed memory stats:
curl --unix-socket /var/run/docker.sock "http://localhost/v1.24/containers/5d14f804062d/stats"
{
  "memory_stats": {
    "usage": 20332544,
    "max_usage": 20971520,
    "stats": {
      "active_anon": 1990656,
      "active_file": 17420288,
      "cache": 17608704,
      "dirty": 0,
      "hierarchical_memory_limit": 20971520,
      "hierarchical_memsw_limit": 20971520,
      "inactive_anon": 4096,
      "inactive_file": 184320,
      "mapped_file": 4096,
      "pgfault": 7646701079,
      "pgmajfault": 0,
      "pgpgin": 1265901953,
      "pgpgout": 1265897156,
      "rss": 2039808,
      "rss_huge": 0,
      "total_active_anon": 1990656,
      "total_active_file": 17420288,
      "total_cache": 17608704,
      "total_dirty": 0,
      "total_inactive_anon": 4096,
      "total_inactive_file": 184320,
      "total_mapped_file": 4096,
      "total_pgfault": 7646701079,
      "total_pgmajfault": 0,
      "total_pgpgin": 1265901953,
      "total_pgpgout": 1265897156,
      "total_rss": 2039808,
      "total_rss_huge": 0,
      "total_unevictable": 0,
      "total_writeback": 0,
      "unevictable": 0,
      "writeback": 0
    },
    "failcnt": 10181490,
    "limit": 20971520
  },
}
kubectl top counts "usage": 20332544, i.e. roughly 19Mi. (It is not exactly equal because kubelet actually reports the working set, which is usage minus total_inactive_file.) docker stats counts usage - cache, i.e. 20332544 - 17608704, roughly 2.5MiB.
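These numbers can also be cross-checked directly against the container's memory cgroup on the node. A minimal sketch, assuming cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory (the exact cgroup path layout depends on your kubelet/docker configuration):

# Locate the container's memory cgroup; its directory is named after the full container ID
CG=$(find /sys/fs/cgroup/memory -type d -name '5d14f804062d*' | head -n 1)

cat "$CG/memory.usage_in_bytes"                                # the "usage" field above
grep -E '^total_(cache|rss|inactive_file) ' "$CG/memory.stat"

# kubectl top reports the working set:  usage - total_inactive_file
# docker stats (cgroup v1) reports:     usage - cache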
Sampling the stats repeatedly also shows that cache keeps growing. So the next questions are: what is cache, and why does it keep growing?
What is cache?
From https://docs.docker.com/confi...:

cache
The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. When you read from and write to files on disk, this amount increases. This is the case if you use “conventional” I/O (open, read, write syscalls) as well as mapped files (with mmap). It also accounts for the memory used by tmpfs mounts, though the reasons are unclear.
2. Why does cache keep growing?
The container in question is very simple: it only performs some trivial logic and writes log lines to a logfile. Having ruled out the business logic, could the logging be the cause?
Experiments confirmed it: when log writing is paused, the cache stops growing. The logfile lives on an EmptyDir volume, i.e. on the host's disk. docker inspect shows the mount path (a sketch of such an experiment follows the mount listing below):
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/65cad64e-9696-4a3a-9bc6-08f93c6d263b/volumes/kubernetes.io~empty-dir/nginx-vol",
"Destination": "/tmp/memorytest",
"Mode": "Z",
"RW": true,
"Propagation": "rprivate"
},
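To reproduce the effect by hand, something along these lines works. The file name test.log is made up and it assumes the image ships dd; the container ID and mount path are taken from the output above:

# Write a few MB into the EmptyDir-backed directory from inside the container...
docker exec 5d14f804062d sh -c 'dd if=/dev/zero of=/tmp/memorytest/test.log bs=1M count=5'

# ...then sample the memory stats again: the written pages show up under "cache", not "rss"
# (reclaim keeps the total under the cgroup limit). Stop writing and the counter stops growing.
curl --unix-socket /var/run/docker.sock \
    "http://localhost/v1.24/containers/5d14f804062d/stats?stream=false" | grep -o '"cache":[0-9]*'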
So the next question: why does writing logs to disk cause the memory cache to grow?
3. Why does writing to disk increase the memory cache?
The reason is that Linux is borrowing unused memory for disk caching (the page cache). See https://www.linuxatemyram.com... for a detailed explanation.
The important point is that
disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.
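This is easy to observe on any Linux host. A small sketch (dropping caches requires root; it is harmless, but the dropped data will simply be re-read from disk later):

free -m                             # "buff/cache" looks large, but most of it counts towards "available"
sync                                # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches   # ask the kernel to drop clean page cache (run as root)
free -m                             # "buff/cache" shrinks, "free" grows, "available" barely changes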
Since logging drives the reported memory usage up, could it reach the memory limit and get the container killed?
4. Will the growing memory usage lead to an OOMKill?
No, for two reasons:
- As described above, when memory approaches the limit, applications simply take it back from the disk cache. In other words, near the limit the logs are still written to disk, but the page cache stops growing.
- The container's resource memory limit in Kubernetes is passed down to docker, equivalent to docker run --memory. That is, OOMKilled only happens when memory is exceeded in docker's sense; reclaimable page cache does not push the container over the limit. A quick way to verify the limit hand-off is shown below.
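As a quick check of that hand-off (a sketch; 20Mi is this example's limit, set via the pod spec's resources.limits.memory):

# The memory limit kubelet passes to docker is visible in the container's HostConfig
docker inspect -f '{{.HostConfig.Memory}}' 5d14f804062d
# 20971520   bytes = 20Mi, the same value as hierarchical_memory_limit in the stats above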
Reference Links:
https://docs.docker.com/confi...
https://www.linuxatemyram.com...
https://www.ibm.com/support/p...