Author: Ren Kun
Now living in Zhuhai, he has served as full-time Oracle and MySQL DBA. Now he is mainly responsible for the maintenance of MySQL, mongoDB and Redis.
Source of this article: original submission
* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.
1. Background
Receive a disk space alarm during working hours
Log in to the machine to check that the root partition is less than 16G, and more than 80% has been used at the moment.
View the 10 largest files under the root partition and arrange them in descending order of size
du ‐Sm / ‐‐exclude="/data" | sort ‐k1nr | head ‐10
This command returned after a few seconds of execution in other environments, and it was executed on this machine for nearly 1 minute. Finally, several log files were located, which can be deleted directly.
Just about to log out, I received another memory alarm. It was this machine.
2. Diagnosis
Check the memory usage, it has indeed been exhausted
top view the most memory-consuming processes
Each mysqld that consumes more memory only occupies 43G, even if you add the other processes in the screenshot, it occupies at most 44G.
To avoid missing calculations, count the total physical memory occupied by all processes:
[root@centos6564‐220 ~]# more RSS.sh
#/bin/bash
for PROC in `ls /proc/|grep "^[0‐9]"`
do
if [ ‐f /proc/$PROC/statm ]; then
TEP=`cat /proc/$PROC/statm | awk '{print ($2)}'`
RSS=`expr $RSS + $TEP`
fi
done
RSS=`expr $RSS \* 4 / 1024 / 1024`
echo $RSS"GB"
[root@centos6564‐220 ~]# sh RSS.sh
44GB
Note: This script comes from an article by Chu Ba many years ago http://blog.yufeng.info/archives/2456
The question is, who occupied the remaining 10 gigabytes of memory?
Neither top nor ps can give an answer, only the /proc/meminfo file can be viewed
Pay attention to the information in the red box, the slab consumes too much cache, and all of them are recyclable, depending on the size, it happens to be about 16G.
Reminiscent of the extremely slow du -Sm command that was just executed, the answer seems obvious.
<br/>
Execute slabtop
Dentry and inode take up a lot of memory, but the output is not intuitive enough.
View more than 100M slab cache components
[root@centos6564‐220 ~]# cat /proc/slabinfo |awk '{if($3*$4/1024/1024 > 1
00){print $1,$3*$4/1024/1024 "MB"} }'
xfs_inode 113.586MB
proc_inode_cache 11989.1MB
dentry 3526.19MB
Inode occupies nearly 12G, dentry occupies 3.5G, everything is right.
Since the memory is insufficient due to the slab cache, you can directly use the echo 2> /proc/sys/vm/drop_caches command to recover.
At this point, the problem of insufficient memory has been resolved.
3. Sourcing
A new question arises, which directory is consuming a lot of inode/dentry?
Execute the following command to count the total number of files and subdirectories owned by each directory under the root directory.
for i in `ls / `; do
count=`ls ‐lR /$i | wc ‐l`
echo "$i has $count files and dirs"
done
...
proc has 32940780 files and dirs
root has 462 files and dirs
sbin has 287 files and dirs
tmp has 2 files and dirs
....
/proc is the culprit, continue to explore the subdirectories under /proc, this time the statistical refinement is divided into files and subdirectories
for i in `ls /proc `; do
files=`ls ‐lR /proc/$i | grep "^‐" | wc ‐l`
dirs=`ls ‐lR /proc/$i | grep "^d" | wc ‐l`
echo "$i has $files files and $dirs dirs" >> /tmp/count_tmps
done
7
#15049进程占用了1600w个文件
[root@centos6564‐220 tmp]# more count_tmps | sort ‐k3nr | head ‐5
15049 has 16381888 files and 964 dirs
17211 has 7653 files and 349 dirs
6053 has 7511 files and 384 dirs
18720 has 2289 files and 269 dirs
sys has 1166 files and 119 dirs
[root@centos6564‐220 tmp]# ps ‐ef | grep 15049
mysql 15049 13839 41 Jan07 ? 85‐09:04:44 /usr/sbin/mysqld ‐‐basedir=/usr
‐‐datadir=/data/var ‐‐plugin‐dir=/usr/lib64/mysql/plugin ‐‐user=mysql ‐‐
log‐error=/data/var/err.log ‐‐pid‐file=/data/var/mysql.pid ‐‐socket=/data/v
ar/mysql.sock ‐‐port=3306
root 24912 25232 0 01:10 pts/3 00:00:00 grep 15049
How embarrassing, it turned out to be a problem with mysql for a long time.
Statistics of sub-directory information under /proc/15049
dest=/proc/15049
for i in `ls $dest `; do
files=`ls ‐lR $dest/$i | grep "^‐" | wc ‐l`
dirs=`ls ‐lR $dest/$i | grep "^d" | wc ‐l`
echo "$i has $files files and $dirs dirs" >> /tmp/15049
done
[root@centos6564‐220 task]# cat /tmp/15049 | sort ‐k3nr | head ‐5
task has 20428788 files and 955 dirs
fdinfo has 106921 files and 0 dirs
net has 60 files and 3 dirs
attr has 6 files and 0 dirs
autogroup has 1 files and 0 dirs
The /proc/15432/task directory records the information of each child thread derived from mysqld.
View the current number of child threads of mysqld
[root@centos6564‐220 task]# cat /proc/15049/status | grep ‐i threads
Threads: 191
Continue to view the specific information of each child thread
dest=/proc/15049/task
for i in `ls $dest `; do
files=`ls ‐lR $dest/$i | grep "^‐" | wc ‐l`
dirs=`ls ‐lR $dest/$i | grep "^d" | wc ‐l`
echo "$i has $files files and $dirs dirs" >> /tmp/15049_task
done
[root@centos6564‐220 tmp]# more 15049_task | sort ‐k3nr | head ‐5
15049 has 106957 files and 4 dirs
15058 has 106957 files and 4 dirs
15117 has 106957 files and 4 dirs
15118 has 106957 files and 4 dirs
15119 has 106957 files and 4 dirs
The number of files opened by each child thread is the same, enter the fd directory of any one child thread,
[root@centos6564‐220 fd]# pwd
/proc/15432/task/15120/fd
[root@centos6564‐220 fd]# ls | wc ‐l
85286
#每一个fd都是1个mysql文件,看命名格式是分区表
[root@centos6564‐220 fd]# ll 39326
lrwx‐‐‐‐‐‐ 1 root root 64 Jul 31 01:34 39326 ‐> /data/tokudb/_*_sql_204d_
542_P_p20161105_188_status_3f14ef598_1_1d.tokudb
4 Conclusion
Log in to the database to view information_schema.partitions and found more than 100 partition tables, starting with 1000 partitions for each table.
This set of examples was handed over to us halfway through other departments. The biggest feature of this group of students using mysql is that they like to create partition tables.
In addition, the whole diagnosis process free -g shows that the free physical memory is 0, but mysqld is not OOM. This is because the 16G memory that is additionally consumed is all recyclable and can be reused by other memory requests.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。