Author: Sun Qiao

A member of the DBA team of Aikesen, mainly responsible for the daily maintenance of MySQL and DMP platforms, and has a strong interest in automatic database operation and maintenance.

Source of this article: original contribution

*The original content is produced by the open source community of Aikesheng, and the original content shall not be used without authorization. For reprinting, please contact the editor and indicate the source.


foreword

For those who are new to the MySQL database, they will encounter some difficult problems more or less; for example, when the database disk is not damaged, we often encounter a lot of disk IO pressure on the database server, but Since there is only a MySQL program on the server, our first thought must be that a connected user is messing around behind our backs after connecting to the database; however, the results seen directly through the show processlist command may not be accurate. Let's simulate a simple scenario and let's pinpoint exactly where the problem is.

Scenario simulation, creating IO pressure

 ## 生成一张大表,用于测试[root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 prepare
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Creating table 'sbtest1'...
Inserting 15000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...## 对生成的表进行写操作
[root@tidb-master ~]# /usr/share/sysbench/oltp_write_only.lua --mysql-host=10.186.61.36 --mysql-user=sun --mysql-password='xx' --mysql-port=3306 --mysql-db=testdb --tables=1 --table_size=15000000 --auto_inc=off --report-interval=2 --db-ps-mode=disable --time=100000 --threads=1 run
sysbench 1.0.17 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 1
Report intermediate results every 2 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 2s ] thds: 1 tps: 24.47 qps: 148.33 (r/w/o: 0.00/98.89/49.44) lat (ms,95%): 86.00 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 1 tps: 53.02 qps: 318.62 (r/w/o: 0.00/212.58/106.04) lat (ms,95%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 1 tps: 60.50 qps: 363.50 (r/w/o: 0.00/242.50/121.00) lat (ms,95%): 43.39 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 1 tps: 65.50 qps: 391.02 (r/w/o: 0.00/260.01/131.01) lat (ms,95%): 39.65 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 1 tps: 55.50 qps: 334.51 (r/w/o: 0.00/223.51/111.00) lat (ms,95%): 44.98 err/s: 0.00 reconn/s: 0.00
[ 12s ] thds: 1 tps: 57.00 qps: 340.99 (r/w/o: 0.00/227.00/114.00) lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00

problem analysis

1. I found that the io wait is relatively high through the top command. Hey, what is going on? (pretend not to know that I am testing (●'◡'●))

 top - 17:05:59 up 179 days,  7:05,  2 users,  load average: 0.84, 0.70, 0.93
Tasks: 181 total,   1 running, 180 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.3 us,  0.7 sy,  0.0 ni, 95.3 id,  2.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  6.9 us,  3.8 sy,  0.0 ni, 60.3 id, 29.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  2.4 us,  2.0 sy,  0.0 ni, 90.8 id,  4.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  1.7 us,  1.3 sy,  0.0 ni, 95.3 id,  1.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  1.4 us,  1.0 sy,  0.0 ni, 95.6 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  3.1 us,  1.4 sy,  0.0 ni, 81.4 id, 14.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.7 us,  1.0 sy,  0.0 ni, 96.6 id,  1.7 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  1.0 us,  0.3 sy,  0.0 ni, 96.6 id,  2.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.3 us,  1.0 sy,  0.0 ni, 94.6 id,  4.1 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  2.4 us,  1.0 sy,  0.0 ni, 91.2 id,  5.4 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 14.7 us,  2.7 sy,  0.0 ni, 82.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  2.0 us,  1.0 sy,  0.0 ni, 96.0 id,  1.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12137024 total,   185984 free,  3987936 used,  7963104 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7150088 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                            
 8858 actiont+  20   0 7249940   1.1g  11892 S  30.0  9.3 210:38.75 mysqld                                                                                             
26447 tidb      20   0 4000364   1.0g   4740 S   5.3  9.0  10018:23 TiFlashMain                                                                                        
22761 actiont+  20   0 2526348  36188   9756 S   4.0  0.3   1197:43 ustats                                                                                             
29004 root      20   0 3784084 103676   2012 S   2.0  0.9   1802:29 beam.s

2. Through further analysis of the iostat command, we can see that the disk is frequently writing, which is maintained at about 26M/s (the %uitl value is relatively large. Generally speaking, a value greater than 75% indicates that the disk pressure is relatively large)

 [root@tidb-tikv01 ~]# iostat  -xm 1 
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01)  04/22/2022      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.21    0.00    0.44    0.18    0.10   98.07

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.01     0.18    3.53   22.99     0.08     0.25    25.84     0.19    7.16    5.64    7.40   0.71   1.88
scd0              0.00     0.00    0.00    0.00     0.00     0.00     6.00     0.00    0.50    0.50    0.00   0.50   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.22    0.00    1.19    8.28    0.09   88.23

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00  328.00 1800.00     5.12    26.24    30.17     3.18    1.49    4.68    0.91   0.41  88.30
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.63    0.00    1.03    6.68    0.09   90.57

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00  352.00 1918.00     5.50    27.82    30.06     2.96    1.31    3.21    0.96   0.37  83.60
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.54    0.00    1.96    8.77    0.09   83.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00  300.00 1682.00     4.69    25.68    31.38     3.09    1.55    3.68    1.17   0.45  88.60
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

3. Through the iotop command, you can see that the IO consumption of MySQL-related thread ID 22233 is 52.59%, the read is 2.56M/s, and the write of thread ID 8879 is 27.31M/s (generally, the larger the IO column value, the stating that IO pressure is most likely related to it)

 Total DISK READ :       5.29 M/s | Total DISK WRITE :      29.42 M/s
Actual DISK READ:       5.29 M/s | Actual DISK WRITE:      26.83 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                  
22233 be/4 actionte    2.56 M/s  473.07 K/s  0.00 % 52.59 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8879 be/4 actionte    0.00 B/s   27.31 M/s  0.00 % 17.76 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8887 be/4 actionte    0.00 B/s    0.00 B/s  0.00 % 10.10 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8907 be/4 actionte  116.33 K/s    0.00 B/s  0.00 %  6.27 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8930 be/4 actionte    7.76 K/s    0.00 B/s  0.00 %  3.45 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8910 be/4 actionte 1217.57 K/s    0.00 B/s  0.00 %  3.33 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 8911 be/4 actionte  372.25 K/s    0.00 B/s  0.00 %  1.53 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306
 2946 be/4 actionte    0.00 B/s 1985.34 B/s  0.00 %  1.52 % ustats
 8909 be/4 actionte  418.78 K/s    0.00 B/s  0.00 %  1.08 % mysqld --defaults-file=/opt/mysql/etc/3306/my.cnf --d~ --socket=/opt/mysql/data/3306/mysqld.sock --port=3306

4. Bring the system thread ID obtained above into MySQL, and we can find out that the connected user sun has been writing frequently. Now I doubt you, but I will keep silent for now.

 [root@tidb-tikv01 ~]# /opt/mysql/base/8.0.21/bin/mysql -S /opt/mysql/data/3306/mysqld.sock -pxx
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 15041
Server version: 8.0.21 MySQL Community Server - GPL

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from performance_schema.threads where THREAD_OS_ID=22233\G
*************************** 1. row ***************************
          THREAD_ID: 14880
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 14830           ### ID为148830连接在疯狂的进行写操作,show processlist可以看到正在进行的连接
   PROCESSLIST_USER: sun
   PROCESSLIST_HOST: 10.186.61.16
     PROCESSLIST_DB: testdb
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: updating
   PROCESSLIST_INFO: UPDATE sbtest1 SET k=k+1 WHERE id=7899611
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: TCP/IP
       THREAD_OS_ID: 22233          ### THREAD_OS_ID为linux系统中的线程ID
     RESOURCE_GROUP: USR_default
1 row in set (0.00 sec)

mysql> select * from performance_schema.threads where THREAD_OS_ID=8879\G
*************************** 1. row ***************************
          THREAD_ID: 21
               NAME: thread/innodb/page_flush_coordinator_thread ### MySQL自身刷脏数据线程,因为用户的频繁写操作,导致该线程一直进行落盘操作
               TYPE: BACKGROUND
     PROCESSLIST_ID: NULL
   PROCESSLIST_USER: NULL
   PROCESSLIST_HOST: NULL
     PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: NULL
   PROCESSLIST_TIME: 264523
  PROCESSLIST_STATE: NULL
   PROCESSLIST_INFO: NULL
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: NULL
       THREAD_OS_ID: 8879
     RESOURCE_GROUP: SYS_default
1 row in set (0.00 sec)

mysql> show processlist;
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| Id    | User            | Host               | db     | Command          | Time   | State                                                         | Info                                                                                                 |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
|     5 | event_scheduler | localhost          | NULL   | Daemon           | 264544 | Waiting on empty queue                                        | NULL                                                                                                 |
|     8 | universe_op     | 127.0.0.1:38362    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|     9 | universe_op     | 127.0.0.1:38374    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    10 | universe_op     | 127.0.0.1:38376    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    11 | universe_op     | 127.0.0.1:38380    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    12 | universe_op     | 127.0.0.1:38378    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    13 | universe_op     | 127.0.0.1:38384    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    14 | universe_op     | 127.0.0.1:38382    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
|    15 | universe_op     | 10.186.61.45:50322 | NULL   | Binlog Dump GTID | 264542 | Master has sent all binlog to slave; waiting for more updates | NULL                                                                                                 |
|    17 | universe_op     | 127.0.0.1:38454    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    18 | universe_op     | 127.0.0.1:38456    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    19 | universe_op     | 127.0.0.1:38458    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    20 | universe_op     | 127.0.0.1:38460    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    21 | universe_op     | 127.0.0.1:38462    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    22 | universe_op     | 127.0.0.1:38464    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    23 | universe_op     | 127.0.0.1:38452    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|    24 | universe_op     | 127.0.0.1:38466    | NULL   | Sleep            |     14 |                                                               | NULL                                                                                                 |
|   142 | universe_op     | 127.0.0.1:33390    | NULL   | Sleep            |      0 |                                                               | NULL                                                                                                 |
| 14830 | sun             | 10.186.61.16:60730 | testdb | Query            |      0 | update                                                        | INSERT INTO sbtest1 (id, k, c, pad) VALUES (7493994, 6518751, '15795471662-35813531247-93886706048-2 |
| 15041 | root            | localhost          | NULL   | Query            |      0 | starting                                                      | show processlist                                                                                     |
+-------+-----------------+--------------------+--------+------------------+--------+---------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
20 rows in set (0.00 sec)

5. After killing 14830, I found that the disk pressure is indeed over. You "Xiaoyangren" ( ̄ε(# ̄)☆╰╮o( ̄Ware ̄///)

 mysql> kill 14830;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
Segmentation fault
[root@tidb-tikv01 ~]# iostat -xm 1
Linux 3.10.0-862.14.4.el7.x86_64 (tidb-tikv01)  04/22/2022      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.21    0.00    0.44    0.18    0.10   98.06

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.01     0.18    3.83   24.55     0.09     0.27    26.12     0.19    6.78    5.46    6.99   0.69   1.95
scd0              0.00     0.00    0.00    0.00     0.00     0.00     6.00     0.00    0.50    0.50    0.00   0.50   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.85    0.00    0.59    0.00    0.00   97.56

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00     0.00    0.00   61.00     0.00     0.46    15.52     0.01    0.20    0.00    0.20   0.18   1.10
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.36    0.00    0.84    0.08    0.00   95.71

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
vda               0.00    10.00    1.00   53.00     0.02     0.48    18.81     2.87    0.22    1.00    0.21   1.31   7.10
scd0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

instruction manual:

Don't kill casually in the production environment, if there is a problem, you will be whipped.

Summarize

For some new cuties, the above methods can help you step by step to locate which connection threads are driving your server IO tossing high, allowing you to better understand the linkage between the Linux system and MySQL. Old bird, emmm...


爱可生开源社区
426 声望209 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。