Author: Liu Kaiyang

The Beijing DBA of Eikesheng delivery service team has a strong learning interest in database and peripheral technology, likes to read books, and pursue technology.

Source of this article: original submission

* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.


A strange lock problem was discovered on the customer site some time ago, and I specially brought it to share with you.

phenomenon

The MySQL version is 8.0.18. The phenomenon observed in the thread waiting for connection from the library is as follows:

mysql> select * from threads;
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+---------------------------------------------+----------------------------------------------------------------------------------------------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
| THREAD_ID | NAME                                        | TYPE       | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB     | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE                           | PROCESSLIST_INFO            | PARENT_THREAD_ID | ROLE | INSTRUMENTED | HISTORY | CONNECTION_TYPE | THREAD_OS_ID | RESOURCE_GROUP |
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+---------------------------------------------+----------------------------------------------------------------------------------------------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
······
|        58 | thread/sql/event_scheduler   | FOREGROUND |          8    | NULL                | NULL                 | NULL       | Sleep         |                 NULL | Waiting on empty queue                               | NULL                                                           |                       1 | NULL | YES         | YES     | NULL            |        34147 | SYS_default    |
|        59 | thread/sql/signal_handler    | BACKGROUND |          NULL | NULL                | NULL                 | NULL      | NULL           |                NULL | NULL                                                              | NULL                                                           |                       1 | NULL | YES          | YES     | NULL            |        34148 | SYS_default    |
|        60 | thread/sql/compress_gtid_table| FOREGROUND |         10   | NULL                | NULL                 | NULL      | Daemon     |         33670997 | Suspending                                                   | NULL                                                           |                       1 | NULL | YES          | YES     | NULL            |        34149 | SYS_default    |
|        61 | thread/mysqlx/acceptor_network| BACKGROUND |         NULL | NULL               | NULL                 | NULL       | NULL          |                NULL | NULL                                                               | NULL                                                          |                     40 | NULL | YES          | YES     | NULL            |        34150 | SYS_default    |
|  46627729 | thread/mysqlx/worker          | BACKGROUND |         NULL | NULL                | NULL                 | NULL      | NULL            |               NULL | NULL                                                                | NULL                                                         |                 NULL | NULL | YES          | YES     | NULL            |        34132 | SYS_default    |
|  57810934 | thread/sql/one_connection     | FOREGROUND |     57808740 | proxy_monitor | 10.108.76.139 | NULL       | Query         |                 718 | Waiting for commit lock                                 | set global read_only=on                           |                 NULL | NULL | YES          | YES     | SSL/TLS       |        76945 | USR_default    |
|  47295992 | thread/sql/slave_io          | FOREGROUND |      47294234 | root                 | localhost            | NULL       | Connect     |         6285905 | Waiting for master to send event                 | NULL                                                           |         47294592 | NULL | YES          | YES     | NULL            |        75880 | SYS_default    |
|  47295993 | thread/sql/slave_sql         | FOREGROUND |      47294235 | root                 | localhost            | NULL       | Query         |            64768 | Waiting for dependent transaction to commit | NULL                                                         |         47294592 | NULL | YES          | YES     | NULL            |        75881 | SYS_default    |
|  47295994 | thread/sql/slave_worker      | FOREGROUND |       47294236 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                           | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75882 | SYS_default    |
|  47295995 | thread/sql/slave_worker    | FOREGROUND |       47294237 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for commit lock                                  | NULL                                                          |         47295993 | NULL | YES          | YES     | NULL            |        75883 | SYS_default    |
|  47295996 | thread/sql/slave_worker    | FOREGROUND |       47294238 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for preceding transaction to commit | NULL                                                          |         47295993 | NULL | YES          | YES     | NULL            |        75884 | SYS_default    |
|  47295997 | thread/sql/slave_worker    | FOREGROUND |       47294239 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75885 | SYS_default    |
|  47295998 | thread/sql/slave_worker    | FOREGROUND |       47294240 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75886 | SYS_default    |
|  47295999 | thread/sql/slave_worker    | FOREGROUND |       47294241 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for global read lock                            | INSERT INTO `dcp_deci_acctflow_info` (······ |         47295993 | NULL | YES          | YES     | NULL            |        75887 | SYS_default    |
|  47296000 | thread/sql/slave_worker    | FOREGROUND |       47294242 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75888 | SYS_default    |
|  47296001 | thread/sql/slave_worker    | FOREGROUND |       47294243 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75889 | SYS_default    |
|  47296002 | thread/sql/slave_worker    | FOREGROUND |       47294244 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75890 | SYS_default    |
|  47296003 | thread/sql/slave_worker    | FOREGROUND |       47294245 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75891 | SYS_default    |
|  47296004 | thread/sql/slave_worker    | FOREGROUND |       47294246 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75892 | SYS_default    |
|  47296005 | thread/sql/slave_worker    | FOREGROUND |       47294247 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75893| SYS_default    |
|  47296006 | thread/sql/slave_worker    | FOREGROUND |       47294248 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75894 | SYS_default    |
|  47296007 | thread/sql/slave_worker    | FOREGROUND |       47294249 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75895 | SYS_default    |
|  47296008 | thread/sql/slave_worker    | FOREGROUND |       47294250 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75896 | SYS_default    |
|  47296009 | thread/sql/slave_worker    | FOREGROUND |       47294251 | root                 | localhost            | NULL       | Query         |            67415 | Waiting for an event from Coordinator           | NULL                                                           |         47295993 | NULL | YES          | YES     | NULL            |        75897 | SYS_default    |
|  57811108 | thread/sql/one_connection  | FOREGROUND |       57808914 | proxy_monitor  | 10.108.76.140   | NULL     | Query          |              664 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        97442 | USR_default    |
|  57811279 | thread/sql/one_connection  | FOREGROUND |       57809085 | proxy_monitor  | 10.108.76.140   | NULL     | Query          |              610 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        98136 | USR_default    |
|  57811110 | thread/sql/one_connection  | FOREGROUND |       57808916 | proxy_monitor  | 10.108.76.139   | NULL     | Query          |              663 | Waiting for commit lock                                    | set global read_only=on                            |                 NULL | NULL | YES          | YES     | SSL/TLS        |        97419 | USR_default    |
······                                                                  
mysql> ^C
// 说明:由于连接接近2300条,为方便大家看起来没有那么枯燥,这里进行部分省略,大多数被省略的连接线程为set global read_only的连接

analysis

It can be observed from the above connection that there are several lock waits here, some are waiting for global read locks, and some are waiting for commit locks. First of all, the corresponding lock waiting order should be sorted out to see who is waiting for me, and Who am I waiting for".

Through the locking threads in the threads table above, it can be analyzed that there are three threads that can grasp the source:

  • Thread waiting for global read lock
  • Waiting for the thread to submit the lock
  • Threads waiting for MTS to submit in order

Let's start with the easiest global read lock to analyze, use thread 47295994 as the entry point to trace the source, and see what the lock held by thread 47295994 is? This requires the use of metadata_locks in the perfoemance_schema library, that is, we usually analyze the metadata lock table blocked by DDL for analysis,

mysql> select * from perfoemance_schema.metadata_locks where THREAD_ID = 47295994;
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE       | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| GLOBAL     | NULL           | NULL        | NULL        |     140000636724432 | INTENTION_EXCLUSIVE | EXPLICIT | PENDING  | sql_base.cc:3002 |    47295994 |      149780492 |
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
1 rows in set (0.00 sec)

The SQL in thread 47295994 executed an insert statement, and the thread was not observed to hold the lock, but was waiting for the global read lock. We then analyzed the global read lock.

To further confirm the thread through the metadata lock table, let's see who holds the global read lock:

mysql> select * from perfoemance_schema.metadata_locks where OBJECT_TYPE = 'GLOBAL' and LOCK_STATUS = 'GRANTED';
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE       | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| GLOBAL      | NULL          | NULL        | NULL        |       139955200762880 | SHARED    | EXPLICIT      | GRANTED     | lock.cc:1014 |        57822380 |              3 |
| GLOBAL      | NULL          | NULL        | NULL        |       140006539051472 | SHARED    | EXPLICIT      | GRANTED     | lock.cc:1014 |        57822266 |              3 |
| GLOBAL      | NULL          | NULL        | NULL        |       139957819325888 | SHARED    | EXPLICIT      | GRANTED     | lock.cc:1014 |        57821979 |              3 |
······
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
1104 rows in set (0.00 sec)

Let's pick a few records and pull them back to the threads table to see if we can find the corresponding SQL execution:

mysql> select * from perfoemance_schema.threads where THREAD_ID in (57822380,57822266,57821979);
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+--------------------------------+---------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
| THREAD_ID | NAME                     | TYPE       | PROCESSLIST_ID | PROCESSLIST_USER | PROCESSLIST_HOST | PROCESSLIST_DB     | PROCESSLIST_COMMAND | PROCESSLIST_TIME | PROCESSLIST_STATE               | PROCESSLIST_INFO              | PARENT_THREAD_ID | ROLE | INSTRUMENTED | HISTORY | CONNECTION_TYPE | THREAD_OS_ID | RESOURCE_GROUP |
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+--------------------------------+---------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
|  57821979 | thread/sql/one_connection   | FOREGROUND |   57808986 | proxy_monitor    | 10.108.76.140   | NULL               | Query               |              664 | Waiting for commit lock      | set global read_only=on              |             NULL | NULL | YES          | YES     | SSL/TLS         |        97442 | USR_default    |
|  57822266 | thread/sql/one_connection   | FOREGROUND |   57819228 | proxy_monitor    | 10.108.76.139   | NULL               | Query               |              679 | Waiting for commit lock      | set global read_only=on              |             NULL | NULL | YES          | YES     | SSL/TLS         |        97434 | USR_default    |
|  57822380 | thread/sql/one_connection   | FOREGROUND |   57829246 | proxy_monitor    | 10.108.76.140   | NULL               | Query               |              694 | Waiting for commit lock      | set global read_only=on              |             NULL | NULL | YES          | YES     | SSL/TLS         |        97476 | USR_default    |
+-----------+---------------------------------------------+------------+----------------+------------------+------------------+--------------------+---------------------+------------------+-----------------------------+------------------------+------------------+------+--------------+---------+-----------------+--------------+----------------+
3 rows in set (0.00 sec)

It can be confirmed by multiple queries here that the global read lock is issued by the statement set global read_only=on. From the threads table, we can also find the user proxy_monitor who issued this statement and the connection IP 10.108.76.139/140 , You can further confirm the application or client that issued the global read lock.

Through understanding, the original shell script used by the customer monitors the slave library and configures it as read-only. As long as the slave library is not read-only, the script keeps sending commands to configure it. Due to the lock blockage generated earlier, thousands of sets global The read_only connection causes a wait in MySQL.

In the threads table, we also observed a phenomenon that all threads that issue global read locks are waiting for the commit lock. Then who holds the commit lock? We then go to the metadata lock table to retrieve:

mysql> select * from perfoemance_schema.metadata_lockswhere OBJECT_TYPE = 'COMMIT' and LOCK_STATUS = 'GRANTED';
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE       | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
| COMMIT      | NULL          | NULL        | NULL        |       140011172675968 | INTENTION_EXCLUSIVE  | EXPLICIT | GRANTED | handler.cc:1697 |    47295996 |      89520430 |
| COMMIT      | NULL          | NULL        | NULL        |       140011035402992 | INTENTION_EXCLUSIVE  | EXPLICIT | GRANTED | handler.cc:1697 |    47295995 |     113911056 |
+-------------+---------------+-------------+-------------+-----------------------+-----------+---------------+-------------+--------------+-----------------+----------------+
2 rows in set (0.00 sec)

It seems that the global read lock is waiting for the commit lock in the hands of threads 47295995 and 47295996 to be released, while 47295994 is waiting for the global read lock to be released. There is still a knowledge point, that is, what is the waiting relationship between threads 47295994 and 47295995 and 47295996?

I don’t know if you have any understanding of the "Waiting for preceding transaction to commit" waiting state. It means that the transaction in the thread has to wait for the first transaction in the same transaction submission group to complete the submission first, that is, the same group transaction sequence Commit, there is a parameter in MySQL to configure the order commit from the library (slave_preserve_commit_order).

At the customer site, the issue of "transaction priority submission in connections with small thread numbers" was not verified. We will test it in the local environment later to reproduce it.

A "picture" in a nutshell

Since I hold MDL::global read lock when performing set global read_only = ON; from the library, I further want to obtain MDL::global commit lock, and the MDL_SHARED lock is displayed in LOCK_TYPE. When slave_preserve_commit_order=1, the order in the commit group of the transaction in the relay log will be committed strictly in the order in which it appears in the relay log, so the execution of the transaction and the acquisition of the two locks by the set global read_only statement are not atomic. In parallel replication mode, press Deadlock will occur in the following order:

  1. Taking threads as an example, the transactions in threads 47295994, 47295995, and 47295996 are submitted sequentially in the parallel replicated relay log. This is not observed on the spot, and evidence is given when recurring.
  2. In the slave library thread 47295996 executes faster. First, the 47295996 thread obtains the commit lock and enters the state of waiting for preceding transaction to commit, and then the thread 47295995 also obtains the commit lock.
  3. The client executes set global read_only and enters the state of waiting for commit.
  4. The insert statement of transaction a is executed after the GRL is obtained, then transaction a enters the state of waiting for global read lock, which is a phenomenon that occurs on the spot.
  5. If the insert statement of transaction a is executed before the GRL is obtained, and the transaction is submitted using Xid_event after the global commit lock is obtained, it enters the state of waiting for commit lock, which is the second deadlock, which is the second deadlock reproduced below Lock phenomenon.

solve

  • Solution 1: Kill the threads that add global read locks, but because the threads that add global read locks in the client environment are proxy_monitor, the number of threads issued is as high as thousands, which is not easy to solve, and the cost is small but it takes a long time.
  • Solution 2: You can set slave_preserve_commit_order=0 to turn off sequential submission from the binlog of the library to avoid this deadlock. Closing this parameter does not need to stop replication. It just affects the submission order of parallel replicated transactions in the slave database. It may have an impact on the final calculation and risk control business, but it will not affect the final data consistency. Therefore, if there is no special requirement from the database The binlog sequence must be consistent with the main library.
  • When using shell scripts to detect read-only from the library, you can modify the script logic: if it is detected that the read-only slave is not turned off, you can stop the replication first, and then turn on the read-only slave, so there is no need to change the consistency of the slave transaction submission And to avoid the recurrence of related problems.

The above deadlock situation may occur when the following statements are executed from the library:

set global read_only ; grant all(FLUSH PRIVILEGES) ; flush table with read lock ;

  1. The above statements will all add global read lock. After verification, this kind of deadlock has a certain probability. Only when the later transaction playback in the relay log submitted by the same group is completed and waiting for the order to be submitted will it lead to slave_preserve_commit_order The MTS replication deadlocks during execution.
  2. It is recommended to avoid business peaks when executing the above statement from the library. If it cannot be avoided, turn off the slave_preserve_commit_order parameter. This operation needs to be determined according to the business situation. It may affect the business consistency due to timing issues, but does not affect the final consistency of the data. .

Reproduce two kinds of deadlocks

Use sysbench to simulate business pressure, open a window from the library for statement input, and execute set global read_only=ON multiple times; it will get stuck and observe the output:

Kill off set global read_only=ON; After this statement, the replication returns to normal. Let's look at the situation when the connection hangs.

Deadlock 1-Customer site

View the information in the corresponding performance_schema.threads table to determine the corresponding thread submission sequence according to the transaction submission sequence, indicating that the transaction submission sequence executed by the client's site with the smaller thread number is before the execution transaction with the larger thread number. According to the transaction submission sequence, the corresponding thread submission sequence is determined, indicating that the transaction submission sequence executed by the client's site with the smaller thread number is before the execution transaction with the larger thread number.

View the lock holding relationship in the performance_schema.metadata_locks table

The locking model is the same as the deadlock closed loop in the above analysis.

According to the transaction submission sequence, the corresponding thread submission sequence is determined, indicating that the transaction submission sequence executed by the client's site with the smaller thread number is before the execution transaction with the larger thread number.

deadlock 2

View the information in the corresponding performance_schema.threads table

Check the locks in the performance_schema.metadata_locks table

Draw the corresponding deadlock closed loop according to the lock relationship

In addition to the necessary locks, the transaction will additionally request the MDL_INTENTION_EXCLUSIVE lock of the MDL_key::GLOBAL lock; before the transaction is committed, it will first request the MDL_INTENTION_EXCLUSIVE lock of the MDL_key::COMMIT lock. This lock is an S lock. For range locks, an IX lock It is not compatible with S lock.

According to the commit order of the transaction, determine the corresponding thread commit order

reference:

https://sq.163yun.com/blog/article/211685211777150976
http://dbaplus.cn/news-11-1874-1.html
http://mysql.taobao.org/monthly/2018/02/01/
Bug ticket: https://bugs.mysql.com/bug.php?id=95863


爱可生开源社区
426 声望211 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。