Author: Lei Wenting

DBA member of Acsen East China Delivery Service Department, mainly responsible for Mysql troubleshooting and related technical support. I like to read books and movies. The motto is that every day without dancing is a disappointment to life.

Source of this article: original contribution

*The original content is produced by the open source community of Aikesheng, and the original content shall not be used without authorization. For reprinting, please contact the editor and indicate the source.


content

  • Background Information
  • Failure analysis
  • Problem recurrence
  • test log
  • Test conclusion
  • Reference link

Background Information

In MySQL 5.7.30 master-slave read-write separation environment, a MySQL crash occurred in the slave library one day.

System side: Monitoring shows that the memory and CPU resource usage of the slave host is normal before and after the failure, and the disk IO has an iowait of 2% (read and write 200M/s), indicating that the disk was under pressure before the failure.

Service side: The slow-log records that there are slow SQL statements using temporary tables and file sorting before the service is restarted.

The error-log records the service call to the L2165 line of the btr0btr.cc file, and an err == DB_SUCCESS error occurs.

 0x7f2dd49d0700  InnoDB: Assertion failure in thread 139834817316608 in file btr0btr.cc line 2165
InnoDB: Failing assertion: err == DB_SUCCESS
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.

Failure analysis

On the system side, the factors of insufficient disk space and insufficient memory were ruled out, and the server side suspected that it was the cause of slow queries and bugs. After that, a similar bug was found through the "btr0btr.cc" keyword. The link is as follows:

https://bugs.mysql.com/bug.php?id=101154

What the report means is that when MySQL executes the btr_insert_on_non_leaf_level_func() function, writing to a temporary table causes the service with the assertion to crash.

The reason I learned by looking at the comments at the beginning of the btr0btr.cc file is:

The calling mechanism for this file is: all changes made to b-tree row operations or records.

The content of the L2165 line operation is: when processing the content inserted into the non-leaf level, the free space of each level will be checked (the page space of twice the index height needs to be reserved). If the leaf split has started before the operation, it is very difficult to It is difficult to undo, and can only be rolled forward through a crash. The bug will only appear in MySQL 5.7

Code query: https://github.com/mysql/mysql-server (select the corresponding version through the Tags tag)

Code content: https://github.com/mysql/mysql-server/blob/mysql-5.7.30/storage/innobase/btr/btr0btr.cc#L2165

 nserts a data tuple to a tree on a non-leaf level. It is assumed
that mtr holds an x-latch on the tree. */
void
btr_insert_on_non_leaf_level_func()
    ---
    ut_a(err == DB_SUCCESS);
    ---
        
其中ut_a()
https://dev.mysql.com/doc/dev/mysql-server/latest/ut0dbg_8h.html#ae7aed983dfe98ac872b5a9915fa778fa:


Check the database parameters about temporary tables:

innodb_temp_data_file_path ibtmp1:12M:autoextend:max:20G

tmp_table_size 64M and max_heap_table_size 64M

Note: The actual limit is the smaller of the two. Session-level parameters, for the case where the innodb_buffer_pool_size is not large and the temporary large data volume query is not used, it is not recommended to set too large, which may lead to memory overflow. The number of connections is 800+, 64M is the recommended value

internal_tmp_disk_storage_engine InnoDB

Comments: The storage engine used to define disk temporary tables. Queries to on-disk temporary tables that exceed the InnoDB row or column limit return Row size too large or Too many columns errors. The solution is to set internal_tmp_disk_storage_engine to MYISAM, there is no related error in our error-log. It is not recommended to modify the preliminary investigation stage

created_tmp_disk_tables 2987733

created_tmp_tables 11049848

NOTE: The server increments this Created_tmp_tables value when creating internal temporary tables in memory or on disk. The server increments this Created_tmp_disk_tables value when creating an internal temporary table on disk. If too many internal temporary tables are created on disk, consider increasing the tmp_table_size and max_heap_table_size settings. From 10:36 to 17:00 in the morning, more temporary tables are generated, which is a normal phenomenon in combination with the busy business situation.

Summary: Through the above analysis, combined with the application architecture (cannot upgrade to MySQL8.0). In the preliminary stage, it is recommended to optimize the SQL statement first , reduce the use of temporary tables, and reduce the probability of recurrence.

Problem recurrence

In the BUG report mentioned above, there is a test case using MySQL Test Run referred to as MTR (MySQL's official automated testing framework).

This test case will be referenced below for reproduction testing.

Parameter explanation:

The innodb_limit_optimistic_insert_debug parameter can limit the number of records per B-tree page. Dynamically setting this parameter during MySQL operation can cause page splits.

 innodb_limit_optimistic_insert_debug
限制每个 B 树页面的记录数。默认值 0 表示不施加限制。仅当使用 CMake选项编译调试支持时,需开启DEBUG选项。

# 依赖
yum install -y gcc gcc-c++ cmake ncurses ncurses-devel bison openssl openssl-devel
tar -xvf mysql-boost-5.7.30.tar.gz

--编译安装MySQL,因为需要设置innodb_limit_optimistic_insert_debug参数-------
tar -xvf mysql-boost-5.7.30.tar.gz

# 非BOOST版本的Mysql源码包,需要指定-DDOWNLOAD_BOOST=1 -DWITH_BOOST=/usr/local/boost
cd mysql-5.7.30
cmake . -DCMAKE_INSTALL_PREFIX=/tools/mysql-test5.7.30 -DMYSQL_DATADIR=/tools/mysql-test5.7.30/data -DMYSQL_UNIX_ADDR=/tools/mysql-test5.7.30/mysql5.7.30.sock -DDEFAULT_CHARSET=utf8mb4 -DDEFAULT_COLLATION=utf8mb4_general_ci -DEXTRA_CHARSETS=all -DENABLED_LOCAL_INFILE=1 -DWITH_SSL=system  -DWITH_BOOST=boost -DWITH_DEBUG=1 

make
make install

# Cmake 编译之后会在DCMAKE_INSTALL_PREFIX目录中生成mysql-test测试框架目录
/tools/mysql-test5.7.30/mysql-test/t
vim my0420.test 
cat my0420.test 
--source include/have_innodb.inc

--let $restart_parameters = "restart: --innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:1G --big-tables=1"
--source include/restart_mysqld.inc

SELECT @@innodb_temp_data_file_path;

drop function if exists func1;
delimiter |;
create function func1(x int) returns int deterministic
        begin
                declare z1, z2 int;
                set z1 = x;
                set z2 = z1 + 2;
                return z2;
        end|
delimiter ;|

create table t1 (a int, b varchar(20));
insert into t1 values(1, 'a'), (2, 'b'), (3, 'c'), (3, 'c'), (4, 'c');

SET GLOBAL innodb_limit_optimistic_insert_debug=4;
--let $i=1
while ($i <= 15) {
INSERT INTO t1 SELECT * FROM t1;
--inc $i
}

SELECT COUNT(*) FROM t1;

SET GLOBAL innodb_limit_optimistic_insert_debug=2;
select * from t1 order by func1(a);

进行测试,
/tools/mysql-test5.7.30/mysql-test
./mtr my0420.test

其中 select * from t1 order by func1(a); 会使用Using temporary; Using filesort 和 业务SQL的执行计划一致

After adding the --internal_tmp_disk_storage_engine=MYISAM parameter to the second line of my0420.test, the service does not crash.

--let $restart_parameters = "restart: --innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:1G --internal_tmp_disk_storage_engine=MYISAM --big-tables=1"

After increasing the second line of my0420.test to the ---innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:6G parameter, the service does not crash.

--let $restart_parameters = "restart: --innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:6G --big-tables=1"

big_tables If enabled, the server stores all temporary tables on disk instead of in memory. This prevents most errors for operations that require large temporary tables, but also slows down queries where in-memory tables are sufficient. most The table * tbl_name * is full errors for SELECT operations, if this error appears in the error-log, it is recommended to use a large temporary table for select operation .

( Small reminder , in the customer environment, you will often receive a warning email of a temporary table #sql_tbl_name is full, you need to consider whether the SQL can be optimized)

test log

The execution logic of MTR is to start a temporary MySQL service and execute the contents of the my0420.test file in the t directory. By default, the execution result will be the same name as the result file in the r directory (also called the standard execution result file, generally in the correct version). generated) for comparison to determine whether the test file is correct.

 在mysql-test目录下:
./mtr my0420.test  
--执行到以下语句时报错--
SET GLOBAL innodb_limit_optimistic_insert_debug=2;
[100%] main.my0420                              [ fail ]
        Test ended at 2022-04-20 20:05:39

CURRENT_TEST: main.my0420
mysqltest: At line 32: query 'select * from t1 order by func1(a)' failed: 2013: Lost connection to MySQL server during query
safe_process[7080]: Child process: 7081, exit: 1

Server [mysqld.1 - pid: 7089, winpid: 7089, exit: 256] failed during test run
Server log from this test:

Lost connection , the MySQL service has stopped. After that the error message is printed. The error also appears in btr0btr.cc line 2165

 Server log from this test:
----------SERVER LOG START-----------
2022-04-20T12:02:42.082135Z 0 [Note] /tools/mysql-test5.7.30/bin/mysqld (mysqld 5.7.30-debug-log) starting as process 7049 ...
2022-04-20T12:02:43.698812Z 0 [Note] /tools/mysql-test5.7.30/bin/mysqld: Shutdown complete
2022-04-20T12:02:45.051667Z 0 [Note] /tools/mysql-test5.7.30/bin/mysqld (mysqld 5.7.30-debug-log) starting as process 7090 ...
2022-04-20T12:02:45.262573Z 0 [Note] /tools/mysql-test5.7.30/bin/mysqld: ready for connections.
Version: '5.7.30-debug-log'  socket: '/tools/mysql-test5.7.30/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution
2022-04-20 15:05:37 0x7fc298bc4700  InnoDB: Assertion failure in thread 140473762858752 in file btr0btr.cc line 2165
InnoDB: Failing assertion: err == DB_SUCCESS
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:05:37 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61093 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc234005890): select * from t1 order by func1(a)
Connection ID (thread ID): 2
Status: NOT_KILLED

Differences from the customer environment:

The error in production is Query(7f3be00479d0) is an invalid pointer. Invalid pointer, similar to the error of insufficient disk space.

The error in the test is Query (7fc234005890): select * from t1 order by func1(a).

Test environment stack information:

https://github.com/mysql/mysql-server/blob/mysql-5.7.30/storage/innobase/btr/btr0btr.cc#L2285

(The Chinese in the picture above is a translated code comment, there may be some deviation errors)

Check the test log file error-log , the var directory will be generated in the current directory by default, which contains the my.cnf file

/tools/mysql-test5.7.30/mysql-test/var/log/mysqld.1.err

 Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fc234005890): select * from t1 order by func1(a)
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
safe_process[7089]: Child process: 7090, killed by signal: 6

Supplement for compiling and installing MySQL:

After Cmake is compiled, the mysql-test test framework directory will be generated in the DCMAKE_INSTALL_PREFIX directory.

The following steps are not required (the following steps are used when DEBUG debugging is not required)

1. This package is mysql-test-5.7.30-linux-glibc2.12-x86_64.tar.gz

mv mysql-5.7.30-linux-glibc2.12-x86_64/ /tools/mysql-test

2. Merge the files in the mysql installation package directory with the mtr directory, and the mysql installation package directory is basedir

cp -r /data/mysql/base/5.7.30 /tools/mysql-test

(Insert life diary, the make command takes about 1 hour to execute, just at 18:27, and it coincides with the quarantine, so I got up to wash the pot and fry the kitchen)

Test conclusion

This BUG may appear in MySQL 5.7 version

1. In the test, it has been verified that the database parameter innodb_temp_data_file_path will not crash after the max_file_size is increased. If the business SQL cannot be optimized, this parameter can be increased to reduce the probability of triggering a crash.

2. In the test, it is verified that the database parameter internal_tmp_disk_storage_engine=MYISAM will not crash, the default INNODB

If the business cannot be upgraded to 8.0, this parameter can be dynamically adjusted.

Our suggested sequence of changes is:

Optimize SQL statement -> increase max_file_size value of innodb_temp_data_file_path parameter -> upgrade to MySQL 8.0 (use session temporary tablespace) -> modify internal_tmp_disk_storage_engine parameter.

Among them, the internal_tmp_disk_storage_engine parameter, I do not quite understand, whether it is really necessary to change the default value of INNODB to MYISAM. Later, I asked my colleagues to learn that "the internal temporary table will not be copied, and there will be no concurrent access. You can consider using MYISAM"

Thanks again.

Reference link

BUG report:

https://bugs.mysql.com/bug.php?id=101154
https://jira.percona.com/browse/PS-7318?focusedCommentId=268045&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-268045

Compilation and installation and parameter description:

https://blog.csdn.net/iteye_621/article/details/81959655
https://baijiahao.baidu.com/s?id=1725289345179642059&wfr=spider&for=pc

https://dev.mysql.com/doc/refman/5.7/en/source-configuration-options.html#cmake-general-options

MTR documentation:

https://dev.mysql.com/doc/dev/mysql-server/latest/PAGE_MYSQLTEST_FRAMEWORK_COMPONENTS.html

Database parameters:

https://dev.mysql.com/doc/refman/5.7/en/server-status-variables.html#statvar_Created_tmp_tables

https://dev.mysql.com/doc/refman/5.7/en/server-status-variables.html#statvar_Created_tmp_disk_tables

https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_tmp_table_size

https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_max_heap_table_size

https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_big_tables

https://dev.mysql.com/doc/refman/5.7/en/select.html


爱可生开源社区
426 声望209 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。


引用和评论

0 条评论