在这篇文章中我会分享一个在 MySQL 8.0.35 版本中修复的一个宕机 bug,以及怎样通过错误日志、corefile 和 gdb 发现的这个 bug。

作者:马金友, 一名给 MySQL 找 bug 的初级 DBA。

爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。

本文约 4000 字,预计阅读需要 15 分钟。

在这篇文章中我会分享一个在 MySQL 8.35 版本中修复的一个宕机 bug,以及怎样通过错误日志、corefile 和 gdb 怎样发现这个bug。

背景介绍

一个客户的数据库(MySQL 8.0.27)随机性的崩溃。

通过错误日志我们可以看到是: 由一个 SELECT 查询导致的 Assertion failure

  • query

    Query (7f20ec0f4a78): SELECT a FROM t WHERE ('session' MEMBER OF (a) )
  • assertion failure

    2023-12-15T22:57:12.408886Z 8 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0mysql.cc:165:len < 256 * 256 thread 139781058020928
  • 表结构

    CREATE TABLE `t` (
    `id` int,
    `a` json DEFAULT NULL,
    KEY `a` ((cast(`a` as char(255) array)))
    ) ENGINE=InnoDB
    PARTITION BY HASH(id)
    PARTITIONS 1;

错误日志的分析

  • 错误日志

    2023-12-15T22:57:12.408886Z 8 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0mysql.cc:165:len < 256 * 256 thread 139781058020928
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    22:57:12 UTC - mysqld got signal 6 ;
    Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
    Thread pointer: 0x7f20ec000ed0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 7f21504f3d00 thread_stack 0x100000
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x2e) [0x1f7f4ae]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(handle_fatal_signal+0x323) [0x1029073]
    /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f2167fee520]
    /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f21680429fc]
    /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f2167fee476]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f2167fd47f3]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0xd7f2bb]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0x21702c8]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(row_sel_field_store_in_mysql_format_func(unsigned char*, mysql_row_templ_t const*, dict_index_t const*, unsigned char const*, unsigned long)+0x1a0) [0x219bf30]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(row_sel_store_mysql_rec(unsigned char*, row_prebuilt_t*, unsigned char const*, dtuple_t const*, bool, dict_index_t const*, dict_index_t const*, unsigned long const*, bool, lob::undo_vers_t*, mem_block_info_t*&)+0x1bc) [0x21a089c]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long)+0x37ed) [0x21a86bd]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x331) [0x20609c1]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(ha_innopart::index_read_map_in_part(unsigned int, unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x57) [0x207a7a7]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(Partition_helper::handle_unordered_scan_next_partition(unsigned char*)+0x11b) [0xdfef5b]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function)+0x2e1) [0x11310e1]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(RefIterator<false>::Read()+0xc1) [0xea4031]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(FilterIterator::Read()+0x14) [0x13ab924]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(Query_expression::ExecuteIteratorQuery(THD*)+0x3c3) [0xfaed93]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(Query_expression::execute(THD*)+0x2c) [0xfaef7c]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(Sql_cmd_dml::execute(THD*)+0x2c5) [0xf3f855]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(mysql_execute_command(THD*, bool)+0xac8) [0xee1e08]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x403) [0xee68f3]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x2089) [0xee8e49]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld(do_command(THD*)+0x174) [0xee9d04]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0x1019fa8]
    /home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0x249dcec]
    /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f2168040ac3]
    /lib/x86_64-linux-gnu/libc.so.6(+0x126a40) [0x7f21680d2a40]
    ubuntu
    Trying to get some variables.
    Some pointers may be invalid and cause the dump to abort.
    Query (7f20ec0f4a78): SELECT a FROM t WHERE ('session' MEMBER OF (a) )
    Connection ID (thread ID): 8
    Status: NOT_KILLED

① 源码中的 Assertion failure

断言在文件 row0mysql.cc 中第 165 行。

2023-12-15T22:57:12.408886Z 8 [ERROR] [MY-013183] [InnoDB] Assertion failure: row0mysql.cc:165:len < 256 * 256 thread 139781058020928

当我们打开文件 ./innobase/row/row0mysql.cc,我们可以找到对应的 ut_a(len < 256 * 256); 在函数 row_mysql_store_true_var_len 中。

/** Stores a >= 5.0.3 format true VARCHAR length to dest, in the MySQL row
 format.
 @return pointer to the data, we skip the 1 or 2 bytes at the start
 that are used to store the len */
byte *row_mysql_store_true_var_len(
    byte *dest,   /*!< in: where to store */
    ulint len,    /*!< in: length, must fit in two bytes */
    ulint lenlen) /*!< in: storage length of len: either 1 or 2 bytes */
{
  if (lenlen == 2) {
    ut_a(len < 256 * 256);

    mach_write_to_2_little_endian(dest, len);

    return (dest + 2);
  }

  ut_a(lenlen == 1);
  ut_a(len < 256);

  mach_write_to_1(dest, len);

  return (dest + 1);
}

疑问:InnoDB 为什么会判断这个 len 要小于 256 * 256 呢?

通过注释我们可以看到,这是和 VARCHAR 长度相关。如果您对 MySQL 相对熟悉的话,应该能联想到 VARCHAR 的最大长度(当然我在分析问题的时候没有想到, 同事提示说应该是长度的问题)。

官网 上有一句话:

The length can be specified as a value from 0 to 65,535. The effective maximum length of a VARCHAR is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used

重点是值的范围是 0 - 65,535,而 256 * 256 是 65,536。

这里可以理解为什么有了这个断言:因为 InnoDB 需要去确认所有的长度必须小于 65,536!如果不是,那么 InnoDB 只能自杀,进而导致宕机。

② 断言对应的宏 ut_a

EXPRut_a 宏中不是 TURE,那么就调用 ut_dbg_assertion_failed 去自杀。

/** Abort execution if EXPR does not evaluate to nonzero.
@param EXPR assertion expression that should hold */
#define ut_a(EXPR)                                               \
  do {                                                           \
    if (UNIV_UNLIKELY(!(ulint)(EXPR))) {                         \
      ut_dbg_assertion_failed(#EXPR, __FILE__, (ulint)__LINE__); \
    }                                                            \
  } while (0)

在我们这个案例中,len < 256 * 256 为 FALSE。

③ 调用 ut_dbg_assertion_failed

ut_dbg_assertion_failed,InnoDB 会打印一些宕机信息(也就是我们在错误日志中看到的)。在最后的阶段,调用 abort() 去结束进程。

/** Report a failed assertion.
@param[in] expr The failed assertion
@param[in] file Source file containing the assertion
@param[in] line Line number of the assertion */
[[noreturn]] void ut_dbg_assertion_failed(const char *expr, const char *file,
                                          ulint line) {
#if !defined(UNIV_HOTBACKUP) && !defined(UNIV_NO_ERR_MSGS)
  ib::error(ER_IB_MSG_1273)
      << "Assertion failure: " << innobase_basename(file) << ":" << line
      << ((expr != nullptr) ? ":" : "") << ((expr != nullptr) ? expr : "")
      << " thread " << to_string(std::this_thread::get_id());

  flush_error_log_messages();

#else  /* !UNIV_HOTBACKUP && !defined(UNIV_NO_ERR_MSGS) */
  auto filename = base_name(file);

  if (filename == nullptr) {
    filename = "null";
  }

  fprintf(stderr,
          "InnoDB: Assertion failure: %s:" ULINTPF
          "%s%s\n"
          "InnoDB: thread %s",
          filename, line, expr != nullptr ? ":" : "",
          expr != nullptr ? expr : "",
          to_string(std::this_thread::get_id()).c_str());
#endif /* !UNIV_HOTBACKUP */

  fputs(
      "InnoDB: We intentionally generate a memory trap.\n"
      "InnoDB: Submit a detailed bug report"
      " to http://bugs.mysql.com.\n"
      "InnoDB: If you get repeated assertion failures"
      " or crashes, even\n"
      "InnoDB: immediately after the mysqld startup, there may be\n"
      "InnoDB: corruption in the InnoDB tablespace. Please refer to\n"
      "InnoDB: " REFMAN
      "forcing-innodb-recovery.html\n"
      "InnoDB: about forcing recovery.\n",
      stderr);

  fflush(stderr);
  fflush(stdout);
  /* Call any registered callback function. */
  if (assert_callback) {
    assert_callback();
  }
  abort();
}

④ 调用 abort

这是在 man 手册中看到的关于 abort() 的介绍:

NAME
       abort - cause abnormal process termination

SYNOPSIS
       #include <stdlib.h>

       void abort(void);

DESCRIPTION
       The  abort() function first unblocks the SIGABRT signal, and then raises that signal for the calling process (as though raise(3) was called).  This results
       in the abnormal termination of the process unless the SIGABRT signal is caught and the signal handler does not return (see longjmp(3)).

       If the SIGABRT signal is ignored, or caught by a handler that returns, the abort() function will still terminate the process.  It does  this  by  restoring
       the default disposition for SIGABRT and then raising the signal for a second time.

SIGABRT 信号对应的值为 6

shell> kill -l SIGABRT
6

⑤ signal 6

这也是为什么我们在错误日志中会看到 signal 6

22:57:12 UTC - mysqld got signal 6 ;

⑥ backtrace

通过查询这个函数名 row_mysql_store_true_var_len,我们看到有两个地方调用它。

  • row_sel_field_store_in_mysql_format_func
  • innobase_col_to_mysql
shell> grep -nr row_mysql_store_true_var_len .
./innobase/row/row0mysql.cc:159:byte *row_mysql_store_true_var_len(
./innobase/row/row0sel.cc:2605:            row_mysql_store_true_var_len(dest, len, templ->mysql_length_bytes);
./innobase/handler/handler0alter.cc:1935:            row_mysql_store_true_var_len(dest, len, flen - field->key_length());
./innobase/include/row0mysql.h:102:byte *row_mysql_store_true_var_len(

我们可以看到 row_sel_field_store_in_mysql_format_func 在错误日志中:

/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3) [0x7f2167fd47f3]
/home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0xd7f2bb]
/home/ubuntu/opt/mysql/8.0.27/bin/mysqld() [0x21702c8]
/home/ubuntu/opt/mysql/8.0.27/bin/mysqld(row_sel_field_store_in_mysql_format_func(unsigned char*, mysql_row_templ_t const*, dict_index_t const*, unsigned char const*, unsigned long)+0x1a0) [0x219bf30]

这个函数调用了一个 relative address (0x21702c8),命令 addr2line 可以转换这个 relative address (0x21702c8) 到对应的函数名。

shell> addr2line -fie  ~/opt/mysql/8.0.27/bin/mysqld  0x21702c8
_Z28row_mysql_store_true_var_lenPhmm
??:?

函数的调用关系为:

row_sel_field_store_in_mysql_format_func
  -> row_mysql_store_true_var_len
    -> ut_ad (ut_dbg_assertion_failed)
      -> abort

corefile 的分析

因为我们目前通过错误日志拿到的信息是有限的 (无法获取 len 的具体值),我们建议客户打开 corefile

这里补充一个 MariaDB 关于怎样开启 corefile 的文档

① gdb 分析 corefile

客户随后将 corefile 发给了我们。

shell> gdb -ex "set print pretty on" -ex "set pagination off" -ex "set debuginfod enabled on" -ex "set print thread-events off" -ex "file /home/ubuntu/opt/mysql/8.0.27/bin/mysqld" -ex "core /home/ubuntu/corefiles/core.8027"

Core was generated by `/home/ubuntu/opt/mysql/8.0.27/bin/mysqld --defaults-file=/home/ubuntu/sandbox'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=3, signo=6, threadid=140132373239360) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f731c541640 (LWP 1534610))]
客户的 MySQL 是通过二进制包安装的,所以 mysqld 中自带了 debuginfo

② 具体的 backtrace

(gdb) bt
#0  __pthread_kill_implementation (no_tid=3, signo=6, threadid=140132373239360) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_esrch (threadid=140132373239360, signo=signo@entry=6) at ./nptl/pthread_kill.c:108
#2  0x0000000001f7e9e7 in my_write_core (sig=sig@entry=6) at ../../mysql-8.0.27/mysys/stacktrace.cc:294
#3  0x000000000102909d in handle_fatal_signal (sig=6) at ../../mysql-8.0.27/sql/signal_handler.cc:171
#4  <signal handler called>
#5  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140132373239360) at ./nptl/pthread_kill.c:44
#6  __pthread_kill_internal (signo=6, threadid=140132373239360) at ./nptl/pthread_kill.c:78
#7  __GI___pthread_kill (threadid=140132373239360, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#8  0x00007f7347bee476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#9  0x00007f7347bd47f3 in __GI_abort () at ./stdlib/abort.c:79
#10 0x0000000000d7f2bb in ut_dbg_assertion_failed (expr=expr@entry=0x279263f "len < 256 * 256", file=file@entry=0x2f55fa0 "../../../mysql-8.0.27/storage/innobase/row/row0mysql.cc", line=line@entry=165) at ../../../mysql-8.0.27/storage/innobase/ut/ut0dbg.cc:99
#11 0x00000000021702c8 in row_mysql_store_true_var_len (dest=<optimised out>, len=len@entry=4294967292, lenlen=<optimised out>) at ../../../mysql-8.0.27/storage/innobase/row/row0mysql.cc:165
#12 0x000000000219bf30 in row_sel_field_store_in_mysql_format_func (dest=<optimised out>, templ=0x7f72c80ecae0, index=<optimised out>, data=0x7f72c8191be8 "\020\034\031\310r\177", len=4294967292) at ../../../mysql-8.0.27/storage/innobase/row/row0sel.cc:2605
#13 0x00000000021a089c in row_sel_store_mysql_rec (mysql_rec=0x7f72c80da818 "\370\001", prebuilt=0x7f72c8194938, rec=0x7f72c817c39f "", vrow=0x7f72c8191b28, rec_clust=true, rec_index=0x7f72c8148878, prebuilt_index=0x7f72c80d93a8, offsets=0x7f731c53d940, clust_templ_for_sec=false, lob_undo=0x7f72c8194b58, blob_heap=@0x7f72c8194a98: 0x7f72c8aa5c30) at ../../../mysql-8.0.27/storage/innobase/row/row0sel.cc:3013
#14 0x00000000021a86bd in row_search_mvcc (buf=buf@entry=0x7f72c80da818 "\370\001", mode=mode@entry=PAGE_CUR_GE, prebuilt=0x7f72c8194938, match_mode=match_mode@entry=1, direction=direction@entry=0) at ../../../mysql-8.0.27/storage/innobase/include/row0mysql.h:875
#15 0x00000000020609c1 in ha_innobase::index_read (this=0x7f72c8189938, buf=0x7f72c80da818 "\370\001", key_ptr=<optimised out>, key_len=<optimised out>, find_flag=<optimised out>) at ../../../mysql-8.0.27/storage/innobase/handler/ha_innodb.cc:10069
#16 0x000000000207a7a7 in handler::index_read_map (find_flag=HA_READ_KEY_EXACT, keypart_map=1, key=0x7f72c816f6c0 "", buf=0x7f72c80da818 "\370\001", this=0x7f72c8189938) at ../../../mysql-8.0.27/sql/handler.h:5078
#17 ha_innopart::index_read_map_in_part (this=0x7f72c8189938, part=0, record=0x7f72c80da818 "\370\001", key=0x7f72c816f6c0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at ../../../mysql-8.0.27/storage/innobase/handler/ha_innopart.cc:1894
#18 0x0000000000dfef5b in Partition_helper::handle_unordered_scan_next_partition (this=0x7f72c818afc8, buf=0x7f72c80da818 "\370\001") at ../../mysql-8.0.27/sql/partitioning/partition_handler.cc:2364
#19 0x00000000011310e1 in handler::ha_index_read_map (this=0x7f72c8189938, buf=0x7f72c80da818 "\370\001", key=0x7f72c816f6c0 "", keypart_map=1, find_flag=find_flag@entry=HA_READ_KEY_EXACT) at ../../mysql-8.0.27/sql/handler.cc:3253
#20 0x0000000000ea4031 in RefIterator<false>::Read (this=0x7f72c8170398) at ../../mysql-8.0.27/sql/row_iterator.h:208
#21 0x00000000013ab924 in FilterIterator::Read (this=0x7f72c81703d8) at ../../mysql-8.0.27/sql/composite_iterators.cc:72
#22 FilterIterator::Read (this=0x7f72c81703d8) at ../../mysql-8.0.27/sql/composite_iterators.cc:70
#23 0x0000000000faed93 in Query_expression::ExecuteIteratorQuery (this=0x7f72c818dd18, thd=0x7f72c8000ed0) at ../../mysql-8.0.27/sql/sql_union.cc:1300
#24 0x0000000000faef7c in Query_expression::execute (this=<optimised out>, thd=0x7f72c8000ed0) at ../../mysql-8.0.27/sql/sql_union.cc:1353
#25 0x0000000000f3f855 in Sql_cmd_dml::execute (this=0x7f72c8190390, thd=0x7f72c8000ed0) at ../../mysql-8.0.27/sql/sql_select.cc:579
#26 0x0000000000ee1e08 in mysql_execute_command (thd=0x7f72c8000ed0, first_level=<optimised out>) at ../../mysql-8.0.27/sql/sql_parse.cc:4505
#27 0x0000000000ee68f3 in dispatch_sql_command (thd=thd@entry=0x7f72c8000ed0, parser_state=parser_state@entry=0x7f731c53ff10) at ../../mysql-8.0.27/sql/sql_parse.cc:5135
#28 0x0000000000ee8e49 in dispatch_command (thd=0x7f72c8000ed0, com_data=<optimised out>, command=COM_QUERY) at ../../mysql-8.0.27/sql/sql_parse.cc:1922
#29 0x0000000000ee9d04 in do_command (thd=thd@entry=0x7f72c8000ed0) at ../../mysql-8.0.27/sql/sql_parse.cc:1343
#30 0x0000000001019fa8 in handle_connection (arg=arg@entry=0x703d4b0) at ../../mysql-8.0.27/sql/conn_handler/connection_handler_per_thread.cc:302
#31 0x000000000249dcec in pfs_spawn_thread (arg=0x71008f0) at ../../../mysql-8.0.27/storage/perfschema/pfs.cc:2946
#32 0x00007f7347c40ac3 in start_thread (arg=<optimised out>) at ./nptl/pthread_create.c:442
#33 0x00007f7347cd2a40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

我们可以找到 row_mysql_store_true_var_len 在 frame 11:

(gdb) frame 11
#11 0x00000000021702c8 in row_mysql_store_true_var_len (dest=<optimised out>, len=len@entry=4294967292, lenlen=<optimised out>) at ../../../mysql-8.0.27/storage/innoba
se/row/row0mysql.cc:165
165     ../../../mysql-8.0.27/storage/innobase/row/row0mysql.cc: No such file or directory.

③ 正确显示源码

  • 下载源码

    shell> git clone https://github.com/mysql/mysql-server.git
    shell> cd mysql-server/
    shell> git tag --list | grep 8.0.27
    mysql-8.0.27
    mysql-cluster-8.0.27
    shell> git checkout mysql-8.0.27

    我们需要切换到 MySQL 8.0.27 这个分支中。

  • 在 gdb 中映射目录

    (gdb) set substitute-path ../../../mysql-8.0.27 /home/ubuntu/mysql-server

④ 分析 corefile

len 的值是多少?

row_mysql_store_true_var_len 中,len 是 4294967292,这明显的大于了 限制范围 65,535!

(gdb) frame 11
#11 0x00000000021702c8 in row_mysql_store_true_var_len (dest=<optimised out>, len=len@entry=4294967292, lenlen=<optimised out>) at ../../../mysql-8.0.27/storage/innobase/row/row0mysql.cc:165
165         ut_a(len < 256 * 256);
(gdb) p len
$2 = 4294967292
(gdb) p len  < 256 * 256
$3 = false

len 来自哪里?

在 frame 13 中我们看到 len 来自于 dfield->len,但是 dfield 已经被优化掉了。

(gdb) frame 13
#13 0x00000000021a089c in row_sel_store_mysql_rec (mysql_rec=0x7f72c80da818 "\370\001", prebuilt=0x7f72c8194938, rec=0x7f72c817c39f "", vrow=0x7f72c8191b28, rec_clust=true, rec_index=0x7f72c8148878, prebuilt_index=0x7f72c80d93a8, offsets=0x7f731c53d940, clust_templ_for_sec=false, lob_undo=0x7f72c8194b58, blob_heap=@0x7f72c8194a98: 0x7f72c8aa5c30) at ../../../mysql-8.0.27/storage/innobase/row/row0sel.cc:3013
(gdb) list
3008                (byte)templ->mysql_null_bit_mask;
3009            memcpy(mysql_rec + templ->mysql_col_offset,
3010                   (const byte *)prebuilt->default_rec + templ->mysql_col_offset,
3011                   templ->mysql_col_len);
3012          } else {
3013            row_sel_field_store_in_mysql_format(
3014                mysql_rec + templ->mysql_col_offset, templ, rec_index,
3015                templ->clust_rec_field_no, (const byte *)dfield->data, dfield->len,
3016                ULINT_UNDEFINED);
3017            if (templ->mysql_null_bit_mask) {
(gdb) p dfield
$7 = <optimised out>

dfield 又在哪里?

原来在 frame 13 的 row_sel_store_mysql_rec 中,我们找到了。

      const auto dfield = dtuple_get_nth_v_field(vrow, col->v_pos);

这个 dfield 是在 dtuple_get_nth_v_field 中通过 vrowcol->v_pos 计算出来的.。我们能找到 vrow,但同样 col 被优化掉了。

(gdb) p vrow
$8 = (const dtuple_t *) 0x7f72c8191b28
(gdb) p col
$9 = <optimised out>

col 是多少?

上面的 colthe dict_table_get_nth_v_col 通过 rec_index->tabletempl->clust_rec_field_no 计算得出。

      dict_v_col_t *col;
      col =
          dict_table_get_nth_v_col(rec_index->table, templ->clust_rec_field_no);

我们看下 dict_table_get_nth_v_col 中的具体内容:

/** Gets the nth virtual column of a table.
@param[in]    table    table
@param[in]    pos    position of virtual column
@return pointer to virtual column object */
static inline dict_v_col_t *dict_table_get_nth_v_col(const dict_table_t *table,
                                                     ulint pos) {
  ut_ad(table);
  ut_ad(pos < table->n_v_def);
  ut_ad(table->magic_n == DICT_TABLE_MAGIC_N);

  return (static_cast<dict_v_col_t *>(table->v_cols) + pos);
}
#endif /* UNIV_DEBUG */

col 可以计算得出:

(gdb) p rec_index->table->v_cols + 0
$17 = (dict_v_col_t *) 0x7f72c8148140

算出 dfield

有了 col 我们就可以算出 dfield

      const auto dfield = dtuple_get_nth_v_field(vrow, col->v_pos);

dtuple_get_nth_v_field 的定义:

/** Gets nth virtual field of a tuple.
@param[in]    tuple    tuple
@oaran[in]    n    the nth field to get
@return nth field */
static inline dfield_t *dtuple_get_nth_v_field(const dtuple_t *tuple, ulint n) {
  ut_ad(tuple);
  ut_ad(n < tuple->n_v_fields);

  return (static_cast<dfield_t *>(tuple->v_fields + n));
}
#endif /* UNIV_DEBUG */

dfield0x7f72c8191bd0

(gdb) p vrow->v_fields + (rec_index->table->v_cols + 0)->v_pos
$18 = (dfield_t *) 0x7f72c8191bd0
(gdb) p *(vrow->v_fields + (rec_index->table->v_cols + 0)->v_pos)
$19 = {
  data = 0x7f72c8191be8,
  ext = 0,
  spatial_status = 0,
  len = 4294967292,
  type = {
    prtype = 20279311,
    mtype = 12,
    len = 1020,
    mbminmaxlen = 21
  }
}

who set the len to 4294967292

最终要的一部分就要分析为什么 len to 4294967292`

通过搜索 dtuple_get_nth_v_field, 我们看到相关文件如下

shell> grep -nr  'dfield = dtuple_get_nth_v_field' .
./row/row0mysql.cc:578:      dfield = dtuple_get_nth_v_field(row, n_v_col);
./row/row0row.cc:117:      dfield = dtuple_get_nth_v_field(entry, i - entry_len);
./row/row0sel.cc:2988:      const auto dfield = dtuple_get_nth_v_field(vrow, col->v_pos);
./row/row0sel.cc:4242:      dfield_t *dfield = dtuple_get_nth_v_field(*vrow, vcol->v_pos);
./row/row0upd.cc:917:      dfield = dtuple_get_nth_v_field(entry, i);
./row/row0upd.cc:1224:  dfield_t *dfield = dtuple_get_nth_v_field(row, vcol->v_pos);
./row/row0upd.cc:1270:    dfield = dtuple_get_nth_v_field(row, col_no);
./row/row0upd.cc:1324:          dfield = dtuple_get_nth_v_field(row, vcol->v_pos);
./row/row0upd.cc:1508:      dfield = dtuple_get_nth_v_field(row, vcol->v_pos);
./row/row0upd.cc:1856:      dfield_t *dfield = dtuple_get_nth_v_field(node->row, col_no);
./include/data0data.ic:457:    dfield_t *dfield = dtuple_get_nth_v_field(vrow, i);
./include/data0data.ic:470:    dfield_t *dfield = dtuple_get_nth_v_field(vrow, i);
./dict/dict0dict.cc:2892:    dfield_t *dfield = dtuple_get_nth_v_field(tuple, i);
./trx/trx0rec.cc:2001:        dfield = dtuple_get_nth_v_field(*row, vcol->v_pos);
./trx/trx0rec.cc:2729:    dfield = dtuple_get_nth_v_field(row, col_no);

通过分析所有的调用 dtuple_get_nth_v_field,我们最终可以看到一个函数 trx_undo_rec_get_partial_row 在文件 trx/trx0rec.cc 的第 2001 行。

if (is_virtual) {
  ptr = trx_undo_read_v_idx(index->table, ptr, first_v_col, &is_undo_log,
                            &field_no);
  first_v_col = false;
  if (field_no != ULINT_UNDEFINED) {
    vcol = dict_table_get_nth_v_col(index->table, field_no);
    col = &vcol->m_col;
    col_no = dict_col_get_no(col);
    dfield = dtuple_get_nth_v_field(*row, vcol->v_pos);
    vcol->m_col.copy_type(dfield_get_type(dfield));
  }
}

if ((vcol != nullptr && vcol->m_col.is_multi_value()) ||
    trx_undo_rec_is_multi_value(ptr)) {
  ut_ad(is_virtual);
  ut_ad(vcol != nullptr || field_no == ULINT_UNDEFINED);
  ut_ad(dfield != nullptr || field_no == ULINT_UNDEFINED);
  ptr = trx_undo_rec_get_multi_value(ptr, dfield, heap);
  continue;
} else {
  ptr = trx_undo_rec_get_col_val(ptr, &field, &len, &orig_len);
}

trx_undo_rec_get_multi_value 调用 Multi_value_logger::read 去修改了 dfield

/** Read from an undo log record of a multi-value virtual column.
@param[in]    ptr    pointer to remaining part of the undo record
@param[in,out]    field    stored field, nullptr if the col is no longer
                        indexed or existing, in the latter case,
                        this function will only skip the log
@param[in,out]    heap    memory heap
@return remaining part of undo log record after reading these values */
const byte *trx_undo_rec_get_multi_value(const byte *ptr, dfield_t *field,
                                         mem_heap_t *heap) {
  if (field == nullptr) {
    return (ptr + Multi_value_logger::read_log_len(ptr));
  }

  return (Multi_value_logger::read(ptr, field, heap));
}

Multi_value_logger::read 方法最终修改了 defield->lenUNIV_MULTI_VALUE_ARRAY_MARKER

const byte *Multi_value_logger::read(const byte *ptr, dfield_t *field,
                                     mem_heap_t *heap) {
  ut_ad(is_multi_value_log(ptr));

  ++ptr;

  uint32_t total_len = mach_read_from_2(ptr);
  const byte *old_ptr = ptr;

  ptr += s_log_length_for_null_or_empty;

  if (total_len == s_multi_value_null) {
    dfield_set_null(field);
    return (ptr);
  } else if (total_len == s_multi_value_no_index_value) {
    dfield_set_data(field, nullptr, UNIV_NO_INDEX_VALUE);
    return (ptr);
  }

  uint32_t num = mach_read_next_compressed(&ptr);
  field->data = mem_heap_alloc(heap, sizeof(multi_value_data));
  field->len = UNIV_MULTI_VALUE_ARRAY_MARKER;

  multi_value_data *multi_val = static_cast<multi_value_data *>(field->data);

  multi_val->num_v = num;
  multi_val->alloc(num, false, heap);

  for (uint32_t i = 0; i < num; ++i) {
    uint32_t len = mach_read_next_compressed(&ptr);
    multi_val->datap[i] = ptr;
    multi_val->data_len[i] = len;

    ptr += len;
  }

  if (ptr < old_ptr + total_len) {
    multi_val->alloc_bitset(heap);
    multi_val->bitset->copy(ptr, UT_BITS_IN_BYTES(num));
    ptr += UT_BITS_IN_BYTES(num);
  }

  ut_ad(ptr == old_ptr + total_len);
  return (ptr);
}

我们可以看到 UNIV_MULTI_VALUE_ARRAY_MARKER4294967292

/** The follwoing number as the length marker of a logical field, which
is only used for multi-value field data, means the data itself of the
field is actually an array. */
const uint32_t UNIV_MULTI_VALUE_ARRAY_MARKER = UNIV_NO_INDEX_VALUE - 1;
(gdb) p UNIV_MULTI_VALUE_ARRAY_MARKER
$23 = 4294967292

故障复现

到目前为止 我们已经清楚了这个 len 为什么是 4294967292。下面就要去复现这个 bug 导致的故障。

函数 trx_undo_rec_get_multi_value 会从 Undo Log record 中构建 Multi-Value virtual column

我们需要想办法让 MySQL 从 Undo Log 中读取数据。

  1. 创建表结构并加入数据。
CREATE TABLE `t` (
  `id` int,
  `a` json DEFAULT NULL,
  KEY `a` ((cast(`a` as char(255) array)))
) ENGINE=InnoDB
PARTITION BY HASH(id)
PARTITIONS 1;
insert into t (id, a) values 
( 1, 
  JSON_ARRAY("session","1")
);
  1. 在 session1 中保持一个 REPEATABLE-READ 的事务。

    --- session 1 ---
    set transaction_isolation = 'REPEATABLE-READ';
    begin;
    select count(*) from t;
  2. 在 session 2 中更新数据。
update t set a = JSON_ARRAY('session', '2') where id = 1;
  1. 回到 session 1 中查询数据。那么 InnoDB 必定利用 Undo 保证事务的一致性。
SELECT a FROM t WHERE ('session' MEMBER OF (a) ); 


ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql_sandbox8028.sock' (111)
ERROR:
Can't connect to the server

成功复现!

反馈 Bug

这个问题已经在 2023 年 7 月份报告给了 Oracle 官方。Oracle 官方回复他们已经修复了这个问题。

Fixed in the upcoming 8.0.35, 8.2.0 under the heading of Bug#35039937 Multi-value virtual columns should not be materialized

在 2023 年 10 月份,MySQL 8.0.35 修复了这个 bug


爱可生开源社区
423 声望206 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。