Author: Li Pengbo

A member of the Aikesheng DBA team, mainly responsible for MySQL troubleshooting and SQL audit optimization. Committed to technology and accountable to customers.

Source of this article: original submission

* Produced by the Aikesheng open source community. Original content may not be used without authorization; to reprint it, please contact the editor and cite the source.





MySQL 8.0.17 added the clone plugin, which MGR uses for distributed recovery and which can also be used for physical backup and restore.

However, once the data has been pulled during a clone operation, the recipient server tries to restart automatically, and that restart often fails with an error such as:

ERROR 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).

Error code 3707 indicates that RESTART failed and that the server has to be restarted manually afterwards.

The official documentation on cloning remote data, https://dev.mysql.com/doc/refman/8.0/en/clone-plugin-remote.html, also calls out this error specifically.

According to the documentation, the recipient server restarts after the cloned data has been pulled, provided that a monitoring process is available. If the error above appears, there is no need to worry: it does not mean that the clone has failed, and the only thing left to do is to restart the server manually.
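To make the distinction concrete, here is a minimal sketch of how a wrapper script might interpret the error number returned by a clone. The function name, the returned labels, and the convention that `None` means "no error reported" are assumptions made for illustration; only the error number 3707 comes from the output above.

```python
from typing import Optional

# Error number taken from the error message quoted in this article.
ER_RESTART_SERVER_FAILED = 3707


def clone_outcome(errno: Optional[int]) -> str:
    """Classify a clone result by the MySQL error number it returned.

    errno is None when no error was reported (a convention invented
    for this sketch, not something a MySQL driver guarantees).
    """
    if errno is None:
        return "cloned"
    if errno == ER_RESTART_SERVER_FAILED:
        # The data was pulled; only the automatic restart failed.
        return "cloned, manual restart needed"
    # Assumption: every other error is treated as a failed clone.
    return "failed"
```

A caller would then restart mysqld by hand whenever the result is "cloned, manual restart needed" instead of repeating the clone.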

The error message and the official documentation give us two clues about the restart failure: RESTART and the monitoring process.

Let's look at the official documentation on RESTART (https://dev.mysql.com/doc/refman/8.0/en/restart.html).

From this document we learn that the "RESTART" statement can only succeed if a monitoring process is present, so the monitoring process is the key to a successful "RESTART". The documentation also explains what this monitoring process is.

On Unix-like systems, the monitoring process is provided by systemd or mysqld_safe.

Yet sometimes, when we run MySQL under a self-built systemd service, the automatic restart still does not work. The cause is that the monitoring process has not been configured for mysqld. For reference, here is the "[Service]" section of the systemd service file generated when MySQL Server is installed from the official rpm package:

Restart=on-failure

RestartPreventExitStatus=1

# Set enviroment variable MYSQLD_PARENT_PID. This is required for restart.
Environment=MYSQLD_PARENT_PID=1

The official systemd service file itself points out that the key to automatic restart is the setting "Environment=MYSQLD_PARENT_PID=1". The process with PID 1 is systemd.
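A self-built unit file can be checked for this setting with a few lines of Python. The following is only a sketch: the function names are made up for illustration, and the parser handles a single unit file given as text, ignoring drop-in directories and line continuations.

```python
import shlex
from typing import Dict


def service_environment(unit_text: str) -> Dict[str, str]:
    """Collect the Environment= assignments from the [Service] section."""
    env: Dict[str, str] = {}
    section = None
    for raw_line in unit_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if section != "Service" or not sep or key.strip() != "Environment":
            continue
        if not value.strip():
            env.clear()  # an empty Environment= resets earlier assignments
            continue
        # One Environment= line may hold several, optionally quoted, assignments.
        for assignment in shlex.split(value):
            name, eq, val = assignment.partition("=")
            if eq:
                env[name] = val
    return env


def restart_is_possible(unit_text: str) -> bool:
    """True if the unit sets MYSQLD_PARENT_PID=1, as the official file does."""
    return service_environment(unit_text).get("MYSQLD_PARENT_PID") == "1"
```

Passing the contents of the unit file to `restart_is_possible` returns `True` for the official fragment shown above and `False` for a unit that lacks the `Environment=` line.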

The official restart policy is "on-failure", that is, the database is restarted when it exits abnormally, is terminated by a signal, or hits a watchdog timeout. But sometimes we do not want the database to restart automatically after an abnormal crash; instead, the operations staff should confirm the problem first and then restart it manually. In that case, the automatic restart policy needs to be adjusted.

The official documentation on "RESTART" states explicitly that when the "RESTART" statement is executed, the server shuts down with exit status code 16. We can therefore configure systemd to restart the service only when the exit status code is 16, and not in any other case. The "[Service]" section of the MySQL systemd service is configured as follows:

RestartForceExitStatus=16
RestartPreventExitStatus=1
# Set enviroment variable MYSQLD_PARENT_PID. This is required for restart.
Environment=MYSQLD_PARENT_PID=1

"RestartForceExitStatus=16" means that the service is restarted automatically whenever its exit status code is 16, regardless of whether "Restart=" is configured. This fixes the failed automatic restart after a clone, and it also ensures that the database does not restart automatically after other abnormal exits.

For example, the database does not restart automatically when an interrupt signal is sent to mysqld.

It does restart automatically after a clone operation.

The earlier error no longer appears, and the server restarts on its own.
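The difference between the two "[Service]" configurations discussed in this article can be sketched with a small model of the restart decision. This is a simplification under stated assumptions: it only looks at numeric exit codes (not signals, timeouts, or watchdog events), the names are invented for illustration, and the order of the checks (prevent list, then force list, then "Restart=") reflects my reading of systemd's behaviour rather than anything quoted in this article.

```python
from typing import AbstractSet


def should_restart(
    exit_code: int,
    restart: str = "no",
    force: AbstractSet[int] = frozenset(),
    prevent: AbstractSet[int] = frozenset(),
) -> bool:
    """Simplified restart decision for a service that exited with exit_code."""
    if exit_code in prevent:
        return False  # RestartPreventExitStatus= wins over everything else
    if exit_code in force:
        return True  # RestartForceExitStatus= applies regardless of Restart=
    if restart == "always":
        return True
    if restart == "on-failure":
        return exit_code != 0  # any non-zero exit code counts as a failure here
    return False  # Restart=no, the default when Restart= is not set


# The policy from the official rpm unit file, and the adjusted policy.
OFFICIAL = dict(restart="on-failure", prevent={1})
ADJUSTED = dict(restart="no", force={16}, prevent={1})
```

In this model, `should_restart(16, **OFFICIAL)` and `should_restart(16, **ADJUSTED)` are both true, so the restart after a clone works either way, while an unexpected exit code such as 2 restarts the service only under the official policy.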

