Author: Li Pengbo
A member of the Aikesheng DBA team, mainly responsible for MySQL troubleshooting and SQL audit optimization. Committed to technology and responsible to customers.
Source of this article: original submission
* Produced by the Aikesheng open source community. Original content may not be used without authorization; to reprint, please contact the editor and indicate the source.
MySQL 8.0 added the clone plugin (available since 8.0.17), which is used for distributed recovery in MGR and can also be used for physical backup and recovery.
During a clone operation, however, the recipient server is restarted automatically once the data has been pulled, and this restart often fails with an error such as:
ERROR 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).
This error indicates that the RESTART statement failed and that the server has to be restarted manually afterwards. The error code is 3707 (ER_RESTART_SERVER_FAILED).
The official documentation on clone (https://dev.mysql.com/doc/refman/8.0/en/clone-plugin-remote.html) also specifically addresses this error.
This means that the recipient server restarts after the clone data has been pulled, provided that a monitoring process is available. There is no need to worry when the error appears: it does not mean that the clone has failed, and the server only needs to be restarted manually.
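As a minimal sketch of this distinction, the snippet below classifies captured client output: error 3707 means only the automatic restart failed, while anything else deserves a closer look. The function name `clone_needs_manual_restart` and the sample variable are hypothetical; only the error text comes from the message quoted above.

```shell
#!/bin/sh
# Sketch: decide whether the output of a CLONE statement shows only the
# failed automatic restart (error 3707) rather than a failed clone.
# The function name is hypothetical, chosen for this illustration.
clone_needs_manual_restart() {
    # $1: text captured from the mysql client
    printf '%s\n' "$1" | grep -q 'ERROR 3707 (HY000): Restart server failed'
}

# Sample input: the message quoted in this article.
output='ERROR 3707 (HY000): Restart server failed (mysqld is not managed by supervisor process).'

if clone_needs_manual_restart "$output"; then
    echo "clone data was pulled; restart mysqld manually"
else
    echo "inspect the error; the clone itself may have failed"
fi
```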
From the error message and the official documentation, we get two clues about the restart failure: RESTART and the monitoring process.
According to the documentation for the "RESTART" statement, the statement can only succeed if a monitoring process is available, so this monitoring process is the key to a successful "RESTART". The same documentation also explains what can serve as the monitoring process.
On Unix-like systems, the monitoring process is provided by systemd or mysqld_safe.
Sometimes, however, automatic restart still does not work with a self-built systemd service for MySQL, and the root of the problem is that the monitoring process has not been configured. For reference, here is the "[Service]" section of the systemd service file generated when MySQL Server is installed from the official rpm package:
Restart=on-failure
RestartPreventExitStatus=1
# Set enviroment variable MYSQLD_PARENT_PID. This is required for restart.
Environment=MYSQLD_PARENT_PID=1
As the official systemd service file points out, the most important setting for automatic restart is "Environment=MYSQLD_PARENT_PID=1"; the process with PID 1 is systemd itself.
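One way to confirm that the variable actually reached the server is to look at the environment of the running process. The sketch below assumes a Linux host with /proc, a process named mysqld, and enough privileges to read its environ file; the helper name `has_parent_pid` is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether an environ-style file (NUL-separated KEY=VALUE pairs,
# as in /proc/<pid>/environ on Linux) contains MYSQLD_PARENT_PID.
# The helper name is hypothetical.
has_parent_pid() {
    # $1: path to the environ file
    tr '\0' '\n' < "$1" | grep -q '^MYSQLD_PARENT_PID='
}

# Assumed usage: inspect the first running mysqld process, if there is one.
pid=$(pidof mysqld 2>/dev/null | awk '{print $1}')
if [ -n "$pid" ] && [ -r "/proc/$pid/environ" ]; then
    if has_parent_pid "/proc/$pid/environ"; then
        echo "MYSQLD_PARENT_PID is set for mysqld (pid $pid)"
    else
        echo "MYSQLD_PARENT_PID is not set; RESTART is expected to fail"
    fi
fi
```

If the variable is missing, the service file most likely lacks the "Environment=" line shown above, or the service was not restarted after the file was changed.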
The official restart policy is "on-failure": systemd restarts the database when the process exits with a non-zero status, is terminated by a signal, or hits a timeout. After an abnormal crash, however, we sometimes do not want the database to restart automatically; operations staff should first confirm the problem and then restart it manually. In that case the automatic restart policy needs to be adjusted.
The official documentation for "RESTART" states clearly that when the statement is executed, the database shuts down with exit status code 16. We can therefore configure the service to restart automatically only when the exit status code is 16, and not in any other case. The "[Service]" section of the MySQL systemd service is then configured as follows:
RestartForceExitStatus=16
RestartPreventExitStatus=1
# Set enviroment variable MYSQLD_PARENT_PID. This is required for restart.
Environment=MYSQLD_PARENT_PID=1
"RestartForceExitStatus=16" means that the service is restarted automatically whenever it exits with status code 16, regardless of whether or how "Restart=" is configured. This solves the failed automatic restart after a clone and also ensures that the database does not restart automatically after other abnormal exits.
For example, MySQL does not restart automatically when the mysqld process is terminated by a signal.
When a clone operation is performed, on the other hand, the earlier error no longer appears and the server restarts automatically.
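To make the exit-code convention concrete, here is a minimal sketch of the contract a monitoring process has to fulfil: export MYSQLD_PARENT_PID, start the server, and start it again only when it exits with status 16. It illustrates the idea and is not how systemd or mysqld_safe is implemented; `supervise` and `fake_server` are hypothetical names, and a stand-in function replaces the real mysqld command line.

```shell
#!/bin/sh
# Sketch of the monitoring contract: restart the supervised command only
# when it exits with status 16, the code used for a RESTART shutdown.
RESTART_EXIT_CODE=16

supervise() {
    # "$@": the command to supervise (for real use, the mysqld command line)
    MYSQLD_PARENT_PID=$$
    export MYSQLD_PARENT_PID
    while :; do
        status=0
        "$@" || status=$?
        if [ "$status" -ne "$RESTART_EXIT_CODE" ]; then
            # Any other exit status ends supervision without a restart.
            return "$status"
        fi
        echo "exit status $status: restarting" >&2
    done
}

# Demonstration with a stand-in for mysqld: the first run exits with 16
# (as after RESTART), the second run exits normally.
runs=0
fake_server() {
    runs=$((runs + 1))
    [ "$runs" -eq 1 ] && return 16
    return 0
}

supervise fake_server
echo "supervisor finished with status $? after $runs runs"
```

With systemd the loop is not needed: "RestartForceExitStatus=16" expresses the same rule declaratively, and every other exit status is left for the operator to handle.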