Abstract: This article comes from the Huawei Cloud MySQL R&D team. It mainly shares the backup process of the MySQL backup tool Xtrabackup, the optimization and improvement made by the Huawei Cloud database team, as well as the problems and solutions that may be encountered in use.

This article is shared from the Huawei Cloud Community " HUAWEI CLOUD takes you to explore Xtrabackup backup principles and common problem analysis ", author: GaussDB database.

This article comes from the HUAWEI CLOUD MySQL R&D team. It mainly shares the backup process of the MySQL backup tool Xtrabackup, the optimization and improvement made by the HUAWEI CLOUD database team, as well as the problems and solutions that may be encountered in use. The content discussed in this article is mainly for Huawei Cloud RDS for MySQL and the user-built community version MySQL database. I hope it will help everyone understand and use Xtrabackup, and will be more calm when facing Xtrabackup problems in the future.

1. Introduction to Xtrabackup

Xtrabackup is an open source backup tool for MySQL database physical hot backup developed by the Percona team. has the characteristics of fast backup speed, support for backup data compression, automatic verification of backup data, support for streaming output, and almost no business impact during the backup process. , Is the MySQL backup tool commonly used by various cloud vendors.

There are currently two versions of Xtrabackup: Xtrabackup 2.4.x and 8.0.x, which are used to back up MySQL 5.x and MySQL 8.0.x respectively. Below we introduce how Xtrabackup backs up MySQL Community Edition and the backup principle of Xtrabackup on HUAWEI CLOUD.

2. Xtrabackup backup of the community version of MySQL

Xtrabackup is designed for Percona MySQL. It also supports backup of the official community version of MySQL. The process is shown in the following figure:
image.png

Figure 1: Xtrabackup backup official MySQL process diagram

  1. compatibility check: Xtrabackup community version only supports the tables of MyISAM, InnoDB, CSV, and MRG_MYISAM. The tables of other storage engines will not be backed up; in this step, by querying tables, if there is a storage engine for the table found If it is not one of the above four engines, it will print a warning, indicating that Xtrabackup will not back up the table.
  2. starts the redo background backup thread: starts the redo background backup thread, starting from the position of the last checkpoint LSN of the backup instance, backing up all incremental redo logs, and continuing until the end of the backup task.
  3. loads all innodb table spaces: opens and scans the data files of all innodb tables, checks the first page of all table spaces, and initializes the memory structure of all tables.
  4. backup innodb table: traverses the memory structure of the table constructed in step 3, backs up the data file of each innodb table, and checks whether the data of each page is correct during the backup process.
  5. plus backup lock FLUSH TABLES WITH READ LOCK (FTWRL): FTWRL lock is a MySQL instance-level read lock. The locking process is complicated, and after locking, all update operations and DDL of all tables will be blocked.
  6. Back up non-innodb tables: Because we have added a read lock to the instance in step 5, it is safe to back up non-innodb tables at this time, and there must be no write services at this time.
  7. records the current GTID information of binlog: Please note that at this time we still hold the global read lock. This step is mainly for us to use the backup set to quickly create a backup machine.
  8. Stop the redo backup thread.
  9. lock resource is released and the backup ends.

It should be noted that there is a difference between Xtrabackup 2.4.x and 8.0.x in the 7th and 8th steps. This difference is due to MySQL 8.0.x. We will introduce the details below.

3. HUAWEI CLOUD RDS for MySQL backup

When backing up a community version of MySQL instance, Xtrabackup will add a global read lock (FTWRL) to the instance. This lock has a great impact on the business of the database. In severe cases, it may even cause the database to "hang", which is unacceptable for customers. . Therefore, the HUAWEI CLOUD MySQL team has optimized this process. There are two main points:

  1. Added backup lock for MySQL 5.x and 0.x: LOCK TABLES FOR BACKUP
  2. Added binlog lock to MySQL 5.x: LOCK BINLOG FOR BACKUP

After optimization, the backup process of HUAWEI CLOUD Xtrabackup to MySQL is as follows:
image.png

Figure 2 Diagram of the process of Xtrabackup backup HUAWEI CLOUD MySQL

Compared with the FTWRL lock, the backup lock LOCK TABLES FOR BACKUP has little impact on the client instance, and its locking process is simple. The DML operation of the innodb table is not affected during the lock , but all the update operations and DDL of the non-innodb table The operation is still not allowed.

After backing up all the table files, Xtrabackup needs to obtain binlog GTID information.

• For MySQL 5.x version, Xtrabackup 2.4.x will execute the LOCK BINLOG FOR BACKUP operation, lock the binlog, and then obtain the GTID information.
• For MySQL 8.0.x version, HUAWEI CLOUD Xtrabackup 8.0.x follows the official method of querying consistent backup points. When Xtrabackup queries log_status, the MySQL server will add lightweight locks to redo log, binlog, etc. to obtain consistent backup points. This process is very short and has almost no impact on the operation of the instance. The backup consistency point of MySQL 8.0.x will tell us the consistent redo log LSN and the GTID of the binlog; after querying the backup consistency point, Xtrabackup will back up the last binlog file for whether the arbitration transaction needs to be rolled back during recovery; Finally, the redo log backup thread task will stop at the point where the LSN of the redo log read by it is greater than the redo log LSN of the queried backup consistency point.

Since Xtrabackup 2.4.x and 8.0.x have differences in processing binlog and the recovery process also has differences, we will elaborate on them in subsequent articles.

Four, common problems and solutions

HUAWEI CLOUD has used Xtrabackup to provide backup services for almost all MySQL instances of the company. During the use process, we actively keep in touch with the community and report some problems in the use process to the Percona community to help Xtrabackup evolve in a better direction. In addition, if the community fails to fix some fatal problems discovered in a timely manner, the Huawei Cloud database team will make timely repairs to ensure the correctness of the backup data.

The following is a summary of the problems that may be encountered in each stage of the backup process using Xtrabackup, analysis of the causes and corresponding solutions,

1. Compatibility check stage

  • Symptom: After Xtrabackup is started, it "hangs" for a long time immediately, and the redo log backup thread is also not started after checking the log.

Reason: MDL lock could not be obtained during Xtrabackup compatibility check. Xtrabackup compatibility check is achieved by querying the plugin table imformation_schema.tables:

“SELECT CONCAT(table_schema, '/', table_name), engine FROM information_schema.tables WHERE engine NOT IN ('MyISAM', 'InnoDB', 'CSV', 'MRG_MYISAM') AND table_schema NOT IN ('performance_schema', 'information_schema', 'mysql')”

When querying each table, you need to obtain the MDL lock of the corresponding table. If there is a long-term DML or DDL statement in the MySQL instance at this time, or an MDL deadlock occurs, the above query will always be blocked waiting for the MDL lock At this time, Xtrabackup will "hang" for a long time.

Solution: If the reason for waiting for the lock is only due to the blockage of other SQL statements, just wait for other SQL execution to complete; if a deadlock occurs, you need to analyze the cause of the deadlock at this time and remove the deadlock; Huawei Cloud RDS for MySQL provides the MDL lock view function, which can help users analyze the MDL deadlock of the business.

2.redo log backup phase

  • problem phenomenon 1: redo log rewinds, backup fails, Xtrabackup reports the following error message:
    “xtrabackup: error:it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or log files being too small.\n");”

Reason: During the backup process, if the host's business load is high, the redo log writing speed will be very high, and the backup speed of the redo log backup thread of Xtrabackup will be lower than the writing speed of the redo log, because MySQL redo log File writing uses the round-robin method, so that the newly written log overwrites the previously written but not yet backed up log, so the backup fails.

solution: recommends backup during the low peak period of the business, or increase the file size of the redo log.

  • problem phenomenon 2: backup failed due to DDL operation, the error message is as follows:

“An optimized (without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.

PXB will not be able take a consistent backup. Retry the backup operation”

Reason: backup process, the MySQL instance created an index DDL operation, because the index creation will not write redo. If the backup continues, it will cause data inconsistency. Therefore, Xtrabackup backup failure in this scenario is expected behavior.

Solution: Do not create indexes during the backup process. If you really need them, it is recommended to directly bring the indexes in the table creation statement, or use the lock-ddl parameter for backup (blocking new DDL operations on the instance).

  • problem symptom 3: undo truncate causes the backup to fail, and the Xtrabackup error message is as follows:

“An undo ddl truncation (could be automatic) operation has been performed.”

Reason: During the Xtrabackup backup, if undo truncate occurs in the MySQL instance, the undo log written to the new undo file (with different space id) may be lost, causing problems with the recovered data. The official fixes this problem in Xtrabackup 8.0.14 version (based on MySQL 8.0.21). The fix is to redo backup thread. When parsing redo log, if the operation is found to be a truncate operation of undo log, the backup will fail. Unfortunately, the repair did not completely solve the problem. In the following two scenarios, the community version of Xtrabackup may still have inconsistencies in the recovered data:

  1. MySQL version is lower than MySQL 8.0.21;
  2. During the backup process, the user created a new undo tablespace by himself.

solution: disables the truncate operation of the undo tablespace during the backup, and prohibits users from creating undo tablespaces, which can effectively prevent the inconsistency of backup data recovery; in addition, HUAWEI CLOUD Xtrabackup has further repaired this problem, which can effectively Prevent this kind of phenomenon from happening.

3. Load the table space stage

  • problem phenomenon 1: Xtrabackup error: Too many open files

Reason: The operating system allows a limited number of files that can be opened at the same time. Xtrabackup will open all table files at the same time during the load tablespace stage. If the number of tables opened by Xtrabackup exceeds this limit, the backup will fail.

solution: increase the operating system, allow the configuration of the maximum number of files to be opened at the same time, or use the lock-ddl parameter (blocking new DDL operations on the instance).

  • Problem Phenomenon 2: rename table causes the backup to fail, the error message is as follows:

“Trying to add tablespace 'xxxx' with id xxx to the tablespace memory cache, but tablespace xxxx already exists in the cache!;”

Reason: is not locked in the whole process of opening the tablespace in Xtrabackup. If rename table occurs, there is a probability that the same tablespace will be loaded repeatedly. At this time, Xtrabackup will detect the duplicate tablespace id, so the backup fails.

solution: Generally speaking, loading table space is a very fast operation, rename table is not a very frequent operation, you can try again in this case (Percona Xtrabackup 2.4.x only supports single-threaded table space loading, HUAWEI CLOUD Xtrabackup supports multi-threaded loading table space).

4. Backup innodb table stage

  • problem phenomenon: innodb table data file is damaged, the backup fails, the error message is as follows:

“xtrabackup: Database page corruption detected at page xxxx, retrying.”

Reason: Xtrabackup checks the checksum of each page when backing up the innodb table data file. If the checksum is found to be incorrect, the backup fails. At this time, the data of the MySQL instance has been damaged (for example, disk quiesced error).

solution: needs to restore the previous backup data or other methods to restore the data before the backup can be successful. In subsequent articles, we will also introduce the data restoration methods in detail.

V. Conclusion

This article mainly introduces the principle of Xtrabackup backup, the backup community version of MySQL and Huawei Cloud's improvements, and shares the troubleshooting and solutions to common problems of Xtrabackup. We will also bring you more in-depth analysis and more practical skills in the follow-up. , I hope it will be helpful for everyone to understand and use Xtrabackup. We will also continue to provide customers with better database services, and always guard customer data security.

Finally, I would like to tell you a good news. The cloud database MySQL package at 19.9 yuan per year, helping enterprises to go to the cloud without worry. 16169350c331a4 Welcome everyone to come and experience

Click to follow and learn about Huawei Cloud's fresh technology for the first time~


华为云开发者联盟
1.4k 声望1.8k 粉丝

生于云,长于云,让开发者成为决定性力量