Abstract: In large-scale distributed systems, single points of failure cannot be avoided. When a DN (datanode) suffers a single point of failure, what recovery methods are available and how are they used? This article explains how to repair a DN single point of failure by running gs_ctl build.
This article is shared from the HUAWEI CLOUD community post "HUAWEI CLOUD data warehouse standby machine DN reconstruction, fast repair DN single point of failure!", original author: welblupen.
1. Technical background
GaussDB (DWS) uses a primary-standby-secondary architecture for DN high availability. In a distributed environment, the cluster's data is sharded across multiple DN groups, and each group is responsible for one shard. A group consists of a primary DN, a standby DN, and a secondary DN. The primary and the standby each hold a complete copy of the shard's data. The secondary normally stores no data; it only holds data temporarily while the standby is down. After a failed standby recovers, it must reconnect to the primary and copy the data and Xlog from it so that the cluster's data stays consistent.
2. Scenarios where the standby DN needs to be rebuilt
2.1. When the primary suffers a single point of failure, the standby fails over and becomes the new primary, the original primary is demoted, and the cluster becomes degraded. After the original primary recovers, the WAL CRC check between the primary and the standby may fail. CM detects this state and automatically rebuilds the standby.
2.2. When the standby suffers a single point of failure, its state becomes unknown and the cluster becomes degraded. After the standby's failure is repaired, the standby must be rebuilt so that its data is synchronized with the primary.
3. Types of standby DN rebuild
3.1. Incremental rebuild: gs_ctl build -b incremental -Z datanode
use:
Incremental rebuild repairs the log divergence on the standby that is commonly caused by host or instance failures, and it can also repair some cases of lost data files. If the host fails during the rebuild, the operation can be rolled back manually so that nothing is lost.
process:
- Obtain the difference files: parse the Xlog to determine which files differ between the primary and standby DNs
- Backup and recovery: the differing files are backed up and restored strictly atomically. Errors during this step are recoverable; once the cause has been eliminated, the rebuild can simply be run again (it is re-entrant)
- File transfer: the standby creates the specified number of threads (1-16) to pull the difference files from the primary
- Complete the incremental rebuild and wait for the Xlog to be flushed to disk
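The first step above, deriving the difference files from the Xlog, can be illustrated with a minimal sketch. The record layout (`XlogRecord` with only an `lsn` and a `path`), the fork position, and the file names are all hypothetical simplifications for illustration; real Xlog records and the logic inside gs_ctl are far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XlogRecord:
    """Hypothetical, simplified Xlog record: a log position and the file it modifies."""
    lsn: int
    path: str


def files_changed_since(records, fork_lsn):
    """Return the set of files touched by records written after fork_lsn."""
    return {r.path for r in records if r.lsn > fork_lsn}


def difference_files(primary_xlog, standby_xlog, fork_lsn):
    """Files that may differ: anything either side modified after the logs forked."""
    changed = files_changed_since(primary_xlog, fork_lsn)
    changed |= files_changed_since(standby_xlog, fork_lsn)
    return sorted(changed)


# Assumed example data: both logs agree up to LSN 100, then diverge.
primary_xlog = [
    XlogRecord(100, "base/1/1001"),
    XlogRecord(110, "base/1/1002"),
    XlogRecord(120, "base/1/1001"),
    XlogRecord(130, "base/1/1003"),
]
standby_xlog = [
    XlogRecord(100, "base/1/1001"),
    XlogRecord(115, "base/1/1004"),
]

to_pull = difference_files(primary_xlog, standby_xlog, fork_lsn=100)
print(to_pull)
```

Only files modified after the fork point are listed, which is why this mode moves far less data than a full copy.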
analysis:
Incremental rebuild uses the Xlog to work out which files differ between the primary and standby DNs and sends only those files to the standby DN. When the standby's data is not damaged, this gives a fast rebuild at low cost.
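The file-transfer step in the process above uses a configurable number of threads between 1 and 16. As a rough sketch of that idea, the code below pulls a list of files with a bounded thread pool. The in-memory `primary_files` mapping and the `fetch_from_primary` function are hypothetical stand-ins for the network transfer; only the 1-16 range comes from the article.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_THREADS, MAX_THREADS = 1, 16  # range stated in the article


def clamp_threads(requested):
    """Force the requested thread count into the allowed 1-16 range."""
    return max(MIN_THREADS, min(MAX_THREADS, requested))


def pull_files(paths, fetch, threads):
    """Fetch every path with a bounded pool and return {path: data}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=clamp_threads(threads)) as pool:
        # pool.map returns results in the same order as the input paths.
        return dict(zip(paths, pool.map(fetch, paths)))


# Hypothetical stand-in for the primary DN: file contents held in memory.
primary_files = {
    "base/1/1001": b"block data 1001",
    "base/1/1002": b"block data 1002",
    "base/1/1003": b"block data 1003",
}


def fetch_from_primary(path):
    """Stand-in for a network read of one file from the primary."""
    return primary_files[path]


pulled = pull_files(sorted(primary_files), fetch_from_primary, threads=32)
print(len(pulled), "files pulled with", clamp_threads(32), "threads")
```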
3.2. Full rebuild: gs_ctl build -b full -Z datanode
use:
A full rebuild of the standby can repair most scenarios in which data or logs are damaged or lost, but it takes longer than an incremental rebuild.
process:
- Obtain the difference files: use a hardware-optimized CRC-32C algorithm to compute the CRC value of each file on the primary DN, compute the same values locally on the standby, and compare the two to obtain the list of differing files
- Backup and recovery: atomicity is not guaranteed by default. An atomic recovery is attempted, but the rebuild continues whether or not that attempt succeeds
- File transfer: the standby creates the specified number of threads (1-16) to pull the difference files from the primary
- Complete the full rebuild and wait for the Xlog to be flushed to disk
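The checksum comparison in the first step above can be sketched as follows. Note the substitution: Python's standard library has no CRC-32C, so this sketch uses `zlib.crc32` (plain CRC-32) purely as a stand-in, and the file names and contents are made up for illustration.

```python
import zlib


def crc_table(files):
    """Map each file path to a checksum of its contents (CRC-32 as a stand-in for CRC-32C)."""
    return {path: zlib.crc32(data) for path, data in files.items()}


def differing_files(primary_crcs, standby_crcs):
    """Paths whose checksum differs on the standby, or that the standby lacks entirely."""
    return sorted(
        path
        for path, crc in primary_crcs.items()
        if standby_crcs.get(path) != crc
    )


# Assumed example data: one file matches, one is corrupted, one is missing.
primary = {"a": b"alpha", "b": b"beta", "c": b"gamma"}
standby = {"a": b"alpha", "b": b"BETA"}

diff_list = differing_files(crc_table(primary), crc_table(standby))
print(diff_list)
```

Because only checksums are compared, the two sides need to exchange a small table rather than the file contents themselves.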
analysis:
A full rebuild treats the primary DN's files as the reference and verifies the standby DN's files against them. If a block of a standby file is inconsistent, the primary sends that block to the standby DN. Compared with a full cleanup rebuild, less data is copied and less WAL is produced, so the cost is moderate.
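The block-level repair described here can be sketched in a few lines. This is an illustration under assumptions, not the product's implementation: the block size of 4 bytes is deliberately tiny so the example is readable, `zlib.crc32` again stands in for CRC-32C, and the function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose; real block sizes are far larger


def block_crcs(data, block_size=BLOCK_SIZE):
    """Checksum of every fixed-size block in data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def mismatched_blocks(primary, standby, block_size=BLOCK_SIZE):
    """Indexes of primary blocks that are missing or different on the standby."""
    p = block_crcs(primary, block_size)
    s = block_crcs(standby, block_size)
    return [i for i, crc in enumerate(p) if i >= len(s) or s[i] != crc]


def repair(primary, standby, block_size=BLOCK_SIZE):
    """Copy only the mismatched blocks from the primary into the standby copy."""
    # Resize the standby copy to the primary's length before patching.
    repaired = bytearray(standby[:len(primary)].ljust(len(primary), b"\0"))
    for i in mismatched_blocks(primary, standby, block_size):
        start = i * block_size
        repaired[start:start + block_size] = primary[start:start + block_size]
    return bytes(repaired)


primary_file = b"AAAABBBBCCCCDD"
standby_file = b"AAAAxxxxCCCC"  # block 1 is corrupt, block 3 is missing

print(mismatched_blocks(primary_file, standby_file))
print(repair(primary_file, standby_file) == primary_file)
```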
3.3. Full cleanup rebuild: gs_ctl build -b fullcleanup -Z datanode
use:
Unlike full mode, this mode clears the standby DN's data directory before synchronization. It can repair most scenarios in which data or logs are damaged or lost, but it takes longer than the other modes.
process:
- Clean up the standby's data files: clear the standby's data directory, keeping the configuration files
- The primary sends a full image to the standby: using a single thread, the primary sends its entire data directory, except for the configuration files, to the standby
- Complete the full rebuild and wait for the Xlog to be flushed to disk
analysis:
In a full cleanup rebuild, the standby empties its data directory, keeps its configuration files, and sends a full rebuild request to the primary. The primary sends its entire data directory, except for the configuration files, to the standby. The standby is started once the rebuild finishes. This mode is costly.
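The two halves of this mode, clearing the standby while keeping its configuration and then copying everything else from the primary, can be sketched on temporary directories. The file names in `CONFIG_FILES` and the directory layout are assumptions made for the example, and a local copy stands in for the network transfer.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed configuration file names, for illustration only.
CONFIG_FILES = {"postgresql.conf", "pg_hba.conf"}


def clear_data_dir(data_dir, keep=CONFIG_FILES):
    """Delete everything directly under data_dir except the files named in keep."""
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def copy_full_image(primary_dir, standby_dir, skip=CONFIG_FILES):
    """Copy the primary's data directory to the standby, one entry at a time, minus config files."""
    for entry in sorted(Path(primary_dir).iterdir()):
        if entry.name in skip:
            continue
        target = Path(standby_dir) / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target)
        else:
            shutil.copy2(entry, target)


def demo():
    """Rebuild a throwaway standby directory and report what it ends up holding."""
    with tempfile.TemporaryDirectory() as tmp:
        primary, standby = Path(tmp, "primary"), Path(tmp, "standby")
        for root, tag in ((primary, "primary"), (standby, "standby")):
            (root / "base").mkdir(parents=True)
            (root / "base" / "1001").write_text(f"{tag} data")
            (root / "postgresql.conf").write_text(f"{tag} config")
        clear_data_dir(standby)
        copy_full_image(primary, standby)
        return (
            (standby / "base" / "1001").read_text(),
            (standby / "postgresql.conf").read_text(),
        )


print(demo())
```

After the run, the standby holds the primary's data but still has its own configuration file, which is the outcome the description above calls for.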
4. Summary
The main purpose of the standby DN rebuild function is to repair single points of failure. Depending on how it is carried out, a standby rebuild is either an incremental rebuild, a full rebuild, or a full cleanup rebuild, and all three interact with the primary DN. When a DN suffers a single point of failure, the operator should choose the rebuild mode that fits the actual extent of the damage and the resources the rebuild will consume.
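As a small illustration of choosing among the three modes, the sketch below builds the command lines given in this article and lists them from cheapest to most expensive. The escalation policy (start with a cheaper mode and move to a costlier one if it does not fix the problem) is an assumption for the example, not something the article prescribes, and the commands are only printed, never executed.

```python
# Modes from this article, ordered from lowest to highest cost.
BUILD_MODES = ("incremental", "full", "fullcleanup")


def build_command(mode):
    """Return the gs_ctl command line for a rebuild mode, as an argument list."""
    if mode not in BUILD_MODES:
        raise ValueError(f"unknown build mode: {mode!r}")
    return ["gs_ctl", "build", "-b", mode, "-Z", "datanode"]


def escalation_plan(start="incremental"):
    """Hypothetical policy: commands to try in order, from start up to the costliest mode."""
    first = BUILD_MODES.index(start)
    return [build_command(mode) for mode in BUILD_MODES[first:]]


for command in escalation_plan("full"):
    print(" ".join(command))
```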