Abstract: With the development of information technology, we have entered the era of big data, and data volumes have exploded. In the financial field, the data that carries core business must remain retrievable, and the business must be restorable quickly, even after software or hardware failures or disasters. Backup and recovery have therefore become one of the most critical capabilities of a data warehouse.

This article is shared from the HUAWEI CLOUD community post "Unmoving as a Mountain: Physical Fine-Grained Backup and Recovery, the Business Fault-Tolerance Tool of GaussDB (DWS)". Original author: the magician at the end of the century.

1. Technical Overview

1.1 Value and main content

With the development of information technology, we have entered the era of big data, and data volumes have exploded. In the financial field, the data that carries core business must remain retrievable, and the business must be restorable quickly, even after software or hardware failures or disasters. Backup and recovery have therefore become one of the most critical capabilities of a data warehouse. GaussDB (DWS) supports physical fine-grained backup and recovery: users can back up the entire cluster or only selected database elements, and flexibly restore one or more tables. This effectively reduces the time window and storage space needed for backups and focuses backup and recovery on the key tables in the user's business scenarios.

There are currently two main scenarios supported by physical fine-grained backup and recovery:

1. Restore single/multiple tables from a fine-grained cluster-level full backup set;

2. Back up the full data of the specified schema, and restore single/multiple tables from the backup set;

2. Principles of the technical solution

2.1 NBU backup and recovery solution

Roach is the GaussDB (DWS) database backup tool; it supports a variety of backup and recovery types and solutions. In the general Roach architecture, each cluster node runs a Roach agent process responsible for backing up that node's data, and the first node runs an additional Roach master process that coordinates the distributed cluster backup. Roach provides a non-intrusive solution for backing up to NBU. Physical fine-grained backup and recovery is built on this framework, so it is introduced first. The Roach client plug-in is deployed on the NBU Media Server machine, where it receives the backup data sent by the Roach agents and forwards it to the NBU server.

The NBU cluster deployment mode is as follows:

Figure 1 NBU cluster deployment mode

Non-intrusive NBU backup solution architecture:

Figure 2 NBU non-intrusive deployment scheme

As shown in the figure above (using a 3-node GaussDB (DWS) cluster as an example), the NBU backup data flows as follows:

  1. The Roach agent transmits compressed data to the Roach client in fragments;
  2. The Roach agent calls the XBSA interface of the NBU client to request an NBU backup;
  3. The NBU client forwards this request to the NBU Master Server;
  4. The NBU Master Server decides which NBU Media Server will store the data;
  5. The Roach client calls the XBSA interface to transfer the backup data to the NBU Media Server;
  6. The Media Server stores the backup data on the mounted tape drive or disk.

2.2 Fine-grained meta-information generation scheme

To recover one or more tables from the backup set in a fine-grained manner, Roach first needs to obtain the metadata DDL of the schemas and tables in all databases, persist it, and back it up to the media. Because exporting the metadata DDL takes a long time, the design runs the DDL export and the data backup in parallel to improve performance. Roach's design for obtaining the DDL is as follows:

Figure 3 Scheme for obtaining metadata from physical fine-grained backup and recovery
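
The idea of overlapping the slow DDL export with the data backup can be illustrated with a minimal Python sketch. It is not Roach's implementation: `export_ddl`, `backup_data`, and the instance names are hypothetical stand-ins that only simulate work with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def export_ddl(databases):
    """Hypothetical stand-in for the slow metadata DDL export."""
    time.sleep(0.1)  # simulate a long-running export
    return {db: f"-- DDL for {db}" for db in databases}


def backup_data(instances):
    """Hypothetical stand-in for backing up instance data files."""
    time.sleep(0.1)  # simulate copying data files
    return [f"{inst}: backed up" for inst in instances]


def run_backup(databases, instances):
    # Submit both tasks at once so the DDL export overlaps the data backup
    # instead of running before or after it.
    with ThreadPoolExecutor(max_workers=2) as pool:
        ddl_future = pool.submit(export_ddl, databases)
        data_future = pool.submit(backup_data, instances)
        return ddl_future.result(), data_future.result()


if __name__ == "__main__":
    ddl, data = run_backup(["test_tpch1"], ["instance_a", "instance_b"])
    print(ddl)
    print(data)
```

With both tasks submitted together, the total elapsed time is bounded by the slower of the two rather than by their sum.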

During backup, to support fine-grained recovery, each table name must be mapped to its meta-information so that the physical files and transaction information of all related tables can be found. A mapping (MAP) therefore needs to be obtained and backed up. The map is built layer by layer, following the hierarchy of database elements:

Agent -> Instance -> Database -> Schema -> Table -> Related Relations
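
One way to picture this layered map is as a nested dictionary keyed by each level in turn. The sketch below is an assumption made for illustration only: the actual on-disk format of Roach's map is not described here, and the node names, instance names, relation kinds, and file paths are all hypothetical.

```python
from collections import defaultdict


def nested():
    """Return a dict that creates nested levels on first access."""
    return defaultdict(nested)


def add_relation(file_map, agent, instance, database, schema, table, relation, path):
    # Walk Agent -> Instance -> Database -> Schema -> Table, then record
    # the physical file of one related relation (hypothetical kinds/paths).
    file_map[agent][instance][database][schema][table][relation] = path


def files_for_table(file_map, database, schema, table):
    """Collect the files of one table across every agent and instance."""
    found = []
    for agent, instances in file_map.items():
        for instance, databases in instances.items():
            relations = databases.get(database, {}).get(schema, {}).get(table, {})
            for relation, path in relations.items():
                found.append((agent, instance, relation, path))
    return sorted(found)


if __name__ == "__main__":
    file_map = nested()
    add_relation(file_map, "node1", "dn_1", "test_tpch1", "public", "nation", "heap", "base/1/100")
    add_relation(file_map, "node1", "dn_1", "test_tpch1", "public", "nation", "index", "base/1/101")
    add_relation(file_map, "node2", "dn_2", "test_tpch1", "public", "nation", "heap", "base/1/100")
    for entry in files_for_table(file_map, "test_tpch1", "public", "nation"):
        print(entry)
```

Looking a table up through every agent and instance yields the full set of files that must be restored for it.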

In actual execution, the mapping meta-information is queried and stored in parallel for each physical node and instance. The design logic is as follows:

Figure 4 Fine-grained backup and recovery file mapping MAP acquisition
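
The per-node, per-instance parallelism can be sketched as one task per (node, instance) pair. This is a simplified illustration under stated assumptions: `query_instance_map` is a hypothetical placeholder for the real system table queries, and the returned file names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def query_instance_map(node, instance):
    """Hypothetical query: return table -> files for one instance.

    A real implementation would query the system tables of that instance.
    """
    return {"public.nation": [f"{node}/{instance}/nation.dat"]}


def collect_maps(topology, max_workers=4):
    """Collect the map of every instance; topology is node -> [instances]."""
    jobs = [(node, inst) for node, insts in topology.items() for inst in insts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per (node, instance) pair; map() keeps the order of jobs.
        results = pool.map(lambda job: query_instance_map(*job), jobs)
        return dict(zip(jobs, results))


if __name__ == "__main__":
    maps = collect_maps({"node1": ["dn_1", "dn_2"], "node2": ["dn_3"]})
    for key, value in maps.items():
        print(key, value)
```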

2.3 Data backup producer-consumer model

Fine-grained full backup in GaussDB (DWS) is implemented in the Roach backup tool. During a cluster-level full backup, Roach interacts with the GaussDB (DWS) kernel: after pg_start_backup() is executed, the row-store data files and the WAL files are backed up in turn, and the column-store data files are backed up after pg_stop_backup() is executed. This sequence ensures that the data is persisted to disk with full transactional guarantees and is backed up in order to the media path managed by NBU.
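
The ordering of these phases can be written down as a short Python sketch. The `Recorder` class is a hypothetical stand-in that only records calls; it does not talk to a database, and the method names `start_backup`, `stop_backup`, and `backup` are assumptions made for illustration.

```python
class Recorder:
    """Hypothetical stand-in that only records the calls it receives."""

    def __init__(self):
        self.events = []

    def start_backup(self):
        self.events.append("pg_start_backup()")

    def stop_backup(self):
        self.events.append("pg_stop_backup()")

    def backup(self, what):
        self.events.append(f"backup {what}")


def cluster_full_backup(kernel, media):
    """Run the backup phases in the order described above."""
    kernel.start_backup()
    media.backup("row data files")
    media.backup("WAL files")
    kernel.stop_backup()
    # Column data files are backed up only after the stop call.
    media.backup("column data files")


if __name__ == "__main__":
    rec = Recorder()
    cluster_full_backup(rec, rec)
    print("\n".join(rec.events))
```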

Fine-grained data backup first uses the MAP obtained from the system table queries in 2.2 to organize the list of physical files to back up, drilling down step by step through each database, schema, and table until it reaches the minimal set of files that a table depends on. The smallest logical granularity of a backup block is the schema: tables under the same schema are written contiguously to the same logical block, which is physically cut according to the segment configuration (usually 4 GB). Writing the backup data to the media relies on the following producer-consumer model. The data writing thread (producer) reads the data files under the instance block by block, compresses them, and writes them into the buffer. The data sending thread (consumer) takes the data blocks from the buffer, calls the standard XBSA API, and streams the data to the media layer, where the NBU Master allocates a Media Server and the data is finally written to disk. If appended data exceeds the segment size limit, the backup file is cut, producing compressed backup files such as file_0.rch and file_1.rch. The producer-consumer model is shown in the figure below:

Figure 5 Physical fine-grained data backup to the media producer-consumer model
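
A minimal Python sketch of this producer-consumer pattern with segment cutting follows. It is a simplified illustration, not Roach's code: the consumer collects segments in an in-memory dictionary instead of streaming them through XBSA, zlib is used as a stand-in compressor, and the function names and the buffer size are assumptions.

```python
import queue
import threading
import zlib

SENTINEL = None  # marks the end of the stream


def producer(chunks, buf):
    """Read blocks, compress them, and put them into the bounded buffer."""
    for chunk in chunks:
        buf.put(zlib.compress(chunk))
    buf.put(SENTINEL)


def consumer(buf, segment_limit, segments):
    """Take compressed blocks from the buffer and append them to segments.

    segments maps a name such as file_0.rch to its bytes; a real consumer
    would stream each block to the media layer instead.
    """
    index, current = 0, bytearray()
    while True:
        block = buf.get()
        if block is SENTINEL:
            break
        # Cut a new segment when appending would exceed the size limit.
        if current and len(current) + len(block) > segment_limit:
            segments[f"file_{index}.rch"] = bytes(current)
            index, current = index + 1, bytearray()
        current.extend(block)
    if current:
        segments[f"file_{index}.rch"] = bytes(current)


def backup(chunks, segment_limit, buffer_slots=4):
    """Run one producer and one consumer thread and return the segments."""
    buf = queue.Queue(maxsize=buffer_slots)
    segments = {}
    threads = [
        threading.Thread(target=producer, args=(chunks, buf)),
        threading.Thread(target=consumer, args=(buf, segment_limit, segments)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return segments


if __name__ == "__main__":
    data = [bytes([i]) * 1000 for i in range(5)]
    print(sorted(backup(data, segment_limit=40)))
```

The bounded queue is the design point: when the consumer falls behind, the producer blocks on `put`, so memory use stays limited to the buffer size.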

2.4 Fine-grained recovery of multiple tables

Fine-grained recovery of multiple tables is currently supported from either a cluster-level full backup set or a schema-level backup set. The core technical ideas of the two scenarios are the same. The supported scenarios are as follows:

  • One or more tables can be recovered from the cluster-level full backup set in a single operation. The names of the tables to recover are written into a configuration file, and the name of that file is specified by the recovery parameter --table-list;
  • When one or more tables are restored from a cluster-level full backup set, the specified tables can span multiple schemas;
  • When restoring, you can restore to the original table or to a new table. The new table can be in a different schema from the original table, but it must be in the same database, and it can have a new table name. The specified target schema may or may not already exist. The new table is created during restoration, and the restore targets are configured in the file specified by --restore-target-list. If you want to restore everything to the original table names, --table-list and --restore-target-list can specify the same configuration file;
  • If the specified recovery target table already exists (under either the original or the new table name), the --clean parameter can be specified to drop the table with cascade (including its views, indexes, permissions, etc.) before recovering. Without this parameter, the user must manually confirm the drop before restoring. This mainly handles the case where the table name is the same at backup and restore time but the table definition differs;
  • Fine-grained recovery is online: the cluster does not need to be stopped and its data does not need to be cleaned up. After recovery completes, the table can be used directly, with no additional time-consuming steps such as build.
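
The relationship between the two list files can be sketched as a positional pairing of source tables and restore targets. The file format is not specified in this article, so the sketch below assumes one table name per line in each file and pairs them line by line; treat both the format and the helper names as hypothetical.

```python
def parse_list(text):
    """Return non-empty, stripped lines (assumed: one table name per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pair_targets(table_text, target_text):
    """Pair each source table with the target on the same line number."""
    sources = parse_list(table_text)
    targets = parse_list(target_text)
    if len(sources) != len(targets):
        raise ValueError("table list and target list differ in length")
    return list(zip(sources, targets))


if __name__ == "__main__":
    # Assumed contents of the two files; names taken from the test case
    # later in this article.
    table_list = "public.customer\npublic.nation\n"
    target_list = "liding11.customer11\nliding22.nation22\n"
    for source, target in pair_targets(table_list, target_list):
        print(f"{source} -> {target}")
```

Passing the same text for both arguments models the case where everything is restored to the original table names.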

The following figure shows the main plan design during recovery:

Figure 6 Fine-grained online recovery single/multi-table logic diagram

The steps for recovery are briefly described as follows:

Step 1: After receiving the data requests from the Roach Agent work processes that correspond to each instance on each node, the Roach Client establishes a connection with the NBU media and starts listing and retrieving the backup files, which are sent to the Roach Client;

Step 2: The Roach Client forwards the data to be restored, including metadata and instance data, to the Roach Agent of each node over TCP connections, and the instance data is stored in a buffer once received;

Step 3: Roach reads the list of tables to be restored, constructs a filter for the files to be written to disk, and restores only the backup files of the target tables;

Step 4: Roach parses the meta-information of the tables to be restored from the restored DDL and, based on it, creates the intermediate temporary (tmp) table and the final recovery target table;

Step 5: Roach creates a meta-information map for the newly created tmp table, matches it one-to-one with the map information of the original backup table, and filters the files to be written to disk;

Step 6: Roach restores the backup data files (relfilenode) to the relfilenode of the newly created tmp table;

Step 7: Roach queries the data in the tmp table and inserts it into the final target table.
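
Steps 3, 5, and 6 can be illustrated with a short Python sketch that filters the backup files down to the requested tables and pairs each backup relfilenode with the relfilenode of the new tmp table. The data layout is an assumption for illustration: each map is taken to be a dictionary from table name to an ordered list of relfilenode numbers, and all numbers are invented.

```python
def build_filter(backup_map, tables_to_restore):
    """Return the set of backup relfilenodes that belong to the requested tables."""
    wanted = set()
    for table in tables_to_restore:
        wanted.update(backup_map[table])
    return wanted


def remap(backup_map, tmp_map, tables_to_restore):
    """Pair each backup relfilenode with the tmp table's relfilenode."""
    plan = {}
    for table in tables_to_restore:
        old_nodes, new_nodes = backup_map[table], tmp_map[table]
        if len(old_nodes) != len(new_nodes):
            raise ValueError(f"relation count mismatch for {table}")
        plan.update(zip(old_nodes, new_nodes))
    return plan


if __name__ == "__main__":
    # Hypothetical relfilenode numbers recorded at backup time.
    backup_map = {"public.nation": [100, 101], "public.region": [200]}
    # Hypothetical relfilenodes of the freshly created tmp table.
    tmp_map = {"public.nation": [900, 901]}
    print(sorted(build_filter(backup_map, ["public.nation"])))
    print(remap(backup_map, tmp_map, ["public.nation"]))
```

Only files that pass the filter are written to disk, and each one lands under the relfilenode of the tmp table rather than the original one, which is what allows the restore to run without touching existing tables.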

3. Actual measurement of fine-grained backup and recovery technology

3.1 Test environment

image.png

3.2 Execution of fine-grained recovery use cases

Here are the use case implementations of some typical scenarios:

  • Schema level backup, restore single/multiple tables

Verification point:

  • The specified schema is successfully backed up;
  • Restore multiple tables from the schema backup set to the target table;
  • Data structure
    image.png
    Perform Schema level backup:
    python $GPHOME/script/GaussRoach.py -t backup --master-port 9500 --media-destination nbu_policy --media-type NBU --metadata-destination $GAUSSHOME/roachbackup/metadata --physical-fine-grained --schemaname public --dbname test_tpch1 --parallel-process 3 --nbu-on-remote --nbu-media-list /home/omm/media.list --client-port 9200
    image.png
  • Restore customer (a column-store table) and public.nation (a row-store table) from the schema backup set to liding11.customer11 and liding22.nation22;
  • Perform the recovery of the specified tables:

python $GPHOME/script/GaussRoach.py -t restore --master-port 9500 --media-destination nbu_policy --media-type NBU --metadata-destination $GAUSSHOME/roachbackup/metadata --physical-fine-grained --backup-key 20201226_101715 --dbname test_tpch1 --table-list /home/omm/table.list --parallel-process 3 --restore-target-list /home/omm/target.list --clean --nbu-on-remote --nbu-media-list /home/omm/media.list --client-port 9200
image.png

Data validation
image.png

4. Technical Summary

This article analyzes the GaussDB (DWS) physical fine-grained backup and recovery technology from the perspectives of technical value, application scenarios, technical principles, and actual measurement. Physical fine-grained backup and recovery is an effective enhancement to the existing full data backup and recovery. Customers can plan their hot and cold data more flexibly and choose a smaller logical granularity for backup or recovery, saving valuable backup storage space and CPU resources and reducing the impact on online business. At the recovery level, unlike the old cluster-level full recovery, which requires stopping the cluster and cleaning up data, fine-grained online recovery does not affect online business and avoids the risk of data loss caused by cleaning up the cluster before recovery. The technology therefore has broad prospects.

For more information about GaussDB (DWS), search for "GaussDB DWS" on WeChat and follow the official account, which shares the latest PB-level data warehouse technology.
