[CDS Technology Revealed Series 01] Alibaba Cloud CDS-OSS Disaster Recovery Revealed

Introduction: This article describes how the OSS service in CDS is deployed for disaster tolerance and the basic principles behind the implementation. The disaster recovery function ensures that a user's data has redundant copies in multiple locations. When an extreme failure (such as physical damage) hits one data center, the data is not lost; when a data center becomes unavailable (for example, after a power outage), the user's services remain largely unaffected.

Preface

Object Storage Service (OSS) is a massive, secure, low-cost, and highly reliable cloud storage service launched by Alibaba Cloud. It is suitable for storing any type of file. Its capacity and processing capability scale elastically, and it offers multiple storage types so that users can choose according to their business characteristics and optimize storage costs. It provides data durability of up to 99.9999999999% (12 nines) and availability of up to 99.995%.

To bring these public cloud capabilities to offline (on-premises) environments, so that offline customers can enjoy the same technical benefits while reducing the cost of hardware deployment, Alibaba Cloud launched Cloud Define Storage (CDS). This article describes how the OSS service in CDS is deployed for disaster tolerance and the basic principles behind the implementation. The disaster recovery function ensures that a user's data has redundant copies in multiple locations. When an extreme failure (such as physical damage) hits one data center, the data is not lost; when a data center becomes unavailable (for example, after a power outage), the user's services remain largely unaffected.

Principles of disaster tolerance

OSS includes an important background service, the Data Replication Service (DRS). Once a user enables data replication for a bucket (each replication rule is called a replication edge), DRS is notified whenever the user uploads a file and then automatically and asynchronously transfers the file to the destination specified in the replication rule. The whole process is transparent to the user and requires no intervention.
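The notify-then-copy flow described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: `Bucket`, `ReplicationService`, `put_object`, and `drain` are hypothetical names invented for this sketch and are not part of any OSS or DRS API, and the real service runs as a background process rather than an explicit `drain()` call.

```python
from collections import deque


class Bucket:
    """Hypothetical in-memory stand-in for an OSS bucket."""

    def __init__(self, name):
        self.name = name
        self.objects = {}


class ReplicationService:
    """Toy model of DRS: upload notifications are queued and applied later."""

    def __init__(self):
        self.edges = {}         # source bucket name -> list of destination buckets
        self.pending = deque()  # queued (source bucket, key) notifications

    def add_edge(self, source, destination):
        # One replication edge: source bucket -> destination bucket.
        self.edges.setdefault(source.name, []).append(destination)

    def notify(self, source, key):
        # Called on every upload; returns immediately without copying data.
        self.pending.append((source, key))

    def drain(self):
        # Background step: copy each queued object along its replication edges.
        while self.pending:
            source, key = self.pending.popleft()
            for destination in self.edges.get(source.name, []):
                destination.objects[key] = source.objects[key]


def put_object(drs, bucket, key, data):
    """User-facing write: store locally, then notify the replication service."""
    bucket.objects[key] = data
    drs.notify(bucket, key)


src, dst = Bucket("source-bucket"), Bucket("destination-bucket")
drs = ReplicationService()
drs.add_edge(src, dst)

put_object(drs, src, "report.txt", b"hello")
replicated_before_drain = "report.txt" in dst.objects  # copy is still pending
drs.drain()
replicated_after_drain = dst.objects.get("report.txt") == b"hello"
```

In this sketch the upload returns before the copy happens, which is the sense in which the replication is asynchronous; the user code never calls the replication step itself.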

The figure above is an example of cross-region replication. After the user enables data replication for the source bucket and configures the destination bucket, DRS automatically copies data from the source bucket to the destination bucket.

Deployment architecture

This section briefly introduces the physical concepts in a CDS offline deployment. From largest to smallest, they are Cloud, Region, AZ (availability zone), Cluster, and Bucket. As the figure illustrates, each outer layer may contain one or more of the layers inside it.
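The nesting of these layers can be written down as a small data model. The sketch below is illustrative only: the class names, the example names such as `cluster-a-prime`, and the `locate_bucket` helper are assumptions made for this example, not CDS identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    buckets: list = field(default_factory=list)   # bucket names hosted here


@dataclass
class AvailabilityZone:
    name: str
    clusters: list = field(default_factory=list)


@dataclass
class Region:
    name: str
    zones: list = field(default_factory=list)


@dataclass
class Cloud:
    name: str
    regions: list = field(default_factory=list)


def locate_bucket(cloud, bucket_name):
    """Return every (region, zone, cluster) path under which the bucket appears."""
    return [
        (region.name, zone.name, cluster.name)
        for region in cloud.regions
        for zone in region.zones
        for cluster in zone.clusters
        if bucket_name in cluster.buckets
    ]


# Hypothetical layout: bucket-a lives on two clusters in two AZs of one Region.
cloud = Cloud("cloud-1", [
    Region("region-a", [
        AvailabilityZone("az-1", [Cluster("cluster-a", ["bucket-a"])]),
        AvailabilityZone("az-2", [Cluster("cluster-a-prime", ["bucket-a"])]),
    ]),
    Region("region-b", [
        AvailabilityZone("az-1", [Cluster("cluster-b", ["bucket-b"])]),
    ]),
])

paths = locate_bucket(cloud, "bucket-a")
```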

The following is a more detailed view of the multi-region deployment architecture under one cloud. Buckets on two clusters in the same Region can provide intra-city disaster recovery, and buckets on two clusters in two different Regions can provide remote disaster recovery.

Disaster tolerance forms

To meet the needs of different disaster recovery scenarios, OSS provides four forms of disaster recovery: intra-city disaster recovery, remote disaster recovery (cross-region replication), cross-cloud replication, and two places and three centers. Their characteristics are introduced one by one below.

1. Disaster tolerance in the same city

The system architecture of intra-city disaster recovery is as follows:

Cluster A and cluster A' are deployed in two AZs in the same Region. During cluster planning, the two clusters are designated as disaster-tolerant clusters for each other. When a bucket is created in either cluster, the background opens two-way data replication (that is, two replication edges) for this bucket between the two clusters, so data written to either cluster through the bucket is automatically and asynchronously replicated to the other cluster by DRS. When the cluster that currently hosts the bucket fails, the bucket can be switched to the other cluster with the one-click switch on the operation and maintenance platform. Because the bucket name stays the same, the endpoint through which users access OSS also stays the same, and users do not need to change the domain name they use to access OSS. The entire switching process is transparent to users and has almost no effect on their services.

Intra-city disaster recovery is therefore a very convenient form of disaster recovery for users.
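A minimal sketch of the intra-city behavior, assuming a single bucket name backed by two paired clusters. Everything here is hypothetical: `DisasterTolerantBucket`, its `switch()` method (a stand-in for the one-click switch), and the endpoint string are invented for illustration and do not reflect the operation and maintenance platform's actual interface.

```python
class DisasterTolerantBucket:
    """Hypothetical model: one bucket name backed by two paired clusters."""

    def __init__(self, name, endpoint, cluster_a, cluster_b):
        self.name = name
        self.endpoint = endpoint
        self.stores = {cluster_a: {}, cluster_b: {}}  # per-cluster object store
        self.active = cluster_a                       # cluster serving requests
        self.pending = []  # (target cluster, key, data) awaiting replication

    @property
    def url(self):
        # The access domain depends only on bucket name and endpoint.
        return f"{self.name}.{self.endpoint}"

    def _peer(self, cluster):
        return next(c for c in self.stores if c != cluster)

    def put(self, key, data):
        # Write to the active cluster and queue a copy for its peer.
        self.stores[self.active][key] = data
        self.pending.append((self._peer(self.active), key, data))

    def get(self, key):
        return self.stores[self.active].get(key)

    def replicate(self):
        # Stand-in for the asynchronous background copy in both directions.
        for cluster, key, data in self.pending:
            self.stores[cluster][key] = data
        self.pending.clear()

    def switch(self):
        # Stand-in for the one-click switch: serve from the other cluster.
        self.active = self._peer(self.active)


bucket = DisasterTolerantBucket(
    "bucket-a", "region-a.example.com", "cluster-a", "cluster-a-prime"
)
url_before = bucket.url
bucket.put("k1", b"v1")       # written on cluster-a
bucket.replicate()
bucket.switch()               # cluster-a fails, serve from cluster-a-prime
value_after_switch = bucket.get("k1")
bucket.put("k2", b"v2")       # written on cluster-a-prime
bucket.replicate()
bucket.switch()               # switch back to cluster-a
value_after_switch_back = bucket.get("k2")
url_after = bucket.url
```

The access URL is identical before and after both switches, which is why the client needs no change. In this model, any object still in `pending` when the switch happens would not yet be visible on the peer cluster, a consequence of the replication being asynchronous.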

2. Remote disaster tolerance

The system architecture diagram of remote disaster recovery is as follows:

Remote disaster recovery is also called cross-region replication. Because the cluster deployment is not visible to users, the figure omits the internal clusters and shows only the Bucket, Region, and Endpoint that users work with.

As shown in the figure above, bucket names must be unique within one cloud. BucketA and BucketB are created in different Regions, RegionA and RegionB. The two Regions have different domain names, denoted RegionA-endpoint and RegionB-endpoint respectively, so users access the two buckets at BucketA.RegionA-endpoint and BucketB.RegionB-endpoint. Two replication edges are opened between the two buckets as well, so data written to the bucket in either Region is automatically and asynchronously replicated by DRS to the bucket in the other Region. When the service in one Region becomes entirely unavailable, users must switch the domain name their business uses to access OSS themselves, from the endpoint of one bucket to the endpoint of the other, to keep their own business unaffected.

Compared with intra-city disaster recovery, remote disaster recovery requires users to switch the OSS bucket domain name when a failure occurs, but because the data is backed up in two different Regions, it offers stronger data protection.
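Because the domain switch is the user's responsibility in this form, one possible way to handle it on the client side is an ordered fallback over the two bucket domains. The sketch below is an assumption about how a client might do this, not a documented OSS feature: `read_with_failover` and `fake_fetch` are hypothetical, and the Region outage is simulated with an in-memory table instead of real network calls.

```python
def read_with_failover(key, bucket_urls, fetch):
    """Try each bucket URL in order; return (url, data) from the first that answers."""
    last_error = None
    for url in bucket_urls:
        try:
            return url, fetch(url, key)
        except ConnectionError as error:
            last_error = error  # remember the failure and try the next Region
    raise RuntimeError("all regions unavailable") from last_error


# Simulated replicas; None marks a Region whose service is entirely unavailable.
REPLICAS = {
    "BucketA.RegionA-endpoint": None,
    "BucketB.RegionB-endpoint": {"report.txt": b"hello"},
}


def fake_fetch(url, key):
    """Hypothetical stand-in for an HTTP GET against a bucket domain."""
    store = REPLICAS[url]
    if store is None:
        raise ConnectionError(f"{url} unreachable")
    return store[key]


served_by, data = read_with_failover("report.txt", list(REPLICAS), fake_fetch)
```

With RegionA marked as down, the read is served from BucketB.RegionB-endpoint, provided the object had already been replicated there.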

3. Cross-cloud replication

The system architecture of cross-cloud replication is as follows:

Compared with remote disaster recovery, the only difference in cross-cloud replication is that the two buckets are deployed on two clouds, providing data replication between different clouds to cover more disaster recovery deployment forms and user needs. Because there are two clouds, the bucket names can be the same, but the Region domain names under the two clouds are still different.

When users adopt this form of disaster recovery and one cloud fails, they likewise need to switch the OSS bucket domain name in order to move from one cloud to the other.
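The naming rule stated above (bucket names are unique within one cloud but may repeat across clouds, while the access domains still differ) can be illustrated with a small registry. `BucketNamespace` and the example endpoint strings are hypothetical and exist only for this sketch.

```python
class BucketNamespace:
    """Hypothetical registry enforcing per-cloud bucket-name uniqueness."""

    def __init__(self):
        self._owners = {}  # (cloud, bucket name) -> region

    def create(self, cloud, region, bucket, region_endpoint):
        # A name may be reused on another cloud, but not within the same cloud.
        if (cloud, bucket) in self._owners:
            raise ValueError(f"bucket {bucket!r} already exists in {cloud}")
        self._owners[(cloud, bucket)] = region
        return f"{bucket}.{region_endpoint}"


ns = BucketNamespace()
url_1 = ns.create("cloud-1", "region-a", "backup", "region-a.cloud-1.example")
url_2 = ns.create("cloud-2", "region-x", "backup", "region-x.cloud-2.example")

try:
    ns.create("cloud-1", "region-b", "backup", "region-b.cloud-1.example")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```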

4. Two places and three centers

The two-places-and-three-centers form comes in two variants. In the first, both Regions are under the same cloud. In the second, the cross-cloud variant, one Region is on one cloud and the other Region is on another cloud. Cross-cloud deployments are more common in practice, so the cross-cloud variant is used as the example here.

Strictly speaking, cross-cloud two places and three centers is a kind of cross-cloud replication, except that one bucket (BucketA) uses intra-city disaster recovery while the other bucket (BucketB) is deployed on another cloud. It is a combination of intra-city disaster recovery and cross-cloud replication: A and A' form an intra-city disaster recovery pair, and A/A' and B replicate across clouds. Data written to any of the clusters (A, A', or B) eventually exists in all three, which makes this the highest level of disaster recovery offered so far.

As shown in the figure above, the source-side BucketA corresponds to two clusters. When one cluster fails, the bucket can be switched to the other cluster with the one-click switch, and users do not need to change the bucket domain name they use to access OSS. When one of the clouds fails as a whole, users can switch to the other cloud by changing the OSS bucket domain name they access. This deployment distributes data across two places and three clusters (hence "two places and three centers") and provides better data security.

Combination disaster recovery

Combined disaster tolerance refers to the ways users can configure replication edges, and it mainly serves to support more usage scenarios. There are three forms: one-to-many, many-to-one, and a bucket acting as both source and destination. In practice, a deployment can use one of these forms or a combination of them.

1. One-to-many

As shown in the figure below, when you write data to BucketA, the data will be automatically and asynchronously copied to BucketB and BucketC, that is, one source bucket corresponds to multiple target buckets.

2. Many-to-one

As shown in the figure below, when data is written to BucketB or BucketC, the data will be automatically and asynchronously copied to BucketA, that is, multiple source buckets correspond to the same target bucket.

3. Source and destination in one

As shown in the figure below, data written by the user to BucketA is asynchronously replicated to BucketB, and data written by the user to BucketB is asynchronously replicated to BucketC. Note that data written by the user to BucketA is not relayed onward to BucketC. BucketB serves as both the destination of one replication edge and the source of another, so it acts as source and destination in one.
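The three configurations, including the rule that replicated data is not relayed a second hop, can be modeled as a directed graph of replication edges. This is a sketch under assumptions: `ReplicationGraph` and its methods are hypothetical names, and the copy is applied immediately for brevity even though the real replication is asynchronous.

```python
class ReplicationGraph:
    """Hypothetical model of replication edges between buckets."""

    def __init__(self):
        self.edges = {}    # source bucket -> list of destination buckets
        self.buckets = {}  # bucket name -> {key: data}

    def add_edge(self, source, destination):
        self.edges.setdefault(source, []).append(destination)
        self.buckets.setdefault(source, {})
        self.buckets.setdefault(destination, {})

    def user_write(self, bucket, key, data):
        # Only user writes are forwarded, and only one hop along outgoing edges.
        self.buckets.setdefault(bucket, {})[key] = data
        for destination in self.edges.get(bucket, []):
            self.buckets[destination][key] = data

    def holders(self, key):
        # Names of all buckets that currently hold the object.
        return sorted(name for name, objs in self.buckets.items() if key in objs)


# 1. One-to-many: BucketA -> BucketB and BucketA -> BucketC
one_to_many = ReplicationGraph()
one_to_many.add_edge("BucketA", "BucketB")
one_to_many.add_edge("BucketA", "BucketC")
one_to_many.user_write("BucketA", "x", b"1")
one_to_many_holders = one_to_many.holders("x")

# 2. Many-to-one: BucketB -> BucketA and BucketC -> BucketA
many_to_one = ReplicationGraph()
many_to_one.add_edge("BucketB", "BucketA")
many_to_one.add_edge("BucketC", "BucketA")
many_to_one.user_write("BucketB", "y", b"2")
many_to_one_holders = many_to_one.holders("y")

# 3. Source and destination in one: BucketA -> BucketB -> BucketC
chain = ReplicationGraph()
chain.add_edge("BucketA", "BucketB")
chain.add_edge("BucketB", "BucketC")
chain.user_write("BucketA", "z", b"3")   # reaches BucketB, not BucketC
chain.user_write("BucketB", "w", b"4")   # reaches BucketC
chain_holders_z = chain.holders("z")
chain_holders_w = chain.holders("w")
```

Forwarding only user writes keeps the third configuration from turning into a relay: an object that arrives in BucketB through replication does not trigger BucketB's own outgoing edge.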

Future outlook

Disaster tolerance is one of the most basic requirements users have when working with data. Only with disaster tolerance in place can data remain backed up, with no loss, under different failure conditions. After more than ten years of technical accumulation and refinement, OSS provides a rich set of disaster tolerance functions that meet the needs of different users and scenarios. It is widely used by banks, government agencies, enterprises, and other customers to rigorously protect the security of customer data. In data disaster recovery, Alibaba Cloud's CDS-OSS is highly competitive.

Meanwhile, the disaster tolerance capabilities of CDS-OSS are still being refined, and more functions and features will be delivered in the future to keep bringing value to users.

Original work: Alibaba Cloud Storage Zen Home

