Carina is a cloud-native local storage project initiated and led by Boyun (GitHub: https://github.com/carina-io/carina ), and has been accepted into the CNCF landscape.
Carina provides a high-performance, maintenance-free local storage solution for stateful applications in cloud-native environments, with capabilities such as storage volume lifecycle management, LVM/raw-disk provisioning, intelligent scheduling, RAID management, and automatic tiering. It offers cloud-native stateful services a data storage system with extremely low latency, zero day-to-day operations, and database awareness. As a component of the Boyun container cloud platform, Carina has been running stably in the production environments of multiple financial institutions for years.
There are two main traditional data backup approaches. One takes snapshot-based backups on the server where the data lives; the other deploys a dedicated backup agent on each target server and periodically copies a specified data directory to external storage. Both mechanisms are stable and mature, but they cannot adapt to the elastic, pooled deployment patterns of the containerized, cloud-native era.
Taking the cloud-native storage plug-in Carina as an example: in data-sensitive scenarios such as databases, each database cluster contains multiple compute instances, and those instances may drift freely within the cluster and recover automatically from failures. Traditional backup methods cannot follow compute instances as they migrate during rapid cluster scale-out and scale-in or cross-node drift, so backups fail. A backup tool that fits the Kubernetes container scenario is therefore essential.
Kubernetes backup and recovery tool: velero
Velero is a disaster-recovery and migration tool for the cloud-native era, written in Go and open sourced on GitHub at https://github.com/vmware-tanzu/velero . The name Velero comes from Spanish and means sailboat, which fits the naming style of the Kubernetes community.
With Velero, users can safely back up, restore, and migrate Kubernetes cluster resources and persistent volumes. The basic principle is to back up cluster data, such as cluster resources and persistent data volumes, to object storage, and to pull the data back from object storage during recovery. Beyond disaster recovery, it can also transfer resources, supporting the migration of container applications from one cluster to another, which is one of Velero's most successful use cases.
Velero consists of two core components: a server that runs in a specific Kubernetes cluster, and a client, a local command-line tool that only requires kubectl and a kubeconfig to be configured.
Building on the Kubernetes resource backup capability Velero implements, you can easily back up and restore Kubernetes cluster data, copy resources from one Kubernetes cluster to another, or quickly clone a production environment into a test environment.
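As a sketch of that migration use case, the following commands copy a namespace from one cluster to another. It assumes both clusters have Velero installed against the same backup storage location, and `ctx-prod`/`ctx-test` are hypothetical kubeconfig context names:

```shell
# On the source cluster: back up the namespace to object storage
kubectl config use-context ctx-prod
velero backup create web-migration --include-namespaces web

# On the target cluster: restore the namespace from the shared bucket
kubectl config use-context ctx-test
velero restore create --from-backup web-migration
```

Setting the target cluster's backup storage location to read-only during the migration avoids the two clusters writing to the same bucket concurrently.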
For resource backup, Velero supports many cloud storage backends, such as AWS S3 and S3-compatible storage systems, Azure Blob Storage, Google Cloud Storage, and Alibaba Cloud OSS. Compared with backing up the entire Kubernetes data store, etcd, Velero's control is more fine-grained: it can back up individual objects in the cluster, and can also back up or restore objects filtered by type, namespace, and label.
Velero Workflow
Taking the core backup flow as an example, when you run `velero backup create my-backup`:
- The Velero client first calls the Kubernetes API server to create a Backup object;
- The BackupController is notified that a new Backup object has been created and performs validation;
- The BackupController starts the backup process and collects the data to back up by querying the API server for resources;
- The BackupController calls the object storage service, for example AWS S3, to upload the backup files.

By default, `velero backup create` takes disk snapshots of any persistent volume. You can adjust this behavior with additional flags: run `velero backup create --help` to see the available flags, or disable snapshots with `--snapshot-volumes=false`.
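Once a backup has been created, its progress and contents can be inspected from the client. A minimal sketch, using the `my-backup` name from the example above:

```shell
# Overall status, errors/warnings, and the resources included
velero backup describe my-backup --details

# Server-side logs for this backup, fetched from object storage
velero backup logs my-backup

# List all backups and their phases (InProgress, Completed, Failed, ...)
velero backup get
```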
For backup storage locations and volume snapshots, Velero defines two custom resources, BackupStorageLocation and VolumeSnapshotLocation, which configure where Velero backups and their associated persistent volume snapshots are stored.
- BackupStorageLocation mainly targets S3-compatible backends, such as MinIO and Alibaba Cloud OSS. It specifies the bucket, a prefix within the bucket under which all Velero data is stored, and a set of other provider-specific fields;
- VolumeSnapshotLocation (for PV data) is used to take snapshots of persistent volumes. It requires a plug-in from the cloud provider and is defined entirely by provider-specific fields (such as the AWS region, Azure resource group, or Portworx snapshot type). Taking databases and middleware, the workloads most sensitive to data consistency, as an example, the open source storage plug-in Carina will soon provide a database-aware Velero volume snapshot capability for fast backup and recovery of middleware data.
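As a sketch, the two custom resources look roughly like this; the bucket name, regions, and URLs here are illustrative placeholders, not values from the installation below:

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws               # S3-compatible backends use the aws provider plug-in
  objectStorage:
    bucket: my-velero-bucket  # where backup tarballs and metadata are stored
    prefix: backups
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000
---
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws               # snapshot config is entirely provider-specific
  config:
    region: us-east-1
```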
Velero installation and use
Install the velero client
$ wget https://mirror.ghproxy.com/https://github.com/vmware-tanzu/velero/releases/download/v1.6.3/velero-v1.6.3-darwin-amd64.tar.gz
$ tar -zxvf velero-v1.6.3-darwin-amd64.tar.gz && cd velero-v1.6.3-darwin-amd64
$ mv velero /usr/local/bin && chmod +x /usr/local/bin/velero
$ velero version
Install minio as a data backup backend
The Minio installation Yaml file is as follows:
apiVersion: v1
kind: Namespace
metadata:
  name: velero
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  strategy:
    type: Recreate
  selector:
    matchLabels:
      component: minio
  template:
    metadata:
      labels:
        component: minio
    spec:
      volumes:
        - name: storage
          emptyDir: {}
        - name: config
          emptyDir: {}
      containers:
        - name: minio
          image: minio/minio:latest
          imagePullPolicy: IfNotPresent
          args:
            - server
            - /storage
            - --config-dir=/config
            - --console-address=:9001
          env:
            - name: MINIO_ACCESS_KEY
              value: "minio"
            - name: MINIO_SECRET_KEY
              value: "minio123"
          ports:
            - containerPort: 9000
            - containerPort: 9001
          volumeMounts:
            - name: storage
              mountPath: "/storage"
            - name: config
              mountPath: "/config"
---
apiVersion: v1
kind: Service
metadata:
  namespace: velero
  name: minio
  labels:
    component: minio
spec:
  type: NodePort
  ports:
    - name: api
      port: 9000
      targetPort: 9000
    - name: console
      port: 9001
      targetPort: 9001
  selector:
    component: minio
---
apiVersion: batch/v1
kind: Job
metadata:
  namespace: velero
  name: minio-setup
  labels:
    component: minio
spec:
  template:
    metadata:
      name: minio-setup
    spec:
      restartPolicy: OnFailure
      volumes:
        - name: config
          emptyDir: {}
      containers:
        - name: mc
          image: minio/mc:latest
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
            - "mc --config-dir=/config config host add velero http://minio:9000 minio minio123 && mc --config-dir=/config mb -p velero/velero"
          volumeMounts:
            - name: config
              mountPath: "/config"
Install MinIO and check that the resources are created.
$ kubectl apply -f ./00-minio-deployment.yaml
$ kubectl get pods -n velero
NAME READY STATUS RESTARTS AGE
minio-58dc5cf789-z2777 0/1 ContainerCreating 0 14s
minio-setup-dz4jb 0/1 ContainerCreating 0 6s
$ kubectl get svc -n velero
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
minio NodePort 10.96.13.35 <none> 9000:30693/TCP,9001:32351/TCP 17s
After the services have started, you can log in to the MinIO console to check whether the velero/velero bucket was created successfully.
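If you prefer not to use the console, a quick sketch of checking from the command line; the Job name and credentials follow the manifests above, and `mc-check` is just a throwaway pod name:

```shell
# The setup Job's logs show whether the bucket was created
kubectl logs -n velero job/minio-setup

# Or list buckets directly with a one-off mc pod (assumes cluster DNS)
kubectl run -n velero mc-check --rm -i --restart=Never --image=minio/mc:latest \
  --command -- /bin/sh -c \
  "mc config host add velero http://minio:9000 minio minio123 && mc ls velero"
```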
Install the Velero server, using S3-compatible storage
Create minio credentials
$ cat > credentials-velero <<EOF
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
EOF
# Install velero
$ cp velero /usr/bin/
# Enable shell completion
$ velero completion bash
Use the official restic integration to back up PVs
$ velero install \
    --image velero/velero:v1.6.3 \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --provider aws \
    --bucket velero \
    --namespace velero \
    --secret-file ./credentials-velero \
    --velero-pod-cpu-request 200m \
    --velero-pod-mem-request 200Mi \
    --velero-pod-cpu-limit 1000m \
    --velero-pod-mem-limit 1000Mi \
    --use-volume-snapshots=false \
    --use-restic \
    --restic-pod-cpu-request 200m \
    --restic-pod-mem-request 200Mi \
    --restic-pod-cpu-limit 1000m \
    --restic-pod-mem-limit 1000Mi \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
Among them, several important parameters and their descriptions are as follows:
--provider: declares the type of Velero plug-in to use.
--plugins: use the S3 API-compatible plug-in "velero-plugin-for-aws".
--bucket: the name of the bucket created in the object storage (here, the velero bucket in MinIO).
--secret-file: the credentials file for accessing the object storage, i.e. the "credentials-velero" file created above.
--use-restic: use the free, open source backup tool restic to back up and restore persistent volume data.
--default-volumes-to-restic: use restic to back up all pod volumes; requires the --use-restic flag to be enabled.
--backup-location-config: access configuration for the backup bucket.
--region: the region of the S3 API-compatible bucket; for Tencent Cloud COS, for example, a bucket created in Guangzhou would use "ap-guangzhou".
--s3ForcePathStyle: use S3 path-style addressing.
--s3Url: the S3 API-compatible access URL.
--use-volume-snapshots=false: disable volume snapshot backups.
After the installation command completes, wait for the Velero and restic workloads to become ready, then check that the configured storage location is available.
$ velero backup-location get -o yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  # only aws, gcp, azure (or compatible plug-ins)
  provider: aws
  objectStorage:
    bucket: myBucket
    prefix: backup
  config:
    region: us-west-2
    profile: "default"
    s3ForcePathStyle: "false"
    s3Url: http://minio:9000
At this point, Velero is fully deployed.
Introduction to Velero features
Create backup
Velero supports backing up all objects, or filtering objects by type, namespace, and/or label.
$ velero create backup $NAME [flags]
$ velero backup create pvc-backup-1 --snapshot-volumes --include-namespaces nginx-example --default-volumes-to-restic --volume-snapshot-locations default
where:
--include-namespaces: back up all resources in the given namespace(s); cluster-scoped resources are not included
--include-resources: the resource types to back up
--include-cluster-resources: whether to back up cluster-scoped resources. This flag can take three values:
true: include all cluster-scoped resources;
false: include no cluster-scoped resources;
nil ("auto", when the flag is not supplied)
--selector: back up only resources matching the given label selector
--exclude-namespaces: resources in the given namespace(s) are excluded from the backup
--exclude-resources: resources of the given type(s) are excluded from the backup
velero.io/exclude-from-backup=true: a label rather than a flag; a resource carrying it is skipped even when the backup's label selector matches it
You can also back up specific resource types in a specific order with the --ordered-resources parameter. Specify a map from resource type to a list of object names; object names are separated by commas and written as "namespace/name" (for cluster-scoped resources, use just the name). Key-value pairs in the map are separated by semicolons, and resource type names are plural.
$ velero backup create backupName --include-cluster-resources=true --ordered-resources 'pods=ns1/pod1,ns1/pod2;persistentvolumes=pv4,pv8' --include-namespaces=ns1
$ velero backup create backupName --ordered-resources 'statefulsets=ns1/sts1,ns1/sts0' --include-namespaces=n
Scheduled backups:
$ velero schedule create <SCHEDULE NAME> --schedule "0 7 * * *"
$ velero create schedule NAME --schedule="@every 6h"
$ velero create schedule NAME --schedule="@every 24h" --include-namespaces web
$ velero create schedule NAME --schedule="@every 168h" --ttl 2160h0m0s
Backup advanced usage example
Create snapshots of more than one type of persistent volume in a single Velero backup
$ velero snapshot-location create ebs-us-east-1 \
    --provider aws \
    --config region=us-east-1
$ velero snapshot-location create portworx-cloud \
    --provider portworx \
    --config type=cloud
$ velero backup create full-cluster-backup \
    --volume-snapshot-locations ebs-us-east-1,portworx-cloud
Store backups in different object storage buckets in different regions
$ velero backup-location create default \
    --provider aws \
    --bucket velero-backups \
    --config region=us-east-1
$ velero backup-location create s3-alt-region \
    --provider aws \
    --bucket velero-backups-alt \
    --config region=us-west-1
$ velero backup create full-cluster-alternate-location-backup \
    --storage-location s3-alt-region
For storage volumes from a provider such as Portworx, keep some snapshots locally on the cluster and store others in the public cloud
$ velero snapshot-location create portworx-local \
    --provider portworx \
    --config type=local
$ velero snapshot-location create portworx-cloud \
    --provider portworx \
    --config type=cloud
$ velero backup create cloud-snapshot-backup \
    --volume-snapshot-locations portworx-cloud
Use the default storage and snapshot locations
$ velero backup-location create default \
    --provider aws \
    --bucket velero-backups \
    --config region=us-west-1
$ velero snapshot-location create ebs-us-west-1 \
    --provider aws \
    --config region=us-west-1
$ velero backup create full-cluster-backup
View backup tasks.
When a backup task's status is "Completed" and its error count is 0, the backup finished without errors. You can query with:
$ velero backup get
Before restoring, temporarily switch the backup storage location to read-only mode; this prevents backup objects from being created or deleted in the storage location while the restore runs.
$ kubectl patch backupstoragelocation default --namespace velero \
--type merge \
--patch '{"spec":{"accessMode":"ReadOnly"}}'
$ velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX   PHASE     LAST VALIDATED   ACCESS MODE   DEFAULT
default   aws        velero          Unknown   Unknown          ReadWrite     true
Restoring backup data
$ velero restore create --from-backup <backup-name>
$ velero restore create --from-backup pvc-backup-1 --restore-volumes
View recovery tasks.
$ velero restore get
After the restore is complete, don't forget to restore the backup storage location to read-write mode for the next backup task to use:
$ kubectl patch backupstoragelocation default --namespace velero \
--type merge \
--patch '{"spec":{"accessMode":"ReadWrite"}}'
Introduction to backup hooks
Velero can execute predefined commands inside containers before and after a backup runs, which is very useful for data consistency. Hooks can be specified in two ways: as annotations on the pod itself, or in the spec when defining the Backup task.
Pre hooks
pre.hook.backup.velero.io/container: the container in which to run the command; defaults to the first container in the pod. Optional.
pre.hook.backup.velero.io/command: the command to run; if it needs multiple arguments, specify it as a JSON array, e.g. ["/usr/bin/uname", "-a"].
pre.hook.backup.velero.io/on-error: what to do if the command returns a non-zero exit code. Defaults to "Fail"; valid values are "Fail" and "Continue". Optional.
pre.hook.backup.velero.io/timeout: how long to wait for the command to run; the hook is considered failed if it exceeds the timeout. Defaults to 30 seconds. Optional.
Post hooks
post.hook.backup.velero.io/container: the container in which to run the command; defaults to the first container in the pod. Optional.
post.hook.backup.velero.io/command: the command to run; if it needs multiple arguments, specify it as a JSON array, e.g. ["/usr/bin/uname", "-a"].
post.hook.backup.velero.io/on-error: what to do if the command returns a non-zero exit code. Defaults to "Fail"; valid values are "Fail" and "Continue". Optional.
post.hook.backup.velero.io/timeout: how long to wait for the command to run; the hook is considered failed if it exceeds the timeout. Defaults to 30 seconds. Optional.
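As a sketch of the annotation form, the classic file-system freeze example from the Velero documentation blocks writes while the volume is backed up and releases them afterward. The namespace, label, container name, and mount path here are illustrative:

```shell
kubectl annotate pod -n nginx-example -l app=nginx \
  pre.hook.backup.velero.io/container=fsfreeze \
  pre.hook.backup.velero.io/command='["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]' \
  post.hook.backup.velero.io/container=fsfreeze \
  post.hook.backup.velero.io/command='["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]'
```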
Introduction to restore hooks
Velero supports restore hooks: custom actions executed either before a restored pod's application containers start or after the restore. They come in two forms:
- InitContainer Restore Hooks: add an init container to the restored pod to perform any necessary setup before the pod's application containers start.
init.hook.restore.velero.io/container-image: the container image of the init container to add
init.hook.restore.velero.io/container-name: the name of the init container to add
init.hook.restore.velero.io/command: the task or command to run in the init container
For example, before taking the backup, annotate the pod with:
kubectl annotate pod -n <POD_NAMESPACE> <POD_NAME> \
init.hook.restore.velero.io/container-name=restore-hook \
init.hook.restore.velero.io/container-image=alpine:latest \
init.hook.restore.velero.io/command='["/bin/ash", "-c", "date"]'
- Exec Restore Hooks: execute custom commands or scripts in a container of the restored Kubernetes pod.
post.hook.restore.velero.io/container: the name of the container in which to run the hook; defaults to the first container. Optional.
post.hook.restore.velero.io/command: the command to run in the container. Required.
post.hook.restore.velero.io/on-error: how to handle execution failure; valid values are Fail and Continue, default Continue. With Continue, failures are only logged; with Fail, no further hooks run and the restore status becomes PartiallyFailed. Optional.
post.hook.restore.velero.io/exec-timeout: how long to wait once execution has started; defaults to 30 seconds. Optional.
post.hook.restore.velero.io/wait-timeout: how long to wait for the container to become ready; it should be long enough for the container to start. Optional.
Before taking the backup, annotate the pod with:
kubectl annotate pod -n <POD_NAMESPACE> <POD_NAME> \
  post.hook.restore.velero.io/container=postgres \
  post.hook.restore.velero.io/command='["/bin/bash", "-c", "psql < /backup/backup.sql"]' \
  post.hook.restore.velero.io/wait-timeout=5m \
  post.hook.restore.velero.io/exec-timeout=45s \
  post.hook.restore.velero.io/on-error=Continue
Analysis of some key problems of Velero
Can Velero restore resources into a namespace different from the one they were backed up from?
Yes, it can be specified with the --namespace-mappings parameter:
velero restore create RESTORE_NAME \
--from-backup BACKUP_NAME \
--namespace-mappings old-ns-1:new-ns-1,old-ns-2:new-ns-2
After a restore, what happens to existing Services of type NodePort?
Velero provides a flag that lets the user decide whether to keep the original nodePorts. The velero restore create subcommand's --preserve-nodeports flag preserves Service nodePorts from the backup; it can be used as --preserve-nodeports or --preserve-nodeports=true.
When this flag is given, Velero does not strip nodePorts when restoring Services and instead tries to use the nodePorts recorded at backup time.
How does Velero implement a consistent backup strategy without affecting the business, and upload the backup data to object storage?
To achieve database consistency with Velero, use Velero's hooks to quiesce the database before the backup and unquiesce it afterward. For the backup itself, you can either copy the data with restic (which does not use snapshots) or use volume snapshots.
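A minimal sketch of that pattern as a Backup spec, assuming a hypothetical PostgreSQL (pre-15, where exclusive pg_start_backup is available) pod labeled `app: postgres` in a `db` namespace; the exact quiesce commands depend on the database:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: postgres-consistent-backup
  namespace: velero
spec:
  includedNamespaces:
    - db
  defaultVolumesToRestic: true   # copy PV data with restic
  hooks:
    resources:
      - name: quiesce-postgres
        includedNamespaces:
          - db
        labelSelector:
          matchLabels:
            app: postgres
        pre:                     # quiesce: put the database in backup mode
          - exec:
              container: postgres
              command:
                - /bin/bash
                - -c
                - psql -U postgres -c "SELECT pg_start_backup('velero', true);"
              onError: Fail
              timeout: 60s
        post:                    # unquiesce: end backup mode
          - exec:
              container: postgres
              command:
                - /bin/bash
                - -c
                - psql -U postgres -c "SELECT pg_stop_backup();"
```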