
Rancher UI version: v2.2.5
Backing up and restoring a cluster through the Rancher UI works fine.

What I want is to restore one cluster's backup into a brand-new cluster, i.e. disaster recovery.
How can this be achieved? If the Rancher UI cannot do it, is there another disaster-recovery approach, e.g. RKE or kubectl?

RKE's official docs do cover etcd backup and restore, but this cluster was created through the Rancher UI, so there is no cluster.yml, and the rke command cannot be used as-is.
I created a cluster.yml containing only the nodes configuration; running the restore through rke then fails with:

CA Certificate or Key is empty

Here are the steps I took:

  1. Create a new dev cluster in the Rancher UI and add one node with the etcd, controlplane, and worker roles.
  2. Deploy an nginx workload through the Rancher UI.
  3. Take a backup of the cluster through the Rancher UI and grab the zip snapshot from /opt/rke/etcd-snapshots.
  4. Delete the dev cluster and clean up its node.
  5. Create the dev cluster again through the Rancher UI.
  6. Re-add the node to the dev cluster.
  7. Copy the backup zip into /opt/rke/etcd-snapshots, unzip it, copy the file from the extracted backup directory into /opt/rke/etcd-snapshots, and rename it backup.
  8. Create a cluster.yml for the rke command with the following content:

    nodes:
    - address: 192.168.1.12
      user: root
      role: [controlplane,worker,etcd]
      port: 722
  9. Run the restore:
    rke etcd snapshot-restore --name backup
    which fails with:
    CA Certificate or Key is empty
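As I understand the RKE docs, a restore is driven by the cluster.yml that built the cluster together with the cluster.rkestate file generated next to it, which holds the cluster's certificate data; a nodes-only cluster.yml carries no certificate material, which would be consistent with the error above. For reference, a slightly fuller sketch of such a file, using only the values from step 8 (the ssh_key_path value is an assumption):

```yaml
# Hypothetical cluster.yml sketch: address, user, port, and roles are
# taken from step 8. ssh_key_path is an assumption -- point it at
# whatever key was used when the node was added.
nodes:
- address: 192.168.1.12
  user: root
  port: "722"
  role: [controlplane, worker, etcd]
  ssh_key_path: ~/.ssh/id_rsa
```

Even with this file, the restore would presumably still need the original cluster.rkestate (or the certificates recovered by some other means).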

Execution log:

INFO[0000] Restoring etcd snapshot backup               
INFO[0000] Successfully Deployed state file at [./cluster.rkestate] 
INFO[0000] [dialer] Setup tunnel for host [172.31.177.174] 
INFO[0007] [etcd] starting backup server on host [172.31.177.174] 
INFO[0007] [etcd] Successfully started [etcd-Serve-backup] container on host [172.31.177.174] 
INFO[0013] [remove/etcd-Serve-backup] Successfully removed container on host [172.31.177.174] 
INFO[0013] [etcd] Checking if all snapshots are identical 
INFO[0014] [etcd] Successfully started [etcd-checksum-checker] container on host [172.31.177.174] 
INFO[0014] Waiting for [etcd-checksum-checker] container to exit on host [172.31.177.174] 
INFO[0014] Container [etcd-checksum-checker] is still running on host [172.31.177.174] 
INFO[0015] Waiting for [etcd-checksum-checker] container to exit on host [172.31.177.174] 
INFO[0015] [etcd] Checksum of etcd snapshot on host [172.31.177.174] is [f57212dc433cda1ba45f897cf322b144] 
INFO[0015] Cleaning old kubernetes cluster              
INFO[0015] [worker] Tearing down Worker Plane..         
INFO[0015] [remove/kubelet] Successfully removed container on host [172.31.177.174] 
INFO[0015] [remove/kube-proxy] Successfully removed container on host [172.31.177.174] 
INFO[0015] [remove/service-sidekick] Successfully removed container on host [172.31.177.174] 
INFO[0015] [worker] Successfully tore down Worker Plane.. 
INFO[0015] [controlplane] Tearing down the Controller Plane.. 
INFO[0016] [remove/kube-apiserver] Successfully removed container on host [172.31.177.174] 
INFO[0016] [remove/kube-controller-manager] Successfully removed container on host [172.31.177.174] 
INFO[0016] [remove/kube-scheduler] Successfully removed container on host [172.31.177.174] 
INFO[0016] [controlplane] Successfully tore down Controller Plane.. 
INFO[0016] [etcd] Tearing down etcd plane..             
INFO[0016] [remove/etcd] Successfully removed container on host [172.31.177.174] 
INFO[0016] [etcd] Successfully tore down etcd plane..   
INFO[0016] [hosts] Cleaning up host [172.31.177.174]    
INFO[0016] [hosts] Cleaning up host [172.31.177.174]    
INFO[0016] [hosts] Running cleaner container on host [172.31.177.174] 
INFO[0017] [kube-cleaner] Successfully started [kube-cleaner] container on host [172.31.177.174] 
INFO[0017] Waiting for [kube-cleaner] container to exit on host [172.31.177.174] 
INFO[0017] Container [kube-cleaner] is still running on host [172.31.177.174] 
INFO[0018] Waiting for [kube-cleaner] container to exit on host [172.31.177.174] 
INFO[0018] [hosts] Removing cleaner container on host [172.31.177.174] 
INFO[0018] [hosts] Removing dead container logs on host [172.31.177.174] 
INFO[0019] [cleanup] Successfully started [rke-log-cleaner] container on host [172.31.177.174] 
INFO[0019] [remove/rke-log-cleaner] Successfully removed container on host [172.31.177.174] 
INFO[0019] [hosts] Successfully cleaned up host [172.31.177.174] 
INFO[0019] [etcd] Restoring [backup] snapshot on etcd host [172.31.177.174] 
INFO[0020] [etcd] Successfully started [etcd-restore] container on host [172.31.177.174] 
INFO[0020] Waiting for [etcd-restore] container to exit on host [172.31.177.174] 
INFO[0020] Container [etcd-restore] is still running on host [172.31.177.174] 
INFO[0021] Waiting for [etcd-restore] container to exit on host [172.31.177.174] 
INFO[0021] Initiating Kubernetes cluster                
INFO[0021] [certificates] Generating Kubernetes API server aggregation layer requestheader client CA certificates 
INFO[0021] [certificates] Generating admin certificates and kubeconfig 
FATA[0021] CA Certificate or Key is empty
Asked by 尘丶 on 2019-07-17
