About the author: Wang Hailong is the Technical Manager of the SUSE Rancher China Community, responsible for maintaining and operating the Rancher China technical community. He has 8 years of experience in cloud computing and has lived through the technological shift from OpenStack to Kubernetes. He has extensive operations and hands-on experience across the stack, from the underlying Linux operating system to KVM virtualization and Docker container technology.
Notice:
- This guide applies to Rancher v2.5 and earlier, not to v2.6.
- Make a backup before you start.
Foreword
Each downstream user cluster managed by Rancher has a cluster agent, which establishes a tunnel and connects to the corresponding cluster controller in the Rancher server through this tunnel.
The cluster agent, also known as cattle-cluster-agent, is a component that runs in the downstream user cluster. One of its important functions is to communicate with the Rancher server, exchanging and reporting events, statistics, node information, and health status.
When the IP of the Rancher server changes, the cattle-cluster-agent can no longer connect to the Rancher server through the tunnel, and logs like the following appear in the cattle-cluster-agent container of the downstream cluster:
time="2022-04-06T03:42:22Z" level=info msg="Connecting to wss://35.183.183.66/v3/connect with token jhh9rx4zmgkrw2mz8mkvsmlnnx6q5jllnqb8jnr2vdxcgglglqbdjz"
time="2022-04-06T03:42:22Z" level=info msg="Connecting to proxy" url="wss://35.183.183.66/v3/connect"
time="2022-04-06T03:42:32Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 35.183.183.66:443: i/o timeout"
time="2022-04-06T03:42:32Z" level=error msg="Remotedialer proxy error" error="dial tcp 35.183.183.66:443: i/o timeout"
Here, 35.183.183.66 is the original Rancher server IP.
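To confirm which address the agent is still dialing, the endpoint can be pulled out of the log lines. The following is a minimal sketch: the helper name agent_endpoints is hypothetical, and the label selector app=cattle-cluster-agent in the usage comment is an assumption about how the agent pods are labelled in your cluster.

```shell
# Hypothetical helper: print each distinct wss:// endpoint found in
# agent log lines read from stdin.
agent_endpoints() {
  grep -o 'wss://[^/" ]*' | sort -u
}

# Usage against a live cluster (assumes the label app=cattle-cluster-agent):
#   kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50 | agent_endpoints
```

If the only endpoint printed is the old IP, the agent has not yet picked up the new server address.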
The Rancher UI also shows the cluster status as Unavailable.
After the host IP of the Rancher server changes, the Rancher agent keeps trying to connect to the original IP and fails, so we need to update the Rancher server address that the agent connects to.
Rebuild Rancher agent
Make the Rancher agent connect to the new Rancher server IP
Update server-url
Because the IP address of the Rancher server node has changed, the server-url of the Rancher server must be updated to the correct host IP. The server-url option is on the Settings page of the Rancher UI.
Get the kubeconfig of the downstream cluster
Recreating the Rancher agent requires connecting to the downstream cluster through kubectl, so before operating, first obtain the kubeconfig file of the downstream cluster.
You can choose one of the following three methods:
- Use a kubeconfig for the downstream cluster that was previously downloaded from the Rancher UI. Because Rancher has lost contact with the downstream cluster, the Rancher API can no longer be used to reach it. However, you can switch the kubectl context to connect directly to the downstream cluster's kube-apiserver and continue to operate the cluster. See: Authenticate directly with the downstream cluster ( https://rancher.com/docs/rancher/v2.6/en/cluster-admin/cluster-access/kubectl/ )
- Retrieve it from the secret stored in the Rancher server container. See: https://gist.github.com/superseb/f6cd637a7ad556124132ca39961789a4
- Generate kubeconfig on nodes with control plane role:
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
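Whichever method is used, it helps to confirm that the resulting kubeconfig actually reaches the downstream cluster before going further. The following is a minimal sketch: the helper name check_downstream_access is hypothetical, and the default file name simply matches the output file of the command above.

```shell
# Hypothetical helper: confirm that a kubeconfig file exists and that
# kubectl can reach the downstream cluster with it.
check_downstream_access() {
  kubeconfig="${1:-kubeconfig_admin.yaml}"
  if [ ! -s "$kubeconfig" ]; then
    echo "kubeconfig missing or empty: $kubeconfig" >&2
    return 1
  fi
  # Listing nodes exercises both authentication and API connectivity.
  kubectl --kubeconfig "$kubeconfig" get nodes
}

# Usage:
#   check_downstream_access kubeconfig_admin.yaml && export KUBECONFIG="$PWD/kubeconfig_admin.yaml"
```

Exporting KUBECONFIG afterwards lets the later kubectl commands run without an explicit --kubeconfig flag.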
Regenerate Rancher agent definitions
Generate an API token in the Rancher UI (User -> API & Keys) and save the Bearer Token.
The token in this example is: token-rfv84:86v2wxpzh8mtgvzxpsnwnvrx5nlc424tf8tvrnpzckdxdpt2vfltqq
Find the cluster ID (in the format c-xxxxx) in the Rancher UI. If you are not sure where to find it, go to the home page and click the name of the cluster; the browser's address bar then shows an ID of the form c-xxxxx.
The cluster ID in this example is: c-s8t7s
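If you copy the URL from the address bar, the cluster ID can also be extracted with a small helper rather than by eye. This is only a sketch: the helper name cluster_id_from_url is hypothetical, the URL path in the usage comment is an illustration, and the pattern assumes the ID is c- followed by five lowercase letters or digits, as in the c-xxxxx format described above.

```shell
# Hypothetical helper: extract the first path segment that looks like a
# cluster ID (c- followed by five lowercase letters or digits) from a URL.
cluster_id_from_url() {
  printf '%s\n' "$1" | grep -oE '/c-[a-z0-9]{5}(/|$)' | head -n 1 | tr -d '/'
}

# Usage (the URL path here is only an illustration):
#   CLUSTERID=$(cluster_id_from_url "https://35.183.24.89/c/c-s8t7s/monitoring")
```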
Generate agent definition (requires curl, jq)
# Rancher URL
RANCHERURL="https://35.183.24.89"
# Cluster ID
CLUSTERID="c-s8t7s"
# Token
TOKEN="token-rfv84:86v2wxpzh8mtgvzxpsnwnvrx5nlc424tf8tvrnpzckdxdpt2vfltqq"
# Valid certificates
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .command'
# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
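The commands above depend on curl and jq, and the later steps depend on kubectl, so a quick pre-flight check can save a confusing failure halfway through. The helper below is a hypothetical sketch, not part of Rancher.

```shell
# Hypothetical helper: fail early if any required command is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_tools curl jq kubectl || exit 1
```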
When the curl command succeeds, it prints a command that applies the agent definition, for example:
root@ip-172-31-6-210:~# curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
curl --insecure -sfL https://35.183.24.89/v3/import/98bvp7cpc7m7xqccxqwsghbnb6pvm9b2lcz7jz4xlfdlsc9lh5tmv8_c-s8t7s.yaml | kubectl apply -f -
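Before applying the generated manifest, it can be worth checking that it points at the new Rancher address rather than the old one. The sketch below assumes that the manifest sets the agent's server address through an environment variable named CATTLE_SERVER, written as a name: line followed directly by a value: line. The helper name manifest_server_url and the IMPORT_URL variable are hypothetical.

```shell
# Hypothetical helper: read an agent manifest on stdin and print the value
# of the CATTLE_SERVER environment variable.
# Assumes a "name: CATTLE_SERVER" line immediately followed by a "value:" line.
manifest_server_url() {
  awk '/name: CATTLE_SERVER$/ {
    getline                                     # move to the value: line
    gsub(/^[[:space:]]*value:[[:space:]]*/, "") # drop the key and indentation
    gsub(/"/, "")                               # drop surrounding quotes
    print
    exit
  }'
}

# Usage (IMPORT_URL holds the .yaml URL from the generated command):
#   curl --insecure -sfL "$IMPORT_URL" | manifest_server_url
```

If the printed address is still the old IP, the server-url setting has not been updated yet.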
Apply the definition
On a host that has kubectl and the downstream cluster's kubeconfig, run the command generated in the previous step to reconfigure the Rancher agent:
root@ip-172-31-6-210:~# curl --insecure -sfL https://35.183.24.89/v3/import/98bvp7cpc7m7xqccxqwsghbnb6pvm9b2lcz7jz4xlfdlsc9lh5tmv8_c-s8t7s.yaml | kubectl apply -f -
clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
serviceaccount/cattle unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
secret/cattle-credentials-6f51cbe created
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
deployment.apps/cattle-cluster-agent configured
daemonset.apps/cattle-node-agent configured
Verify
After a few moments, the cattle-cluster-agent and cattle-node-agent pods are recreated and running again:
root@ip-172-31-6-210:~# kubectl -n cattle-system get pods
NAME READY STATUS RESTARTS AGE
cattle-cluster-agent-77f864c76f-qrjs2 1/1 Running 0 38s
cattle-node-agent-znrv5 1/1 Running 0 4s
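Instead of polling get pods by hand, the rollout can be waited on explicitly. This is a minimal sketch that assumes the resource names shown in the apply output above (deployment cattle-cluster-agent and daemonset cattle-node-agent); the helper name wait_for_agents and the default timeout are hypothetical.

```shell
# Hypothetical helper: block until both agents finish rolling out,
# or fail when the timeout (default 120s) is reached.
wait_for_agents() {
  timeout="${1:-120s}"
  kubectl -n cattle-system rollout status deployment/cattle-cluster-agent --timeout="$timeout" &&
    kubectl -n cattle-system rollout status daemonset/cattle-node-agent --timeout="$timeout"
}

# Usage:
#   wait_for_agents 180s || kubectl -n cattle-system get pods
```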
The downstream cluster status becomes Active again.
Postscript
Changing the IP address of the Rancher server is strongly discouraged, and even modifying the server-url can introduce hidden problems.
Even when the Rancher server is installed on a single node, it is recommended to register downstream clusters through a domain name. This makes a later migration from a single node to a high-availability setup possible, and if the Rancher server node's IP changes, only the name-to-IP mapping for that domain needs to be updated.
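As a small guard against the situation this article describes, a script could warn when the configured Rancher URL uses a bare IPv4 address instead of a domain name. The helper below is a hypothetical sketch and checks only the form of the URL, not whether the name resolves.

```shell
# Hypothetical helper: succeed if the host part of a URL is an IPv4 literal.
server_url_uses_ip() {
  # Strip the scheme, then everything from the first ':' or '/' onward.
  host=$(printf '%s\n' "$1" | sed -E 's#^[a-zA-Z]+://##; s#[:/].*$##')
  printf '%s\n' "$host" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Usage:
#   if server_url_uses_ip "$RANCHERURL"; then
#     echo "warning: server-url uses an IP address; consider a domain name" >&2
#   fi
```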