
Background

A new node was added to a customer's Kubernetes cluster. After the application was deployed on the new node, it became intermittently unavailable: user requests returned 503 errors, there were no related Kubernetes events, and the host status appeared normal.

Troubleshooting

Initially I suspected a problem with the new node itself, but no relevant errors were found in /var/log/messages or dmesg. The following errors, however, appeared in the kubelet log.

For a Kubernetes cluster installed with RKE, you can run docker logs -f --tail=30 kubelet directly on the node to view the kubelet log:
E0602 03:18:27.766726    1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)
E0602 03:18:34.847254    1301 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: an error on the server ("") has prevented the request from succeeding (get csidrivers.storage.k8s.io)
I0602 03:18:39.176996    1301 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
E0602 03:18:43.771023    1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)

The failed to ensure node lease exists error drew the most attention; it literally means the kubelet could not renew the node's lease with the apiserver, yet kubectl get nodes kept reporting the node as Ready. Since the application was only intermittently unavailable, I suspected the node was becoming unavailable for short periods and recovering quickly, so it happened to be Ready whenever the command was run by hand. To verify this guess, I ran kubectl get nodes repeatedly in the background, and eventually captured the NotReady state.
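A minimal sketch of that background check, assuming the node name k8s-node-dev-6 taken from the kubelet log above; it polls the node status every few seconds and appends timestamped output so a brief NotReady window is not missed:

  # poll the node status every 5 seconds and record it with a timestamp
  while true; do
    echo "$(date '+%F %T') $(kubectl get node k8s-node-dev-6 --no-headers)" >> node-status.log
    sleep 5
  done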

I also captured the node details at the moment it was unavailable.
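One way to capture those details (a sketch, again assuming the node name k8s-node-dev-6) is to dump the node conditions with kubectl describe, which is where the kubelet stopped posting node status message mentioned below shows up:

  # show the node conditions and keep a full snapshot for later comparison
  kubectl describe node k8s-node-dev-6 | grep -A 10 'Conditions:'
  kubectl get node k8s-node-dev-6 -o yaml > node-notready.yaml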

Searching for kubelet stopped posting node status as a keyword led to a Stack Overflow question whose top-voted answer suggested setting the kube-apiserver parameter --http2-max-streams-per-connection. The cluster had recently had Prometheus deployed and several nodes added, so the number of requests to the apiserver had increased sharply, and the streams available per apiserver connection may have been insufficient; this parameter raises that limit. Since the cluster was installed with RKE, the RKE configuration file has to be changed and rke up re-executed. The changes are as follows:

  kube-api:
    service_node_port_range: "1-65535"
    extra_args:
      http2-max-streams-per-connection: 1000

Re-execute rke up to update the cluster. After the update completes, restart the kubelet, and the problem is resolved.
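To confirm the new flag actually reached the apiserver, one option (my own addition, not part of the original steps) is to inspect the kube-apiserver container that RKE runs on the control-plane node:

  # check that the rendered kube-apiserver arguments include the new flag
  docker inspect kube-apiserver | grep http2-max-streams-per-connection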

For an RKE installation, you can restart the kubelet by running docker restart kubelet on the node.
