Background
A new node was added to a customer's Kubernetes cluster. After the application was deployed on the new node, it became intermittently unavailable: user requests returned 503, there were no relevant event messages, and the host status appeared normal.
Troubleshooting
We initially suspected a problem with the new node itself, but found no relevant errors in /var/log/messages or dmesg. The kubelet log, however, contained the entries below.

When a Kubernetes cluster is installed with RKE, the kubelet runs as a container, so you can view its log directly on the node with `docker logs -f --tail=30 kubelet`.
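For reference, the host-level checks amounted to something like the following (the grep pattern is illustrative, not from the original write-up):

```bash
# Host-level checks on the new node (log path varies by distro:
# /var/log/messages on CentOS/RHEL, /var/log/syslog on Debian/Ubuntu)
grep -iE 'error|fail' /var/log/messages | tail -n 50
dmesg -T | tail -n 50
```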
```
E0602 03:18:27.766726 1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)
E0602 03:18:34.847254 1301 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: an error on the server ("") has prevented the request from succeeding (get csidrivers.storage.k8s.io)
I0602 03:18:39.176996 1301 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
E0602 03:18:43.771023 1301 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: an error on the server ("") has prevented the request from succeeding (get leases.coordination.k8s.io k8s-node-dev-6)
```
The message that stands out is `failed to ensure node lease exists`: literally, the kubelet was unable to renew its node lease, i.e., to report the node's heartbeat to the apiserver. Yet `kubectl get nodes` kept returning a Ready status. Since the application was only intermittently unavailable, I suspected the node was going unavailable for short periods and then recovering quickly, so it happened to look normal whenever the command was run by hand. To verify this guess, I kept `kubectl get nodes` running in the background, and eventually captured the NotReady state.
I also captured the node's details while it was unavailable.
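A minimal sketch of such a background watch (the node name is taken from the log above; the poll interval and log file are my own choices):

```bash
#!/usr/bin/env bash
# Poll the node status and record every moment it is not Ready,
# capturing the node details at that time.
NODE="k8s-node-dev-6"

while true; do
  STATUS=$(kubectl get node "$NODE" --no-headers | awk '{print $2}')
  if [ "$STATUS" != "Ready" ]; then
    echo "$(date '+%F %T') node $NODE is $STATUS" >> node-status.log
    kubectl describe node "$NODE" >> node-status.log
  fi
  sleep 2
done
```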
Searching with the keywords `kubelet stopped posting node status` turned up a Stack Overflow thread whose top-voted answer suggested setting the kube-apiserver parameter `--http2-max-streams-per-connection`. The cluster had recently had Prometheus deployed and several nodes added, so the number of requests to the apiserver had risen sharply, and the HTTP/2 streams available on its connections may have been exhausted; this parameter raises that limit. Since the cluster was installed with RKE, the fix is to change the RKE configuration file and re-run `rke up`. The changes are as follows:
```yaml
kube-api:
  service_node_port_range: "1-65535"
  extra_args:
    http2-max-streams-per-connection: 1000
```
Re-running `rke up` performs a cluster update. After the update completed, the kubelet was restarted and the problem was resolved.
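To confirm the new flag actually reached the apiserver, you can inspect its container on a control-plane node (a suggestion of mine; `kube-apiserver` is the container name RKE uses by default):

```bash
# Check that kube-apiserver was restarted with the new flag
docker inspect kube-apiserver | grep http2-max-streams-per-connection
```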
In an RKE installation, you can restart the kubelet on a node with `docker restart kubelet`.
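It is also worth watching for a recurrence afterwards; a simple way (my own suggestion, not part of the original fix) is to keep an eye on the node status and the kubelet log:

```bash
# Watch node status for any flapping back to NotReady
kubectl get nodes -w

# Confirm the lease errors have stopped appearing in the kubelet log
docker logs --since 10m kubelet 2>&1 | grep "failed to ensure node lease"
```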