Author: Zheng Zengquan
Database engineer in Aikesheng's South District, member of the Aikesheng DBA team, responsible for database-related technical support. Hobbies: billiards, badminton, coffee, and movies.
Source of this article: original submission
* Produced by the Aikesheng open source community. Original content may not be used without authorization; for reprints, please contact the editor and cite the source.
1. Summary of this article and main terms
1.1 Overview
This article is organized around three modules: Pod, Service, and Ingress. For failures that commonly occur in day-to-day Kubernetes operation, it provides concrete troubleshooting steps and attaches related solutions or references.
1.2 Main terms
- Pod: the smallest deployable computing unit created and managed in Kubernetes; a group of one or more containers that share storage and network, together with a specification of how to run those containers (a minimal manifest sketch follows this list).
- Port-forward: maps a local port to a specified port of an application in the cluster through port forwarding.
- Service: a Kubernetes Service is an abstraction that defines a logical set of Pods and a policy for accessing them (sometimes called a microservice).
- Ingress: routes HTTP and HTTPS traffic from outside the cluster to Services inside it; traffic routing is controlled by rules defined on the Ingress resource.
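For orientation, here is a minimal Pod manifest; the name, labels, and image are illustrative examples, not taken from a specific deployment in this article. Apply it with kubectl apply -f pod.yaml.
# pod.yaml -- minimal illustrative Pod: one container sharing the pod's network namespace
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v2   # example image; replace with your own
    ports:
    - containerPort: 80           # port the container listens on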
2. Fault diagnosis process
2.1 Pods module check
- If a step below succeeds, proceed to the next step; if it fails, jump to the section indicated.
2.1.1 Check if any pod is in PENDING state
- kubectl get pods: if any pod is in the PENDING state, continue below; otherwise go to 2.1.5.
[root@10-186-65-37 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
myapp-deploy-55b54d55b8-5msx8 0/1 Pending 0 5m
- kubectl describe pod <pod-name>: outputs detailed information about the specified resource(s). From it, judge whether cluster resources are insufficient; if so, expand the cluster, otherwise go to 2.1.2.
2.1.2 Check whether the ResourceQuota limit is triggered
- kubectl describe resourcequota -n <namespace>:
[root@10-186-65-37 ~]# kubectl describe quota compute-resources --namespace=myspace
Name: compute-resources
Namespace: myspace
Resource Used Hard
-------- ---- ----
limits.cpu 0 2
limits.memory 0 2Gi
pods 0 4
requests.cpu 0 1
requests.memory 0 1Gi
- If a limit has been triggered, release or enlarge the corresponding quota (a sketch follows this list); refer to:
https://kubernetes.io/zh/docs/concepts/configuration/manage-resources-containers/#extended-resources
- Otherwise go to 2.1.3
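If a Used value has reached its Hard limit, the quota is exhausted. As a hedged sketch, the compute-resources quota from the output above could be enlarged by raising its hard limits (the new values below are examples) and re-applying the manifest with kubectl apply -f:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: myspace
spec:
  hard:
    pods: "8"                # raised from 4; example value
    requests.cpu: "2"        # raised from 1
    requests.memory: 2Gi     # raised from 1Gi
    limits.cpu: "4"          # raised from 2
    limits.memory: 4Gi       # raised from 2Gi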
2.1.3 Check whether any PVC is in PENDING state
- A PersistentVolume (PV) is a piece of storage in the cluster, provisioned by the administrator in advance or dynamically through a StorageClass; a PersistentVolumeClaim (PVC) expresses a user's request for storage.
kubectl describe pvc <pvc-name>:
If STATUS is Pending:
[root@10-186-65-37 k8s-file]# kubectl get pvc
NAME               STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
local-device-pvc   Pending                                      local-device   72s
Then refer to the following link to solve it (a minimal PV/PVC sketch follows this list):
https://kubernetes.io/zh/docs/concepts/storage/persistent-volumes/
- Otherwise go to 2.1.4
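For reference, a minimal sketch of a statically provisioned PV with a matching PVC, assuming a hostPath test volume; the capacity, path, and storageClassName are illustrative. The PVC stays Pending until a PV satisfies its accessModes, storageClassName, and requested capacity:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-device-pv
spec:
  capacity:
    storage: 5Gi                  # must cover the PVC's request
  accessModes:
    - ReadWriteOnce
  storageClassName: local-device  # must match the PVC's storageClassName
  hostPath:
    path: /data/local-device      # example path for a test cluster
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-device-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-device
  resources:
    requests:
      storage: 5Gi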
2.1.4 Check whether the pod is assigned to the node
- kubectl get pods -o wide:
Check the NODE column; in the following output, every pod has been assigned to a node:
[root@10-186-65-37 ~]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myapp-deploy-55b54d55b8-5msx8 1/1 Running 0 14d 10.244.4.9 10-186-65-122 <none> <none>
myapp-deploy-55b54d55b8-7ldj4 1/1 Running 0 14d 10.244.2.10 10-186-65-126 <none> <none>
myapp-deploy-55b54d55b8-cwdwt 1/1 Running 0 14d 10.244.3.9 10-186-65-126 <none> <none>
myapp-deploy-55b54d55b8-gvmb9 1/1 Running 0 14d 10.244.4.10 10-186-65-122 <none> <none>
myapp-deploy-55b54d55b8-xbqb6 1/1 Running 0 14d 10.244.5.9 10-186-65-118 <none> <none>
- If the pod has not been assigned to a node, it is a problem with the Scheduler; please refer to the following link to solve it:
https://kubernetes.io/zh/docs/concepts/scheduling-eviction/kube-scheduler/
- Otherwise (the pod is assigned, as in the output above), it is a problem with the kubelet.
2.1.5 Check whether pods are in the RUNNING state
- kubectl get pods -o wide:
If the pods are in the RUNNING state, go to 2.1.10; otherwise go to 2.1.6.
2.1.6 Check pod log
- kubectl logs <pod-name>:
If the log can be obtained correctly, fix the related problems according to the log.
[root@10-186-65-37 ~]# kubectl logs myapp-deploy-55b54d55b8-5msx8
127.0.0.1 - - [30/Sep/2021:06:53:16 +0000] "GET / HTTP/1.1" 200 65 "-" "curl/7.29.0" "-"
127.0.0.1 - - [30/Sep/2021:07:49:44 +0000] "GET / HTTP/1.1" 200 65 "-" "curl/7.29.0" "-"
127.0.0.1 - - [30/Sep/2021:07:51:09 +0000] "GET / HTTP/1.1" 200 65 "-" "curl/7.29.0" "-"
127.0.0.1 - - [30/Sep/2021:07:57:00 +0000] "GET / HTTP/1.1" 200 65 "-" "curl/7.29.0" "-"
127.0.0.1 - - [30/Sep/2021:08:03:56 +0000] "GET / HTTP/1.1" 200 65 "-" "curl/7.29.0" "-"
- If the log cannot be obtained, determine whether the container exits too quickly; if it does, fetch the previous container's log with:
kubectl logs <pod-name> --previous
- If the log still cannot be obtained and the container is not exiting quickly, go to 2.1.7.
2.1.7 Whether the Pod status is ImagePullBackOff
- kubectl describe pod <pod-name>:
Check whether the status is ImagePullBackOff; if it is not, go to 2.1.8.
- Check whether the image name is correct, and correct any error.
- Check whether the image tag exists and has been verified.
- Is the image pulled from a private registry? If so, confirm that the pull credentials are configured correctly (a sketch follows this list).
- If the image is not pulled from a private registry, the problem may lie with the CRI (Container Runtime Interface) or the kubelet.
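For the private-registry case, a hedged sketch: create a docker-registry Secret and reference it from the pod's imagePullSecrets. The registry address, secret name, and credentials below are placeholders:
kubectl create secret docker-registry my-registry-key \
  --docker-server=registry.example.com \
  --docker-username=<user> --docker-password=<password>
Then reference the Secret in the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
  - name: my-registry-key                  # Secret created above
  containers:
  - name: myapp
    image: registry.example.com/myapp:v2   # placeholder private image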
2.1.8 Whether the Pod status is CrashLoopBackOff
- kubectl describe pod <pod-name>:
Check whether the status is CrashLoopBackOff; if it is not, go to 2.1.9.
- If so, check the log and fix the application crash.
- Is the CMD instruction missing from the Dockerfile? Check with:
docker history <image-id> (add --no-trunc to display the complete output)
[root@10-186-65-37 ~]# docker history fb4cca6b4e4c
IMAGE CREATED CREATED BY SIZE COMMENT
fb4cca6b4e4c 22 months ago /bin/sh -c #(nop) COPY file:957630e64c05c549… 121MB
<missing> 2 years ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 2 years ago /bin/sh -c #(nop) ADD file:1d711f09b1bbc7c8d… 42.3MB
- Does the pod restart frequently, switching between Running and CrashLoopBackOff? If so, you need to fix the liveness probe; see the sketch after this list and refer to the following link:
https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
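As a sketch of the fix, a liveness probe that gives the application time to start and tolerates transient failures might look like the following; the path, port, and timings are assumptions to adapt to your application:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-liveness
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v2   # example image
    livenessProbe:
      httpGet:
        path: /                   # health-check endpoint
        port: 80
      initialDelaySeconds: 15     # let the app finish starting before the first probe
      periodSeconds: 10
      failureThreshold: 3         # restart only after 3 consecutive failures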
2.1.9 Whether the Pod state is in RunContainerError
- kubectl describe pod <pod-name>:
Check whether the status is RunContainerError.
- If so, the problem may be caused by mounting a volume; see the sketch after this list and refer to the following link:
https://kubernetes.io/zh/docs/concepts/storage/volumes/
- Otherwise, please ask for help on sites such as StackOverflow.
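A common volume mistake is a volumeMounts entry that references a name not declared under volumes, or a claim that does not exist. A minimal correct pairing for illustration (names and mount path are examples):
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v2          # example image
    volumeMounts:
    - name: data                         # must match a volume name below
      mountPath: /usr/share/nginx/html   # example mount path
  volumes:
  - name: data
    emptyDir: {}                         # simplest volume type for a sanity check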
2.1.10 Check if pods are in READY state
If pods are in the READY state, continue with the port mapping below.
[root@10-186-65-37 ~]# kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
myapp-deploy-55b54d55b8-5msx8   1/1     Running   0          14d
myapp-deploy-55b54d55b8-7ldj4   1/1     Running   0          14d
If there are no pods in the READY state, go to 2.1.11.
- kubectl port-forward <pod-name> 8080:<pod-port>
- If the mapping succeeds, go to 2.2. Example:
a) Mapping
[root@10-186-65-37 ~]# kubectl port-forward myapp-deploy-55b54d55b8-5msx8 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
b) Verify that the mapping is successful
[root@10-186-65-37 ~]# curl localhost:8080
Hello MyApp | Version: v2 | <a href="hostname.html">Pod Name</a>
- If it fails, confirm that the application can listen on all addresses; the corresponding port-forward command is:
kubectl port-forward --address 0.0.0.0 <pod-name> 8080:<pod-port>
If the application cannot listen on all addresses, this is an Unknown state.
2.1.11 Check Readiness (ready detector)
- kubectl describe pod <pod-name>
- If the output is normal, fix the corresponding problem according to the events and logs; a readiness probe sketch follows, and refer to the following link:
https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
- If it fails, this is an Unknown state.
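For reference, a readiness probe sketch; the endpoint, port, and timings are assumptions. Unlike the liveness probe in 2.1.8, a failing readiness probe does not restart the container; it only keeps the pod out of Service endpoints:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-readiness
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v2   # example image
    readinessProbe:
      httpGet:
        path: /                   # endpoint that returns 200 only when ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5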
2.2 Service module check
2.2.1 Service current status check
- kubectl describe service <service-name>
The successful output is as follows:
[root@10-186-65-37 ~]# kubectl describe service myapp
Name: myapp
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"myapp","namespace":"default"},"spec":{"ports":[{"name":"http","po...
Selector: app=myapp,release=canary
Type: ClusterIP
IP: 10.96.109.76
Port: http 80/TCP
TargetPort: 80/TCP
Endpoints: 10.244.2.10:80,10.244.3.9:80,10.244.4.10:80 + 2 more...
Session Affinity: None
Events: <none>
- Can you see the Endpoints column with normal output? If the output is abnormal, go to 2.2.2. If it is normal, test the mapping:
kubectl port-forward service/<service-name> 8080:<service-port>
The successful output is as follows:
[root@10-186-65-37 ~]# kubectl port-forward service/myapp 8080:80
Forwarding from 127.0.0.1:8080 -> 80
Forwarding from [::1]:8080 -> 80
- If this succeeds, go to 2.3; if it fails, go to 2.2.4.
2.2.2 Selector and Pod label comparison
- View the label information of the pod
kubectl describe pod <pod-name>
[root@10-186-65-37 ~]# kubectl describe pod myapp-deploy-55b54d55b8-5msx8 | grep -i label -A 2
Labels: app=myapp
pod-template-hash=55b54d55b8
release=canary
- View the selector information of the service:
kubectl describe service <service-name>
[root@10-186-65-37 ~]# kubectl describe service myapp | grep -i selector
Selector:          app=myapp,release=canary
- Check whether the two match; if they do not, correct the mismatch, and if they already match, go to 2.2.3. A quick check is shown below.
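A quick way to verify the match is to list pods using the Service's selector directly; an empty result means the selector matches no pod labels (the label values here come from the example above):
kubectl get pods -l app=myapp,release=canary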
2.2.3 Check whether the Pod has been assigned an IP
- View the pod's IP information:
kubectl describe pod <pod-name>
[root@10-186-65-37 ~]# kubectl describe pod myapp-deploy-55b54d55b8-5msx8 | grep -i 'ip'
IP:            10.244.4.9
IPs:
  IP:          10.244.4.9
- If the IP has been allocated correctly, the problem is caused by the kubelet.
- If the IP is not assigned, the problem is caused by the Controller Manager.
2.2.4 Check Service TargetPort and Pod ContainerPort
- View the TargetPort information of the service:
kubectl describe service <service-name>
[root@10-186-65-37 ~]# kubectl describe service myapp | grep -i targetport
TargetPort: 80/TCP
View the ContainerPort information of the pod:
kubectl describe pod <pod-name>
[root@10-186-65-37 ~]# kubectl describe pod myapp-deploy-55b54d55b8-5msx8 | grep -i port
Port:          80/TCP
Host Port:     0/TCP
- If the two are consistent, the problem is caused by kube-proxy; if they are inconsistent, correct the port information. The constraint is sketched below.
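In manifest form, the constraint looks like the following sketch, reusing port 80 from the outputs above (names and image are illustrative): the Service's targetPort must equal the containerPort declared by the pod.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    release: canary
  ports:
  - name: http
    port: 80            # port the Service exposes
    targetPort: 80      # must match containerPort below
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
    release: canary
spec:
  containers:
  - name: myapp
    image: ikubernetes/myapp:v2   # example image
    ports:
    - containerPort: 80           # must match targetPort above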
2.3 Ingress module check
2.3.1 Ingress current status check
- kubectl describe ingress <ingress-name>
The successful output is as follows:
[root@10-186-65-37 ~]# kubectl describe ingress ingress-tomcat-tls
Name:             ingress-tomcat-tls
Namespace:        default
Address:
Default backend:  default-http-backend:80 (<none>)
TLS:
  tomcat-ingress-secret terminates tomcat.quan.com
Rules:
  Host             Path  Backends
  ----             ----  --------
  tomcat.quan.com        tomcat:8080 (10.244.2.11:8080,10.244.4.11:8080,10.244.5.10:8080)
Annotations:      kubectl.kubernetes.io/last-applied-configuration:
                    {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"kubernets.io/ingress.class":"nginx"},"name":"ingress-tomcat-tls","namespace":"default"},"spec":{"rules":[{"host":"tomcat.quan.com","http":{"paths":[{"backend":{"serviceName":"tomcat","servicePort":8080},"path":null}]}}],"tls":[{"hosts":["tomcat.quan.com"],"secretName":"tomcat-ingress-secret"}]}}
                  kubernets.io/ingress.class: nginx
Events:           <none>
- Can you see the Backends column with normal output? If so, go to 2.3.4; otherwise go to 2.3.2.
2.3.2 Check ServiceName and ServicePort
- kubectl describe ingress <ingress-name>
- kubectl describe service <service-name>
[root@10-186-65-37 ~]# kubectl describe ingress ingress-tomcat-tls | grep -E 'serviceName|servicePort'
kubectl.kubernetes.io/last-applied-configuration: {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"kubernets.io/ingress.class":"nginx"},"name":"ingress-tomcat-tls","namespace":"default"},"spec":{"rules":[{"host":"tomcat.quan.com","http":{"paths":[{"backend":{"serviceName":"tomcat","servicePort":8080},"path":null}]}}],"tls":[{"hosts":["tomcat.quan.com"],"secretName":"tomcat-ingress-secret"}]}}
- Check whether the serviceName and servicePort in the two outputs above match and are written correctly; if they are correct, go to 2.3.3, otherwise correct the errors. A manifest sketch follows.
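For reference, a sketch of the Ingress manifest implied by the output above, using the extensions/v1beta1 schema shown there; on newer clusters the API is networking.k8s.io/v1 and the backend is written as service.name / service.port.number:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-tomcat-tls
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  tls:
  - hosts:
    - tomcat.quan.com
    secretName: tomcat-ingress-secret
  rules:
  - host: tomcat.quan.com
    http:
      paths:
      - backend:
          serviceName: tomcat   # must match the Service's name
          servicePort: 8080     # must match the Service's port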
2.3.3 Ingress controller document
- The problem is caused by the Ingress controller, please refer to the documentation for a solution:
https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/
2.3.4 Check port-forward ingress
- kubectl port-forward <ingress-pod-name> 8080:<ingress-port>
Test whether it can be accessed normally: curl localhost:8080
- If access is normal, go to 2.3.5; otherwise go to 2.3.3.
2.3.5 Check whether you can access through Ingress on the external network
- If it can be accessed successfully from the external network, the troubleshooting is over.
- If it cannot be accessed from the external network, the problem lies with the infrastructure or with how the cluster is exposed; troubleshoot accordingly.