Author: Zhang Yanying (Old Z), operations and maintenance architect at the Shandong Branch of Telecom System Integration Company, cloud-native enthusiast, currently focused on cloud-native operations.
1. Introduction to this article
This article originates from a question about Etcd monitoring raised by @Jam, a member of KubeSphere open source community group 8, who hoped I could take a look. I had not enabled Etcd monitoring myself, but since a fellow community member trusted me enough to ask, it had to be arranged. Hence this article.
After some research, I found that KubeSphere ships with an Etcd monitoring page in its built-in cluster status monitoring. However, in KubeSphere 3.2.1, after enabling Etcd monitoring with the default configuration, the Etcd monitoring page under cluster status shows no data at all. This article documents the troubleshooting journey to resolve that issue.
Knowledge points of this article
- Rating: entry level
- Prometheus-Operator
- KubeSphere enables Etcd monitoring
Demo server configuration
Host name | IP | CPU (cores) | RAM (GB) | System disk (GB) | Data disk (GB) | Purpose
---|---|---|---|---|---|---
zdeops-master | 192.168.9.9 | 2 | 4 | 40 | 200 | Ansible operation and maintenance control node |
ks-k8s-master-0 | 192.168.9.91 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
ks-k8s-master-1 | 192.168.9.92 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
ks-k8s-master-2 | 192.168.9.93 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
glusterfs-node-0 | 192.168.9.95 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
glusterfs-node-1 | 192.168.9.96 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
glusterfs-node-2 | 192.168.9.97 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
2. Enabling Etcd monitoring through the KubeSphere CRD
In the console, edit the YAML configuration of the ks-installer CRD.
In the YAML file, search for etcd and change monitoring from false to true.
```yaml
  etcd:
    endpointIps: '192.168.9.91,192.168.9.92,192.168.9.93'
    monitoring: true
    port: 2379
    tlsEnable: true
```
- After all configurations are complete, click OK in the lower right corner to save the configuration.
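If you prefer the command line over the console, the same switch can be flipped by editing the ks-installer ClusterConfiguration directly. This is a minimal sketch; the resource name and namespace below are the KubeSphere defaults and may differ in your environment.

```bash
# Open the ks-installer ClusterConfiguration in an editor ("cc" is the short name
# for clusterconfiguration), then set etcd.monitoring to true and save.
kubectl edit cc ks-installer -n kubesphere-system
```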
Execute the following kubectl command to watch the installation progress.
```bash
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
```
The output is omitted here.
Verify the installation results.
Log in to the console, go to Platform Management -> Cluster Management -> Monitoring & Alerting -> Cluster Status, and check whether the etcd monitoring tab exists. If it does, monitoring has been enabled successfully.
Although monitoring is now enabled, there is still no monitoring data at this point. Meanwhile, checking the prometheus-k8s Pod reveals the following error.
- Next, we will explain why and how to configure it.
3. Record of the troubleshooting process
I searched the official forum with the keyword etcd, found the following post that looked closest to my problem, and opened it to take a look.
etcd uses a self-signed certificate, prometheus reports an error issued by an unknown authority #2.11
However, the post does not contain a detailed problem-solving walkthrough, which left me a bit confused, but it did give me some very important configuration steps.
Key point 1 from the post: generate a secret from the certificates of the external etcd.
This command generates a secret from etcd's certificates.
```bash
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
  --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem \
  --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk.pem \
  --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk-key.pem
```
Don't worry: first check whether the secret already exists, and only generate it with the command above if it does not.
```
[root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system
NAME                                         TYPE                                  DATA   AGE
additional-scrape-configs                    Opaque                                1      9d
alertmanager-main                            Opaque                                1      9d
alertmanager-main-generated                  Opaque                                1      9d
alertmanager-main-tls-assets                 Opaque                                0      9d
alertmanager-main-token-7b9xc                kubernetes.io/service-account-token   3      9d
default-token-tnxh7                          kubernetes.io/service-account-token   3      9d
kube-etcd-client-certs                       Opaque                                3      9d
kube-state-metrics-token-czbrg               kubernetes.io/service-account-token   3      9d
node-exporter-token-qrhl7                    kubernetes.io/service-account-token   3      9d
notification-manager-sa-token-lc6z4          kubernetes.io/service-account-token   3      9d
notification-manager-webhook-server-cert     kubernetes.io/tls                     2      9d
prometheus-k8s                               Opaque                                1      9d
prometheus-k8s-tls-assets                    Opaque                                0      9d
prometheus-k8s-token-7fk45                   kubernetes.io/service-account-token   3      9d
prometheus-operator-token-wlmcf              kubernetes.io/service-account-token   3      9d
sh.helm.release.v1.notification-manager.v1   helm.sh/release.v1                    1      9d
```
It turns out kube-etcd-client-certs already exists.
Looking at its contents in detail, all the expected keys are there, no more and no less.
```
[root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system kube-etcd-client-certs -o yaml
apiVersion: v1
data:
  etcd-client-ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM5VENDQWQyZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFTTVJBd0RnWURWUVFERXdkbGRHTmsKTFdOaE1CNFhEVEl5TURRd09URTBNekl5TjFvWERUTXlNRFF3TmpFME16SXlOMW93RWpFUU1BNEdBMVVFQXhNSApaWFJqWkMxallUQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQU53SnpobDFPSVpyCkZYOUNsbER3czVVdnA5NkxHOHpxWkZGbmRGZVBlb1RrTXlFSVpESFRQM0lYSFhzaFFPNjF3VlpVd3VvMmJoeTcKdTBLbEFUcXZmZ1ZJTWE2MlpKTFVNcGwrendvMnFDcWpzbHd1b3RacHArTHVYaldYRTFOeWcwWi9MRmd3NDArOQpGSDV3Y2VWK0FhNjhETElKQWw4a0l6VktScVgraENjZGVTOFRWbDNVeS9PMWRkRFJGODExYzB6VTNteEF2Z0h5CmlxOFF0S2dBQ3E0L294N3RPRFRZUVNlVVdOa25tZTBLMituWmR6M1RveHpUamdIZ2FDVlFXVW5nNFNyMVlSYWwKV2owTGlET2tWb2l3TlFrSVd6ZnBrVXUrM2RJUGNPL29Wc0E3eEJLenhGdEp2dmthTGU1ZDd6a3p2d2xVdE1NYgp2NzNzNERqNU0yc0NBd0VBQWFOV01GUXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CCkFmOHdIUVlEVlIwT0JCWUVGREh3WUNYcW90OG9oYWNZa1FBaHMrRjNSWW5tTUJJR0ExVWRFUVFMTUFtQ0IyVjAKWTJRdFkyRXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBS3l3SEJpVEkxYjExQjNrTDJNZFN0WGRaZ2ZNT05obApuZ1QyUjVuQWZISUVTZVRGNnpFbWh6QnBRb3ozMm1GbG1VdlRKMjdhdVk4UGh2cC9pT0pKbWZIZnY3RWcyYVpJCmlkK2w5YTJoQXFrMnVnNmV4NFpjUzgvOUxyTUV3SlhDOGZqeTA0OWdLQjIyMXFuSFh0Q3VyNE95MUFyMHBiUUwKaEQ4T0lpaExBbHpZNnIvQTlzVDYrNU12cy80OE5LeWN0Sy9KYzFhbVVQK0tnWXlPWDNWNXVsM096MFpIT2ptRAo5akIrdlNHUHM5REdrdnJEeFp4SDRIM0NhaTF5cHBlc29YVFZndS81UTFjcVlvdGNJalZpekx5eVNjZ1EzQ2ZqCmVvdnk3NW8vZUdiRmpYSmJQV0NncDhYV2RJWkVmcmNXMXZtWjZPZDVmcXIwblY5QVExekhueWs9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  etcd-client.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQrVENDQXVHZ0F3SUJBZ0lJT2Y3Ky90T3NYa013RFFZSktvWklodmNOQVFFTEJRQXdFakVRTUE0R0ExVUUKQXhNSFpYUmpaQzFqWVRBZUZ3MHlNakEwTURreE5ETXlNamRhRncwek1qQTBNRFl4TkRNeU16QmFNQ1F4SWpBZwpCZ05WQkFNVEdXVjBZMlF0Ym05a1pTMXJjeTFyT0hNdGJXRnpkR1Z5TFRBd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVFDN0NvS1dWKzJKeXRVRTc2VnhvU3lOZzZXOU4yRUlxaTA5UkQ3TThTYUMKZzNHSFZJcXRjWUZzWEhNSHNGeGkyc0ltRWdTblRQMU1sS2Y2Q2xoZ1llSUJqbHJjdWVGNzNDUW45dkw3bXdqMwpJVzV0cUJ4Z1BwRmpvc1FQcGs5eU5XWmpEVGJsbHJTbkZjTXNKekFEOXNIZjdiRWUrQTZJcnJDUnhLZGJWaVY1CnFveFR5THhJenF4c2NDMlMwclJCYk5YbHAzZFU1QStldGZhOUYxUFNCeDQxdmk1MXcvTnBVRkNOa2ZuaWhyZnUKcUVoYW0zNUdCbFYrRzd4ZENSVGt6K3h3V3IwdnhMUitueGZ5MElHL2hyYlIxL0RLbHo5Y3BnbHhTWUg5S3ZvbgpzVXRpemhQYXVsRFZIN2NFdTJGOWZuTHZlK2hZemt3c3hhS1RsQTFlQ2VEeEFnTUJBQUdqZ2dFL01JSUJPekFPCkJnTlZIUThCQWY4RUJBTUNCYUF3SFFZRFZSMGxCQll3RkFZSUt3WUJCUVVIQXdFR0NDc0dBUVVGQndNQ01Bd0cKQTFVZEV3RUIvd1FDTUFBd0h3WURWUjBqQkJnd0ZvQVVNZkJnSmVxaTN5aUZweGlSQUNHejRYZEZpZVl3Z2RvRwpBMVVkRVFTQjBqQ0J6NElFWlhSalpJSVFaWFJqWkM1cmRXSmxMWE41YzNSbGJZSVVaWFJqWkM1cmRXSmxMWE41CmMzUmxiUzV6ZG1PQ0ltVjBZMlF1YTNWaVpTMXplWE4wWlcwdWMzWmpMbU5zZFhOMFpYSXViRzlqWVd5Q0QydHoKTFdzNGN5MXRZWE4wWlhJdE1JSVBhM010YXpoekxXMWhjM1JsY2kweGdnOXJjeTFyT0hNdGJXRnpkR1Z5TFRLQwpFMnhpTG10MVltVnpjR2hsY21VdWJHOWpZV3lDQ1d4dlkyRnNhRzl6ZEljRWZ3QUFBWWNRQUFBQUFBQUFBQUFBCkFBQUFBQUFBQVljRXdLZ0pXNGNFd0tnSlhJY0V3S2dKWFRBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQXZOR2gKdHdlTG1QS2F2YjVhOFoxU2sxQkFZdzZ6dEdHTnJGdzg2M1dKRVBEblFFa3duOFhJNGh4SU82UVV3eHJic1MweAp0YUg2ZmRKeFZZcEN5UXVrV3JldHpkZ05zMTVWYnlNdUlqVkJRMytGZnBRaDB5T25tUXlmRWc2UWZNdU5IWGpJCjZCdVp5M0p0S0tFZGZmUFh4U3VlMFV2TG5idlN6U0tVQkRIcy9nNVV0Q3cyeHVIVFU5bFdoQXY2dm1WQ08yQW4KZmc2MjAzMUpUNG9ya2F6c1hmdENOTlZqUmdIZ2pjQ0NDZkMwY1hSRVZTVFZqZUFaZU40ZUdtYWlRcFdEUWkxbApUVWZJMlE0dGRySlFsOXk0dDNKRDgrSmFLT0VJWkt3NWVWaTc3cUZobWR1MmFkRThkODc0aVBnN2ZEYmVFS2tWCkYxVWVKb3NKOFN3Z1psWTRpQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  etcd-client.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3dJQkFBS0NBUUVBdXdxQ2xsZnRpY3JWQk8rbGNhRXNqWU9sdlRkaENLb3RQVVErelBFbWdvTnhoMVNLCnJYR0JiRnh6QjdCY1l0ckNKaElFcDB6OVRKU24rZ3BZWUdIaUFZNWEzTG5oZTl3a0ovYnkrNXNJOXlGdWJhZ2MKWUQ2Ulk2TEVENlpQY2pWbVl3MDI1WmEwcHhYRExDY3dBL2JCMysyeEh2Z09pSzZ3a2NTblcxWWxlYXFNVThpOApTTTZzYkhBdGt0SzBRV3pWNWFkM1ZPUVBuclgydlJkVDBnY2VOYjR1ZGNQemFWQlFqWkg1NG9hMzdxaElXcHQrClJnWlZmaHU4WFFrVTVNL3NjRnE5TDhTMGZwOFg4dENCdjRhMjBkZnd5cGMvWEtZSmNVbUIvU3I2SjdGTFlzNFQKMnJwUTFSKzNCTHRoZlg1eTczdm9XTTVNTE1XaWs1UU5YZ25nOFFJREFRQUJBb0lCQVFDamQ1c0x4SXNRMjFsegpOL0xUTFhhZnM0ZmRxQkhCSGVIdDRzQTBJeXB4OUdqN1NwTHM1UCtrOGVPQ3U4cnlocGdaNTdOemVDRUVsZ044Cnp4L1FGSndPbWhpbFFqdGtJZERqc0x0SjFJUndZQ0ovNmVYcTQ2UHpmV1IyL1BZQUxkVnZDalNKVVQ1UHJRQm4KalZRMGtxdDhodU0rMnJMeEdDT3ZNanpGNGJOYzhZZGFSOTI0c095Y1Q2UzI1Vzg3TklQWnVqY3VBUXIzaEE2bwpUbEdmVU44Q0hSM21jVnBIbEJ1NDhEeEpYaml2MkVKZTRHSmN2L0NWQTVqVGNNNlNoTjJuSGN3OGpHYVg0bGJtCjJYaktKemE0RStON3hGRXBRVEJRMUNqRGM1cndKY0tKUm9IQkxFUGtJVE5LWnNWSDlmK0tuNmpjQWtmOTZoWVkKKzY1TTMza1ZBb0dCQU5GMVdRNG4wcTE0YlpSY1FkbnFoWDdYT0pFbDBtOUZuYVhOTjNsb0M1SnNneGxkbXh5bgpRV1IvZkJVQnRaTUc5MmgzdTBheWUyaWdZdGtSc1pDV0wwL2VicmJGMWlmYXozR2Z1b3lSZWozMHVsRDJYY3phCmQzSEUwdVpTSVQrUkFSTTF1VjJUczVUSHJqUStIT3Z5cEpFQjFlSnY1L21LWmRpUTRtMzBGMDUzQW9HQkFPU1oKL21NWXd4V1Y4SFRtaENyNGsycDJQd1NLTHVrajhZaVJQZHhVSFpXWXdRTGFFRU1uSVVnUFJBSnFHc1VtWng5TApacDVjYXp3bW9ldDI0cXpGeVhkemFUMi96VGc1Rjg1d0FzRDl1WEZSWWYzc01OZ0VkazJkSmc1VGZmcWcrNlRQCjBla2VtWG9vSTYxTTc3VVFjWVdSVCtPWUtFd1V3dzZMcjJ3bGFKM1hBb0dBTzF3alVlU3RTeVllLy9XcFgrV2IKMFplUzIyZTVuSGxCTlRUVWJONjBzTmw1eWQyQ1VQdUJoOGF0VnBLMmI2V0F4aVZ3ZUplcWE3dFFhQzRnZ1ZaZQpzQ2JjZjRYUHJGblJnbVQvREVsS09IYTd1cWduYXgvYXkrNDR5cmNwM3dic0pCS01wdDF0L2xNY3BvZVgwTEppCk93b25JRllRaXVMUy9DNExUWmZvWnY4Q2dZQnIrMlhUajRYUE0zVlM4dlJwaStPdWZVNkZLWFRCUWU0OHNVYkUKUmFOMzM2RUVaTmNic1djaUw3dlRYQ1ZyRFJuWENYbmV3ZzhSYWJwQWpIYkVYK1VybklPUTNJSG0xZWt0NVhFWAprb0kvU2M3ODc4MmVySFRwY3ByZ1Y0WUJsbnRudlpjTkJCeEJQS2Fsbk5yNTcxdUFXVVNnWUdaZ2tjb1ZtOXZ3ClBMZHZId0tCZ0dYS2l5Y29zZzFuZHhkclQ0S05SSmdWZUd1M3ZqSjg4N0tQbThpbHB4alF3ekM2cjNRZDhYUWIKbGdWUnFBcG5mTnA1amM0WUZ5c2RvKzFhc2JrRTloczVUZk5sVUVtSWdvR3dxVnlmUkRiOEl0TklRQTBXZDZLdQpONy81UkZYRVlkUFR4YVhpNjl0cTZnRXp6cThTcnQyUUY5eEk5eG1EV0U5bGVEeDUwd1dZCi0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
kind: Secret
metadata:
  creationTimestamp: "2022-04-09T14:34:37Z"
  name: kube-etcd-client-certs
  namespace: kubesphere-monitoring-system
  resourceVersion: "856"
  uid: c74b122b-438d-4e40-8e1a-1b9445d4b3d5
type: Opaque
```
Seeing this, the secret looks fine for the time being; at least the resource exists. Let's keep checking and come back to it if things still don't work.
While I was writing this article, Jam reported that his environment does not have this secret. In that case, it can be generated with the command above, taking care to use the actual paths of the Etcd certificates.
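For reference, this is roughly what the command looks like on this demo cluster, where KubeSphere placed the certificates under /etc/ssl/etcd/ssl/ (see section 4). The admin-ks-k8s-master-0 file names are specific to my nodes, so substitute your own.

```bash
# Sketch: create the secret from this demo cluster's etcd certificates.
# Adjust the admin-<node-name>.pem file names to match your environment.
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
  --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem \
  --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem
```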
Key point 2 from the post: generate an Endpoints object from the IPs of the external etcd nodes.
Let's take a look at what the prometheus-endpointsEtcd.yaml file is.
prometheus-endpointsEtcd.yaml
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
subsets:
- addresses:
  - ip: 127.0.0.1
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
```
Let's see if there are Endpoints resources in our kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints -n kubesphere-monitoring-system
NAME                                       ENDPOINTS                                                                 AGE
alertmanager-main                          10.233.116.11:9093,10.233.117.10:9093,10.233.87.9:9093                   9d
alertmanager-operated                      10.233.116.11:9094,10.233.117.10:9094,10.233.87.9:9094 + 6 more...       9d
kube-state-metrics                         10.233.87.8:8443,10.233.87.8:9443                                         9d
node-exporter                              192.168.9.91:9100,192.168.9.92:9100,192.168.9.93:9100                    9d
notification-manager-controller-metrics    10.233.116.8:8443                                                         9d
notification-manager-svc                   10.233.116.13:19093,10.233.116.14:19093                                   9d
notification-manager-webhook               10.233.116.8:9443                                                         9d
prometheus-k8s                             10.233.117.43:9090,10.233.87.160:9090                                     9d
prometheus-operated                        10.233.117.43:9090,10.233.87.160:9090                                     9d
prometheus-operator                        10.233.116.7:8443                                                         9d
thanos-ruler-operated                      10.233.117.18:10902,10.233.87.17:10902,10.233.117.18:10901 + 1 more...   8d
```
There is no Endpoints object related to Etcd here. Do we need to create one?
Just as I was about to create it from the configuration file, I realized my mistake: out of habit, led along by the previous command, I had queried the wrong namespace. The namespace in the example configuration file is kube-system.
Query again in kube-system, and the resource we want is there.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints -n kube-system
NAME                          ENDPOINTS                                                               AGE
coredns                       10.233.117.2:53,10.233.117.3:53,10.233.117.2:53 + 3 more...             9d
etcd                          192.168.9.91:2379,192.168.9.92:2379,192.168.9.93:2379                  3d20h
kube-controller-manager-svc   192.168.9.91:10257,192.168.9.92:10257,192.168.9.93:10257               9d
kube-scheduler-svc            192.168.9.91:10259,192.168.9.92:10259,192.168.9.93:10259               9d
kubelet                       192.168.9.91:10250,192.168.9.92:10250,192.168.9.93:10250 + 6 more...   9d
openebs.io-local              <none>                                                                  9d
```
Take a look at the configuration file content.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints etcd -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"subsets":[{"addresses":[{"ip":"192.168.9.91"},{"ip":"192.168.9.92"},{"ip":"192.168.9.93"}],"ports":[{"name":"metrics","port":2379,"protocol":"TCP"}]}]}
  creationTimestamp: "2022-04-15T08:24:18Z"
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
  resourceVersion: "1559305"
  uid: c6d0ee2c-a228-4ea8-8ef1-73b387030950
subsets:
- addresses:
  - ip: 192.168.9.91
  - ip: 192.168.9.92
  - ip: 192.168.9.93
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
```
The configuration file looks correct, so let's continue to check.
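Before moving on, a quick sanity check that I find useful (not from the reference post): hit the etcd metrics endpoint directly from one of the master nodes with the same certificates. If this returns metrics, the target itself is healthy and the problem lies on the Prometheus side. The cert file names below are the ones from my cluster; adjust them to yours.

```bash
# Query etcd's /metrics endpoint with the client certificates (run on a master node).
curl -s --cacert /etc/ssl/etcd/ssl/ca.pem \
  --cert /etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --key /etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem \
  https://192.168.9.91:2379/metrics | head -n 5
```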
Key point 3 from the post: generate the etcd Service from the Endpoints above.
Let's take a look at what the prometheus-serviceEtcd.yaml file is.
prometheus-serviceEtcd.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 2379
    targetPort: 2379
  selector: null
```
Let's see if there are any Service resources in our kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get service -n kube-system
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
coredns                       ClusterIP   10.233.0.3   <none>        53/UDP,53/TCP,9153/TCP         9d
etcd                          ClusterIP   None         <none>        2379/TCP                       3d21h
kube-controller-manager-svc   ClusterIP   None         <none>        10257/TCP                      9d
kube-scheduler-svc            ClusterIP   None         <none>        10259/TCP                      9d
kubelet                       ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   9d
```
See the resource configuration details.
```
[root@ks-k8s-master-0 ~]# kubectl get service etcd -n kube-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"spec":{"clusterIP":"None","ports":[{"name":"metrics","port":2379,"targetPort":2379}],"selector":null}}
  creationTimestamp: "2022-04-15T08:24:18Z"
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
  resourceVersion: "1559307"
  uid: cfd92ee5-dbd1-4ee4-a4c4-d683ca7a41ea
spec:
  clusterIP: None
  clusterIPs:
  - None
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
The configuration file looks correct, so let's continue to check.
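One thing worth noting: this Service is headless (clusterIP: None) and has no selector, so it is backed entirely by the manually defined Endpoints above. If you want to double-check that the Service actually resolves to the etcd node IPs, a throwaway DNS lookup like the following sketch works (the busybox image and pod name are just examples).

```bash
# Resolve the headless etcd service from inside the cluster; it should return
# the three etcd node IPs defined in the Endpoints object.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup etcd.kube-system.svc.cluster.local
```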
Key point 4 from the post: generate a ServiceMonitor for scraping Etcd data.
Let's take a look at what the prometheus-serviceMonitorEtcd.yaml file is.
prometheus-serviceMonitorEtcd.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kubesphere-monitoring-system
spec:
  endpoints:
  - interval: 1m
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
      serverName: etcd.kube-system.svc.cluster.local
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
```
Let's see if there is a ServiceMonitor resource in our Kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get servicemonitor -n kubesphere-monitoring-system
NAME                      AGE
alertmanager              9d
coredns                   9d
devops-jenkins            8d
etcd                      3d21h
kube-apiserver            9d
kube-controller-manager   9d
kube-scheduler            9d
kube-state-metrics        9d
kubelet                   9d
node-exporter             9d
prometheus                9d
prometheus-operator       9d
s2i-operator              8d
```
See the resource configuration details.
```
[root@ks-k8s-master-0 ~]# kubectl get servicemonitor etcd -n kubesphere-monitoring-system -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"labels":{"app.kubernetes.io/vendor":"kubesphere","k8s-app":"etcd"},"name":"etcd","namespace":"kubesphere-monitoring-system"},"spec":{"endpoints":[{"interval":"1m","port":"metrics","scheme":"https","tlsConfig":{"caFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt","certFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt","keyFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key"}}],"jobLabel":"k8s-app","namespaceSelector":{"matchNames":["kube-system"]},"selector":{"matchLabels":{"k8s-app":"etcd"}}}}
  creationTimestamp: "2022-04-15T08:24:18Z"
  generation: 1
  labels:
    app.kubernetes.io/vendor: kubesphere
    k8s-app: etcd
  name: etcd
  namespace: kubesphere-monitoring-system
  resourceVersion: "1559308"
  uid: 386f16c0-74cd-4dbf-aa35-cc227062c881
spec:
  endpoints:
  - interval: 1m
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
```
The configuration file looks correct, so let's continue to check.
- At this point I had checked everything I could think of, and all the required configuration was in place. So why was there still a problem? The reference post offers no further detail.
Then I realized I had forgotten one thing: I had not yet read the Pod's logs, so I quickly went to look at them.
In Cluster Management -> Application Workloads -> Workloads -> StatefulSets, select the kubesphere-monitoring-system project and find prometheus-k8s.
Click prometheus-k8s to open its detail page, then click the prometheus-k8s-0 Pod in the Pods list.
Click the Container Logs button to open the container log page.
At this point you will see a large number of error logs.
Detailed error log.
```
level=error ts=2022-04-19T06:49:08.169Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: open /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: no such file or directory" scrape_pool=kubesphere-monitoring-system/etcd/0
```
Seeing this, we have found the cause of the problem: the file /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt cannot be found.
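The same error can also be pulled without the console. A sketch, assuming the Pod name prometheus-k8s-0 and the container name prometheus, which is what prometheus-operator uses in my environment:

```bash
# Grep the Prometheus container logs for etcd-related scrape errors.
kubectl -n kubesphere-monitoring-system logs prometheus-k8s-0 -c prometheus | grep -i etcd
```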
Open the Pod's terminal and verify inside the container.
As it turns out, the entire directory doesn't exist.
Take another look at the Pod configuration to see whether the secret is mounted.
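The same check from the command line, as a sketch (again assuming the container is named prometheus):

```bash
# List the secrets mount directory inside the Prometheus container;
# at this point the directory does not exist yet, so ls reports an error.
kubectl -n kubesphere-monitoring-system exec prometheus-k8s-0 -c prometheus -- \
  ls /etc/prometheus/secrets/
```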
Seeing this, it became clear: I had found the root of the problem, and also a likely fix. The kube-etcd-client-certs secret is simply not mounted in the Pod, so let's try mounting it. Problem solved???
In the console, find our StatefulSet prometheus-k8s and click More Actions -> Edit Settings.
Under Storage Volumes, mount a ConfigMap or Secret.
Select Secret, mount kube-etcd-client-certs read-only at /etc/prometheus/secrets/kube-etcd-client-certs, and confirm.
After clicking OK, the Pod starts to rebuild. I thought that was it, waited to see the result, and then...
After the Pod was rebuilt, I assumed everything was under control and there would surely be no problem. Instead, I found the configuration had reverted to the original: the secret we want is not mounted in the Pod at all, and the configuration is exactly the same as before.
After repeating the operation three times I gave up and realized my approach was wrong: this StatefulSet is managed by prometheus-operator, so editing it directly will never take effect.
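You can see who really owns this StatefulSet from its ownerReferences: it is created and continuously reconciled by the Prometheus custom resource, which is why any change made to the StatefulSet by hand is rolled back. A quick check (the output should show Prometheus):

```bash
# Show the owner of the prometheus-k8s StatefulSet.
kubectl -n kubesphere-monitoring-system get statefulset prometheus-k8s \
  -o jsonpath='{.metadata.ownerReferences[*].kind}{"\n"}'
```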
- As for prometheus-operator, I had never played with it before and didn't know the technical details. What to do... keep searching Baidu.
Baidu.
Keyword: prometheus operator etcd.
The first result didn't help, so I won't show it here; take a look yourself if you're interested.
After two minutes I opened the second-ranked article, whose approach was fairly clear. Scrolling down quickly, I found the method I wanted at its third point. I didn't know the details, but our goal is to mount the secret, and since it is mentioned there, let's try it.
```
[root@ks-k8s-master-0 ~]# kubectl edit prometheuses -n kubesphere-monitoring-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      ....
```
The file content looks like the above. Searching for secret gives E486: Pattern not found: secret.
This means there is no secrets entry in the default configuration, so we add it ourselves, around line 78 of the file.
```yaml
  secrets:
  - kube-etcd-client-certs
```
The final effect is similar (I added line numbers for clarity):
```
 71   securityContext:
 72     fsGroup: 0
 73     runAsNonRoot: false
 74     runAsUser: 0
 75   serviceAccountName: prometheus-k8s
 76   serviceMonitorNamespaceSelector: {}
 77   serviceMonitorSelector: {}
 78   secrets:
 79   - kube-etcd-client-certs
 80   storage:
 81     volumeClaimTemplate:
 82       spec:
 83         resources:
 84           requests:
 85             storage: 20Gi
 86   tolerations:
 87   - effect: NoSchedule
 88     key: dedicated
 89     operator: Equal
 90     value: monitoring
 91   version: v2.26.0
```
Save and exit.
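If you don't want to go through an interactive editor, the same change can be applied non-interactively with a patch. A sketch, assuming the Prometheus resource is named k8s (which matches the prometheus-k8s Pods in this environment); note that a merge patch replaces the whole secrets list, which is fine here because it was previously empty:

```bash
# Add the kube-etcd-client-certs secret to the Prometheus custom resource.
kubectl -n kubesphere-monitoring-system patch prometheus k8s --type merge \
  -p '{"spec":{"secrets":["kube-etcd-client-certs"]}}'
```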
Checking the StatefulSet configuration again, we find that a secret volume has been added.
Looking at the Pod's configuration, it also now includes the secret volume.
Looking at the Pod's logs again, the error is gone.
It feels like the problem is solved, so let's see whether the monitoring pages have graphs now (I still held a little expectation).
The moment when the final answer is revealed.
A full overview first.
And a few zoomed-in, higher-resolution screenshots (added later; I didn't capture them at first).
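Besides the console graphs, you can also confirm data is flowing by querying Prometheus directly for an etcd metric. A sketch using a temporary port-forward; etcd_server_has_leader should return 1 for each member:

```bash
# Forward the prometheus-k8s service locally, then query an etcd metric.
kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090:9090 &
curl -s 'http://127.0.0.1:9090/api/v1/query?query=etcd_server_has_leader'
```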
- At this point the problem is basically solved, but there are still many details worth digging into to understand the deeper underlying mechanics.
4. The technical key points of Prometheus-Operator monitoring Etcd
Technical key points
How Etcd was installed
The Etcd installed by KubeSphere runs as a binary (non-containerized) service, which can be verified as follows.
```
## Check the process to confirm the binary installation
[root@ks-k8s-master-0 ~]# ps -ef | grep etcd
root      1158 56409  0 15:43 pts/0    00:00:00 grep --color=auto etcd
root     15301     1  6 Apr09 ?        15:35:08 /usr/local/bin/etcd
root     17247 17219 13 Apr09 ?        1-06:55:24 kube-apiserver --advertise-address=192.168.9.91 --allow-privileged=true --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0-key.pem --etcd-servers=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 --feature-gates=CSIStorageCapacity=true,RotateKubeletServerCertificate=true,TTLAfterFinished=true,ExpandCSIVolumes=true --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.233.0.0/18 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

## List the etcd SSL certificate files
[root@ks-k8s-master-0 ~]# ll /etc/ssl/etcd/ssl/
total 80
-rw------- 1 root root 1675 Apr  9 22:32 admin-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-0.pem
-rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-1.pem
-rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-2.pem
-rw------- 1 root root 1675 Apr  9 22:32 ca-key.pem
-rw-r--r-- 1 root root 1086 Apr  9 22:32 ca.pem
-rw------- 1 root root 1679 Apr  9 22:32 member-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-0.pem
-rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-1.pem
-rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-2.pem
-rw------- 1 root root 1675 Apr  9 22:32 node-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-0.pem
-rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-1.pem
-rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-2.pem
```
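A related sanity check (assuming etcdctl is available on the node, which is usually the case when etcd is installed as a binary): query cluster health directly with the same certificates.

```bash
# Check etcd cluster health against all three members; the cert file names are
# the ones from this demo cluster, adjust to your own.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --key=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem \
  endpoint health
```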
- Configuration required for Prometheus-Operator to monitor Etcd:
  - Generate a secret from the certificates of the external Etcd
  - Generate an Endpoints object from the IPs of the external Etcd nodes
  - Generate the etcd Service from the Endpoints
  - Generate a ServiceMonitor for scraping Etcd data
  - Reference the secret in the Prometheus custom resource (spec.secrets) so that prometheus-operator mounts it into the Pod
Topics that need further in-depth study (placeholder, to be added)
- The implementation principle and technical details of Prometheus-Operator.
- The configuration process of KubeSphere for Prometheus-Operator.
5. Summary
Driven by real operations needs, this article has shown the right way to enable Etcd monitoring in KubeSphere, and documented in detail the troubleshooting process used to solve the no-data problem. If you need to enable Etcd monitoring in KubeSphere 3.2.1, you can follow this article for the configuration.
Reference documentation
- etcd uses a self-signed certificate, prometheus reports an error issued by an unknown authority #2.11
- https://www.cnblogs.com/lvcisco/p/12575608.html?ivk_sa=1024320u