Author: Zhang Yanying (Old Z), operations and maintenance architect at the Shandong Branch of Telecom System Integration Company, cloud-native enthusiast, currently focused on cloud-native operations.
1. Introduction to this article
This article originates from a question about Etcd monitoring raised by @Jam, a member of KubeSphere open source community group 8, who hoped I could take a look. I had not enabled Etcd monitoring myself, but since a fellow community member trusted me enough to ask, it had to be arranged. Hence this article.
After some research, I found that KubeSphere ships with an Etcd monitoring page in its built-in cluster status monitoring. However, in KubeSphere 3.2.1, after enabling Etcd monitoring with the default configuration, the Etcd monitoring page under cluster status shows no data at all. This article documents the troubleshooting journey to resolve that issue.
Knowledge points of this article
- Rating: entry level
- Prometheus-Operator
- KubeSphere enables Etcd monitoring
Demo server configuration
Host name | IP | CPU (cores) | RAM (GB) | System disk (GB) | Data disk (GB) | Purpose
---|---|---|---|---|---|---
zdeops-master | 192.168.9.9 | 2 | 4 | 40 | 200 | Ansible operation and maintenance control node |
ks-k8s-master-0 | 192.168.9.91 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
ks-k8s-master-1 | 192.168.9.92 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
ks-k8s-master-2 | 192.168.9.93 | 8 | 32 | 40 | 200 | KubeSphere/k8s-master/k8s-worker |
glusterfs-node-0 | 192.168.9.95 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
glusterfs-node-1 | 192.168.9.96 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
glusterfs-node-2 | 192.168.9.97 | 4 | 8 | 40 | 200 | GlusterFS/ElasticSearch |
2. Enabling Etcd monitoring through the KubeSphere CRD
In the console, edit the YAML configuration of the ks-installer CRD.
In the YAML file, search for etcd and change monitoring from false to true.
```yaml
  etcd:
    endpointIps: '192.168.9.91,192.168.9.92,192.168.9.93'
    monitoring: true
    port: 2379
    tlsEnable: true
```
- After all configurations are complete, click OK in the lower right corner to save the configuration.
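If you prefer the command line over the console, the same switch can be flipped by editing the ks-installer ClusterConfiguration directly. This is a minimal sketch; the resource name and namespace below are the KubeSphere defaults and may differ in your environment.

```bash
# Open the ks-installer ClusterConfiguration in an editor ("cc" is the short name
# for clusterconfiguration), then set etcd.monitoring to true and save.
kubectl edit cc ks-installer -n kubesphere-system
```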
Execute the following kubectl command to watch the installation progress.
```bash
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
```
The output is omitted here.
Verify the installation results.
Log in to the console, go to Platform Management -> Cluster Management -> Monitoring & Alerting -> Cluster Status, and check whether the etcd monitoring tab exists. If it does, monitoring has been enabled successfully.
Although monitoring is now enabled, there is still no monitoring data at this point. Meanwhile, checking the prometheus-k8s Pod reveals the following error.
- Next, we will explain why and how to configure it.
3. Record of the troubleshooting process
I searched the official forum with the keyword etcd, found the following post that looked closest to my problem, and opened it to take a look.
etcd uses a self-signed certificate, prometheus reports an error issued by an unknown authority #2.11
However, the post does not contain a detailed problem-solving walkthrough, which left me a bit confused, but it did give me some very important configuration steps.
Key point 1 from the post: generate a secret from the certificates of the external etcd.
This command generates a secret from etcd's certificates.
```bash
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
  --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem \
  --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk.pem \
  --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk-key.pem
```
Don't worry: first check whether the secret already exists, and only generate it with the command above if it does not.
```
[root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system
NAME                                         TYPE                                  DATA   AGE
additional-scrape-configs                    Opaque                                1      9d
alertmanager-main                            Opaque                                1      9d
alertmanager-main-generated                  Opaque                                1      9d
alertmanager-main-tls-assets                 Opaque                                0      9d
alertmanager-main-token-7b9xc                kubernetes.io/service-account-token   3      9d
default-token-tnxh7                          kubernetes.io/service-account-token   3      9d
kube-etcd-client-certs                       Opaque                                3      9d
kube-state-metrics-token-czbrg               kubernetes.io/service-account-token   3      9d
node-exporter-token-qrhl7                    kubernetes.io/service-account-token   3      9d
notification-manager-sa-token-lc6z4          kubernetes.io/service-account-token   3      9d
notification-manager-webhook-server-cert     kubernetes.io/tls                     2      9d
prometheus-k8s                               Opaque                                1      9d
prometheus-k8s-tls-assets                    Opaque                                0      9d
prometheus-k8s-token-7fk45                   kubernetes.io/service-account-token   3      9d
prometheus-operator-token-wlmcf              kubernetes.io/service-account-token   3      9d
sh.helm.release.v1.notification-manager.v1   helm.sh/release.v1                    1      9d
```
It turns out kube-etcd-client-certs already exists.
Looking at its contents in detail, all the expected keys are there, no more and no less.
```
[root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system kube-etcd-client-certs -o yaml
apiVersion: v1
data:
  etcd-client-ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM5VENDQWQyZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFTTVJBd0RnWURWUVFERXdkbGRHTmsKTFdOaE1CNFhEVEl5TURRd09URTBNekl5TjFvWERUTXlNRFF3TmpFME16SXlOMW93RWpFUU1BNEdBMVVFQXhNSApaWFJqWkMxallUQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQU53SnpobDFPSVpyCkZYOUNsbER3czVVdnA5NkxHOHpxWkZGbmRGZVBlb1RrTXlFSVpESFRQM0lYSFhzaFFPNjF3VlpVd3VvMmJoeTcKdTBLbEFUcXZmZ1ZJTWE2MlpKTFVNcGwrendvMnFDcWpzbHd1b3RacHArTHVYaldYRTFOeWcwWi9MRmd3NDArOQpGSDV3Y2VWK0FhNjhETElKQWw4a0l6VktScVgraENjZGVTOFRWbDNVeS9PMWRkRFJGODExYzB6VTNteEF2Z0h5CmlxOFF0S2dBQ3E0L294N3RPRFRZUVNlVVdOa25tZTBLMituWmR6M1RveHpUamdIZ2FDVlFXVW5nNFNyMVlSYWwKV2owTGlET2tWb2l3TlFrSVd6ZnBrVXUrM2RJUGNPL29Wc0E3eEJLenhGdEp2dmthTGU1ZDd6a3p2d2xVdE1NYgp2NzNzNERqNU0yc0NBd0VBQWFOV01GUXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CCkFmOHdIUVlEVlIwT0JCWUVGREh3WUNYcW90OG9oYWNZa1FBaHMrRjNSWW5tTUJJR0ExVWRFUVFMTUFtQ0IyVjAKWTJRdFkyRXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBS3l3SEJpVEkxYjExQjNrTDJNZFN0WGRaZ2ZNT05obApuZ1QyUjVuQWZISUVTZVRGNnpFbWh6QnBRb3ozMm1GbG1VdlRKMjdhdVk4UGh2cC9pT0pKbWZIZnY3RWcyYVpJCmlkK2w5YTJoQXFrMnVnNmV4NFpjUzgvOUxyTUV3SlhDOGZqeTA0OWdLQjIyMXFuSFh0Q3VyNE95MUFyMHBiUUwKaEQ4T0lpaExBbHpZNnIvQTlzVDYrNU12cy80OE5LeWN0Sy9KYzFhbVVQK0tnWXlPWDNWNXVsM096MFpIT2ptRAo5akIrdlNHUHM5REdrdnJEeFp4SDRIM0NhaTF5cHBlc29YVFZndS81UTFjcVlvdGNJalZpekx5eVNjZ1EzQ2ZqCmVvdnk3NW8vZUdiRmpYSmJQV0NncDhYV2RJWkVmcmNXMXZtWjZPZDVmcXIwblY5QVExekhueWs9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  etcd-client.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQrVENDQXVHZ0F3SUJBZ0lJT2Y3Ky90T3NYa013RFFZSktvWklodmNOQVFFTEJRQXdFakVRTUE0R0ExVUUKQXhNSFpYUmpaQzFqWVRBZUZ3MHlNakEwTURreE5ETXlNamRhRncwek1qQTBNRFl4TkRNeU16QmFNQ1F4SWpBZwpCZ05WQkFNVEdXVjBZMlF0Ym05a1pTMXJjeTFyT0hNdGJXRnpkR1Z5TFRBd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVFDN0NvS1dWKzJKeXRVRTc2VnhvU3lOZzZXOU4yRUlxaTA5UkQ3TThTYUMKZzNHSFZJcXRjWUZzWEhNSHNGeGkyc0ltRWdTblRQMU1sS2Y2Q2xoZ1llSUJqbHJjdWVGNzNDUW45dkw3bXdqMwpJVzV0cUJ4Z1BwRmpvc1FQcGs5eU5XWmpEVGJsbHJTbkZjTXNKekFEOXNIZjdiRWUrQTZJcnJDUnhLZGJWaVY1CnFveFR5THhJenF4c2NDMlMwclJCYk5YbHAzZFU1QStldGZhOUYxUFNCeDQxdmk1MXcvTnBVRkNOa2ZuaWhyZnUKcUVoYW0zNUdCbFYrRzd4ZENSVGt6K3h3V3IwdnhMUitueGZ5MElHL2hyYlIxL0RLbHo5Y3BnbHhTWUg5S3ZvbgpzVXRpemhQYXVsRFZIN2NFdTJGOWZuTHZlK2hZemt3c3hhS1RsQTFlQ2VEeEFnTUJBQUdqZ2dFL01JSUJPekFPCkJnTlZIUThCQWY4RUJBTUNCYUF3SFFZRFZSMGxCQll3RkFZSUt3WUJCUVVIQXdFR0NDc0dBUVVGQndNQ01Bd0cKQTFVZEV3RUIvd1FDTUFBd0h3WURWUjBqQkJnd0ZvQVVNZkJnSmVxaTN5aUZweGlSQUNHejRYZEZpZVl3Z2RvRwpBMVVkRVFTQjBqQ0J6NElFWlhSalpJSVFaWFJqWkM1cmRXSmxMWE41YzNSbGJZSVVaWFJqWkM1cmRXSmxMWE41CmMzUmxiUzV6ZG1PQ0ltVjBZMlF1YTNWaVpTMXplWE4wWlcwdWMzWmpMbU5zZFhOMFpYSXViRzlqWVd5Q0QydHoKTFdzNGN5MXRZWE4wWlhJdE1JSVBhM010YXpoekxXMWhjM1JsY2kweGdnOXJjeTFyT0hNdGJXRnpkR1Z5TFRLQwpFMnhpTG10MVltVnpjR2hsY21VdWJHOWpZV3lDQ1d4dlkyRnNhRzl6ZEljRWZ3QUFBWWNRQUFBQUFBQUFBQUFBCkFBQUFBQUFBQVljRXdLZ0pXNGNFd0tnSlhJY0V3S2dKWFRBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQXZOR2gKdHdlTG1QS2F2YjVhOFoxU2sxQkFZdzZ6dEdHTnJGdzg2M1dKRVBEblFFa3duOFhJNGh4SU82UVV3eHJic1MweAp0YUg2ZmRKeFZZcEN5UXVrV3JldHpkZ05zMTVWYnlNdUlqVkJRMytGZnBRaDB5T25tUXlmRWc2UWZNdU5IWGpJCjZCdVp5M0p0S0tFZGZmUFh4U3VlMFV2TG5idlN6U0tVQkRIcy9nNVV0Q3cyeHVIVFU5bFdoQXY2dm1WQ08yQW4KZmc2MjAzMUpUNG9ya2F6c1hmdENOTlZqUmdIZ2pjQ0NDZkMwY1hSRVZTVFZqZUFaZU40ZUdtYWlRcFdEUWkxbApUVWZJMlE0dGRySlFsOXk0dDNKRDgrSmFLT0VJWkt3NWVWaTc3cUZobWR1MmFkRThkODc0aVBnN2ZEYmVFS2tWCkYxVWVKb3NKOFN3Z1psWTRpQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  etcd-client.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3dJQkFBS0NBUUVBdXdxQ2xsZnRpY3JWQk8rbGNhRXNqWU9sdlRkaENLb3RQVVErelBFbWdvTnhoMVNLCnJYR0JiRnh6QjdCY1l0ckNKaElFcDB6OVRKU24rZ3BZWUdIaUFZNWEzTG5oZTl3a0ovYnkrNXNJOXlGdWJhZ2MKWUQ2Ulk2TEVENlpQY2pWbVl3MDI1WmEwcHhYRExDY3dBL2JCMysyeEh2Z09pSzZ3a2NTblcxWWxlYXFNVThpOApTTTZzYkhBdGt0SzBRV3pWNWFkM1ZPUVBuclgydlJkVDBnY2VOYjR1ZGNQemFWQlFqWkg1NG9hMzdxaElXcHQrClJnWlZmaHU4WFFrVTVNL3NjRnE5TDhTMGZwOFg4dENCdjRhMjBkZnd5cGMvWEtZSmNVbUIvU3I2SjdGTFlzNFQKMnJwUTFSKzNCTHRoZlg1eTczdm9XTTVNTE1XaWs1UU5YZ25nOFFJREFRQUJBb0lCQVFDamQ1c0x4SXNRMjFsegpOL0xUTFhhZnM0ZmRxQkhCSGVIdDRzQTBJeXB4OUdqN1NwTHM1UCtrOGVPQ3U4cnlocGdaNTdOemVDRUVsZ044Cnp4L1FGSndPbWhpbFFqdGtJZERqc0x0SjFJUndZQ0ovNmVYcTQ2UHpmV1IyL1BZQUxkVnZDalNKVVQ1UHJRQm4KalZRMGtxdDhodU0rMnJMeEdDT3ZNanpGNGJOYzhZZGFSOTI0c095Y1Q2UzI1Vzg3TklQWnVqY3VBUXIzaEE2bwpUbEdmVU44Q0hSM21jVnBIbEJ1NDhEeEpYaml2MkVKZTRHSmN2L0NWQTVqVGNNNlNoTjJuSGN3OGpHYVg0bGJtCjJYaktKemE0RStON3hGRXBRVEJRMUNqRGM1cndKY0tKUm9IQkxFUGtJVE5LWnNWSDlmK0tuNmpjQWtmOTZoWVkKKzY1TTMza1ZBb0dCQU5GMVdRNG4wcTE0YlpSY1FkbnFoWDdYT0pFbDBtOUZuYVhOTjNsb0M1SnNneGxkbXh5bgpRV1IvZkJVQnRaTUc5MmgzdTBheWUyaWdZdGtSc1pDV0wwL2VicmJGMWlmYXozR2Z1b3lSZWozMHVsRDJYY3phCmQzSEUwdVpTSVQrUkFSTTF1VjJUczVUSHJqUStIT3Z5cEpFQjFlSnY1L21LWmRpUTRtMzBGMDUzQW9HQkFPU1oKL21NWXd4V1Y4SFRtaENyNGsycDJQd1NLTHVrajhZaVJQZHhVSFpXWXdRTGFFRU1uSVVnUFJBSnFHc1VtWng5TApacDVjYXp3bW9ldDI0cXpGeVhkemFUMi96VGc1Rjg1d0FzRDl1WEZSWWYzc01OZ0VkazJkSmc1VGZmcWcrNlRQCjBla2VtWG9vSTYxTTc3VVFjWVdSVCtPWUtFd1V3dzZMcjJ3bGFKM1hBb0dBTzF3alVlU3RTeVllLy9XcFgrV2IKMFplUzIyZTVuSGxCTlRUVWJONjBzTmw1eWQyQ1VQdUJoOGF0VnBLMmI2V0F4aVZ3ZUplcWE3dFFhQzRnZ1ZaZQpzQ2JjZjRYUHJGblJnbVQvREVsS09IYTd1cWduYXgvYXkrNDR5cmNwM3dic0pCS01wdDF0L2xNY3BvZVgwTEppCk93b25JRllRaXVMUy9DNExUWmZvWnY4Q2dZQnIrMlhUajRYUE0zVlM4dlJwaStPdWZVNkZLWFRCUWU0OHNVYkUKUmFOMzM2RUVaTmNic1djaUw3dlRYQ1ZyRFJuWENYbmV3ZzhSYWJwQWpIYkVYK1VybklPUTNJSG0xZWt0NVhFWAprb0kvU2M3ODc4MmVySFRwY3ByZ1Y0WUJsbnRudlpjTkJCeEJQS2Fsbk5yNTcxdUFXVVNnWUdaZ2tjb1ZtOXZ3ClBMZHZId0tCZ0dYS2l5Y29zZzFuZHhkclQ0S05SSmdWZUd1M3ZqSjg4N0tQbThpbHB4alF3ekM2cjNRZDhYUWIKbGdWUnFBcG5mTnA1amM0WUZ5c2RvKzFhc2JrRTloczVUZk5sVUVtSWdvR3dxVnlmUkRiOEl0TklRQTBXZDZLdQpONy81UkZYRVlkUFR4YVhpNjl0cTZnRXp6cThTcnQyUUY5eEk5eG1EV0U5bGVEeDUwd1dZCi0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
kind: Secret
metadata:
  creationTimestamp: "2022-04-09T14:34:37Z"
  name: kube-etcd-client-certs
  namespace: kubesphere-monitoring-system
  resourceVersion: "856"
  uid: c74b122b-438d-4e40-8e1a-1b9445d4b3d5
type: Opaque
```
Seeing this, the secret looks fine for the time being; at least the resource exists. Let's keep checking and come back to it if things still don't work.
While I was writing this article, Jam reported that his environment does not have this secret. In that case, it can be generated with the command above, taking care to use the actual paths of the Etcd certificates.
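For reference, this is roughly what the command looks like on this demo cluster, where KubeSphere placed the certificates under /etc/ssl/etcd/ssl/ (see section 4). The admin-ks-k8s-master-0 file names are specific to my nodes, so substitute your own.

```bash
# Sketch: create the secret from this demo cluster's etcd certificates.
# Adjust the admin-<node-name>.pem file names to match your environment.
kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
  --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem \
  --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem
```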
Key point 2 from the post: generate an Endpoints object from the IPs of the external etcd nodes.
Let's take a look at what the prometheus-endpointsEtcd.yaml file is.
prometheus-endpointsEtcd.yaml
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
subsets:
- addresses:
  - ip: 127.0.0.1
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
```
Let's see if there are Endpoints resources in our kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints -n kubesphere-monitoring-system
NAME                                       ENDPOINTS                                                                 AGE
alertmanager-main                          10.233.116.11:9093,10.233.117.10:9093,10.233.87.9:9093                   9d
alertmanager-operated                      10.233.116.11:9094,10.233.117.10:9094,10.233.87.9:9094 + 6 more...       9d
kube-state-metrics                         10.233.87.8:8443,10.233.87.8:9443                                         9d
node-exporter                              192.168.9.91:9100,192.168.9.92:9100,192.168.9.93:9100                    9d
notification-manager-controller-metrics    10.233.116.8:8443                                                         9d
notification-manager-svc                   10.233.116.13:19093,10.233.116.14:19093                                   9d
notification-manager-webhook               10.233.116.8:9443                                                         9d
prometheus-k8s                             10.233.117.43:9090,10.233.87.160:9090                                     9d
prometheus-operated                        10.233.117.43:9090,10.233.87.160:9090                                     9d
prometheus-operator                        10.233.116.7:8443                                                         9d
thanos-ruler-operated                      10.233.117.18:10902,10.233.87.17:10902,10.233.117.18:10901 + 1 more...   8d
```
There is no Endpoints object related to Etcd here. Do we need to create one?
Just as I was about to create it from the configuration file, I realized my mistake: out of habit, led along by the previous command, I had queried the wrong namespace. The namespace in the example configuration file is kube-system.
Query again in kube-system, and the resource we want is there.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints -n kube-system
NAME                          ENDPOINTS                                                               AGE
coredns                       10.233.117.2:53,10.233.117.3:53,10.233.117.2:53 + 3 more...             9d
etcd                          192.168.9.91:2379,192.168.9.92:2379,192.168.9.93:2379                  3d20h
kube-controller-manager-svc   192.168.9.91:10257,192.168.9.92:10257,192.168.9.93:10257               9d
kube-scheduler-svc            192.168.9.91:10259,192.168.9.92:10259,192.168.9.93:10259               9d
kubelet                       192.168.9.91:10250,192.168.9.92:10250,192.168.9.93:10250 + 6 more...   9d
openebs.io-local              <none>                                                                  9d
```
Take a look at the configuration file content.
```
[root@ks-k8s-master-0 ~]# kubectl get endpoints etcd -n kube-system -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"subsets":[{"addresses":[{"ip":"192.168.9.91"},{"ip":"192.168.9.92"},{"ip":"192.168.9.93"}],"ports":[{"name":"metrics","port":2379,"protocol":"TCP"}]}]}
  creationTimestamp: "2022-04-15T08:24:18Z"
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
  resourceVersion: "1559305"
  uid: c6d0ee2c-a228-4ea8-8ef1-73b387030950
subsets:
- addresses:
  - ip: 192.168.9.91
  - ip: 192.168.9.92
  - ip: 192.168.9.93
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
```
The configuration file looks correct, so let's continue to check.
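Before moving on, a quick sanity check that I find useful (not from the reference post): hit the etcd metrics endpoint directly from one of the master nodes with the same certificates. If this returns metrics, the target itself is healthy and the problem lies on the Prometheus side. The cert file names below are the ones from my cluster; adjust them to yours.

```bash
# Query etcd's /metrics endpoint with the client certificates (run on a master node).
curl -s --cacert /etc/ssl/etcd/ssl/ca.pem \
  --cert /etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --key /etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem \
  https://192.168.9.91:2379/metrics | head -n 5
```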
Key point 3 from the post: generate the etcd Service from the Endpoints above.
Let's take a look at what the prometheus-serviceEtcd.yaml file is.
prometheus-serviceEtcd.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 2379
    targetPort: 2379
  selector: null
```
Let's see if there are any Service resources in our kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get service -n kube-system
NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
coredns                       ClusterIP   10.233.0.3   <none>        53/UDP,53/TCP,9153/TCP         9d
etcd                          ClusterIP   None         <none>        2379/TCP                       3d21h
kube-controller-manager-svc   ClusterIP   None         <none>        10257/TCP                      9d
kube-scheduler-svc            ClusterIP   None         <none>        10259/TCP                      9d
kubelet                       ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   9d
```
See the resource configuration details.
```
[root@ks-k8s-master-0 ~]# kubectl get service etcd -n kube-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"spec":{"clusterIP":"None","ports":[{"name":"metrics","port":2379,"targetPort":2379}],"selector":null}}
  creationTimestamp: "2022-04-15T08:24:18Z"
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kube-system
  resourceVersion: "1559307"
  uid: cfd92ee5-dbd1-4ee4-a4c4-d683ca7a41ea
spec:
  clusterIP: None
  clusterIPs:
  - None
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
The configuration file looks correct, so let's continue to check.
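One thing worth noting: this Service is headless (clusterIP: None) and has no selector, so it is backed entirely by the manually defined Endpoints above. If you want to double-check that the Service actually resolves to the etcd node IPs, a throwaway DNS lookup like the following sketch works (the busybox image and pod name are just examples).

```bash
# Resolve the headless etcd service from inside the cluster; it should return
# the three etcd node IPs defined in the Endpoints object.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup etcd.kube-system.svc.cluster.local
```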
Key point 4 from the post: generate a ServiceMonitor for scraping Etcd data.
Let's take a look at what the prometheus-serviceMonitorEtcd.yaml file is.
prometheus-serviceMonitorEtcd.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: etcd
  name: etcd
  namespace: kubesphere-monitoring-system
spec:
  endpoints:
  - interval: 1m
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
      serverName: etcd.kube-system.svc.cluster.local
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
```
Let's see if there is a ServiceMonitor resource in our Kubernetes.
```
[root@ks-k8s-master-0 ~]# kubectl get servicemonitor -n kubesphere-monitoring-system
NAME                      AGE
alertmanager              9d
coredns                   9d
devops-jenkins            8d
etcd                      3d21h
kube-apiserver            9d
kube-controller-manager   9d
kube-scheduler            9d
kube-state-metrics        9d
kubelet                   9d
node-exporter             9d
prometheus                9d
prometheus-operator       9d
s2i-operator              8d
```
See the resource configuration details.
```
[root@ks-k8s-master-0 ~]# kubectl get servicemonitor etcd -n kubesphere-monitoring-system -o yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"labels":{"app.kubernetes.io/vendor":"kubesphere","k8s-app":"etcd"},"name":"etcd","namespace":"kubesphere-monitoring-system"},"spec":{"endpoints":[{"interval":"1m","port":"metrics","scheme":"https","tlsConfig":{"caFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt","certFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt","keyFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key"}}],"jobLabel":"k8s-app","namespaceSelector":{"matchNames":["kube-system"]},"selector":{"matchLabels":{"k8s-app":"etcd"}}}}
  creationTimestamp: "2022-04-15T08:24:18Z"
  generation: 1
  labels:
    app.kubernetes.io/vendor: kubesphere
    k8s-app: etcd
  name: etcd
  namespace: kubesphere-monitoring-system
  resourceVersion: "1559308"
  uid: 386f16c0-74cd-4dbf-aa35-cc227062c881
spec:
  endpoints:
  - interval: 1m
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
      certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
      keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: etcd
```
The configuration file looks correct, so let's continue to check.
- At this point I had checked everything I could think of, and all the required configuration was in place. So why was there still a problem? The reference post offers no further detail.
Then I realized I had forgotten one thing: I had not yet read the Pod's logs, so I quickly went to look at them.
In Cluster Management -> Application Workloads -> Workloads -> StatefulSets, select the kubesphere-monitoring-system project and find prometheus-k8s.
Click prometheus-k8s to open its detail page, then click the prometheus-k8s-0 Pod in the Pods list.
Click the Container Logs button to open the container log page.
At this point you will see a large number of error logs.
Detailed error log.
```
level=error ts=2022-04-19T06:49:08.169Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: open /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: no such file or directory" scrape_pool=kubesphere-monitoring-system/etcd/0
```
Seeing this, we have found the cause of the problem: the file /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt cannot be found.
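The same error can also be pulled without the console. A sketch, assuming the Pod name prometheus-k8s-0 and the container name prometheus, which is what prometheus-operator uses in my environment:

```bash
# Grep the Prometheus container logs for etcd-related scrape errors.
kubectl -n kubesphere-monitoring-system logs prometheus-k8s-0 -c prometheus | grep -i etcd
```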
Open the Pod's terminal and verify inside the container.
As it turns out, the entire directory doesn't exist.
Take another look at the Pod configuration to see whether the secret is mounted.
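The same check from the command line, as a sketch (again assuming the container is named prometheus):

```bash
# List the secrets mount directory inside the Prometheus container;
# at this point the directory does not exist yet, so ls reports an error.
kubectl -n kubesphere-monitoring-system exec prometheus-k8s-0 -c prometheus -- \
  ls /etc/prometheus/secrets/
```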
Seeing this, it became clear: I had found the root of the problem, and also a likely fix. The kube-etcd-client-certs secret is simply not mounted in the Pod, so let's try mounting it. Problem solved???
In the console, find our StatefulSet prometheus-k8s and click More Actions -> Edit Settings.
Under Storage Volumes, mount a ConfigMap or Secret.
Select Secret, mount kube-etcd-client-certs read-only at /etc/prometheus/secrets/kube-etcd-client-certs, and confirm.
After clicking OK, the Pod starts to rebuild. I thought that was it, waited to see the result, and then...
After the Pod was rebuilt, I assumed everything was under control and there would surely be no problem. Instead, I found the configuration had reverted to the original: the secret we want is not mounted in the Pod at all, and the configuration is exactly the same as before.
After repeating the operation three times I gave up and realized my approach was wrong: this StatefulSet is managed by prometheus-operator, so editing it directly will never take effect.
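You can see who really owns this StatefulSet from its ownerReferences: it is created and continuously reconciled by the Prometheus custom resource, which is why any change made to the StatefulSet by hand is rolled back. A quick check (the output should show Prometheus):

```bash
# Show the owner of the prometheus-k8s StatefulSet.
kubectl -n kubesphere-monitoring-system get statefulset prometheus-k8s \
  -o jsonpath='{.metadata.ownerReferences[*].kind}{"\n"}'
```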
- As for prometheus-operator, I had never played with it before and didn't know the technical details. What to do... keep searching Baidu.
Baidu.
Keyword: prometheus operator etcd.
The first result didn't help, so I won't show it here; take a look yourself if you're interested.
After two minutes I opened the second-ranked article, whose approach was fairly clear. Scrolling down quickly, I found the method I wanted at its third point. I didn't know the details, but our goal is to mount the secret, and since it is mentioned there, let's try it.
```
[root@ks-k8s-master-0 ~]# kubectl edit prometheuses -n kubesphere-monitoring-system
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      ....
```
The file content looks like the above. Searching for secret gives E486: Pattern not found: secret.
This means there is no secrets entry in the default configuration, so we add it ourselves, around line 78 of the file.
```yaml
  secrets:
  - kube-etcd-client-certs
```
The final effect is similar (I added line numbers for clarity):
```
 71   securityContext:
 72     fsGroup: 0
 73     runAsNonRoot: false
 74     runAsUser: 0
 75   serviceAccountName: prometheus-k8s
 76   serviceMonitorNamespaceSelector: {}
 77   serviceMonitorSelector: {}
 78   secrets:
 79   - kube-etcd-client-certs
 80   storage:
 81     volumeClaimTemplate:
 82       spec:
 83         resources:
 84           requests:
 85             storage: 20Gi
 86   tolerations:
 87   - effect: NoSchedule
 88     key: dedicated
 89     operator: Equal
 90     value: monitoring
 91   version: v2.26.0
```
Save and exit.
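If you don't want to go through an interactive editor, the same change can be applied non-interactively with a patch. A sketch, assuming the Prometheus resource is named k8s (which matches the prometheus-k8s Pods in this environment); note that a merge patch replaces the whole secrets list, which is fine here because it was previously empty:

```bash
# Add the kube-etcd-client-certs secret to the Prometheus custom resource.
kubectl -n kubesphere-monitoring-system patch prometheus k8s --type merge \
  -p '{"spec":{"secrets":["kube-etcd-client-certs"]}}'
```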
Checking the StatefulSet configuration again, we find that a secret volume has been added.
Looking at the Pod's configuration, it also now includes the secret volume.
Looking at the Pod's logs again, the error is gone.
It feels like the problem is solved, so let's see whether the monitoring pages have graphs now (I still held a little expectation).
The moment when the final answer is revealed.
A full overview first.
And a few zoomed-in, higher-resolution screenshots (added later; I didn't capture them at first).
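Besides the console graphs, you can also confirm data is flowing by querying Prometheus directly for an etcd metric. A sketch using a temporary port-forward; etcd_server_has_leader should return 1 for each member:

```bash
# Forward the prometheus-k8s service locally, then query an etcd metric.
kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090:9090 &
curl -s 'http://127.0.0.1:9090/api/v1/query?query=etcd_server_has_leader'
```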
- At this point the problem is basically solved, but there are still many details worth digging into to understand the deeper underlying mechanics.
4. The technical key points of Prometheus-Operator monitoring Etcd
Technical key points
How Etcd was installed
The Etcd installed by KubeSphere runs as a binary (non-containerized) service, which can be verified as follows.
```
## Check the process to confirm the binary installation
[root@ks-k8s-master-0 ~]# ps -ef | grep etcd
root      1158 56409  0 15:43 pts/0    00:00:00 grep --color=auto etcd
root     15301     1  6 Apr09 ?        15:35:08 /usr/local/bin/etcd
root     17247 17219 13 Apr09 ?        1-06:55:24 kube-apiserver --advertise-address=192.168.9.91 --allow-privileged=true --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0-key.pem --etcd-servers=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 --feature-gates=CSIStorageCapacity=true,RotateKubeletServerCertificate=true,TTLAfterFinished=true,ExpandCSIVolumes=true --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.233.0.0/18 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

## List the etcd SSL certificate files
[root@ks-k8s-master-0 ~]# ll /etc/ssl/etcd/ssl/
total 80
-rw------- 1 root root 1675 Apr  9 22:32 admin-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-0.pem
-rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-1.pem
-rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-2.pem
-rw------- 1 root root 1675 Apr  9 22:32 ca-key.pem
-rw-r--r-- 1 root root 1086 Apr  9 22:32 ca.pem
-rw------- 1 root root 1679 Apr  9 22:32 member-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-0.pem
-rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-1.pem
-rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-2.pem
-rw------- 1 root root 1675 Apr  9 22:32 node-ks-k8s-master-0-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-0.pem
-rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-1-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-1.pem
-rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-2-key.pem
-rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-2.pem
```
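A related sanity check (assuming etcdctl is available on the node, which is usually the case when etcd is installed as a binary): query cluster health directly with the same certificates.

```bash
# Check etcd cluster health against all three members; the cert file names are
# the ones from this demo cluster, adjust to your own.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
  --key=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem \
  endpoint health
```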
- Configuration required for Prometheus-Operator to monitor Etcd:
  - Generate a secret from the certificates of the external Etcd
  - Generate an Endpoints object from the IPs of the external Etcd nodes
  - Generate the etcd Service from the Endpoints
  - Generate a ServiceMonitor for scraping Etcd data
  - Reference the secret in the Prometheus custom resource (spec.secrets) so that prometheus-operator mounts it into the Pod
Topics that need further in-depth study (placeholder, to be added)
- The implementation principle and technical details of Prometheus-Operator.
- The configuration process of KubeSphere for Prometheus-Operator.
5. Summary
Driven by real operations needs, this article has shown the right way to enable Etcd monitoring in KubeSphere, and documented in detail the troubleshooting process used to solve the no-data problem. If you need to enable Etcd monitoring in KubeSphere 3.2.1, you can follow this article for the configuration.
Reference documentation
- etcd uses a self-signed certificate, prometheus reports an error issued by an unknown authority #2.11
- https://www.cnblogs.com/lvcisco/p/12575608.html?ivk_sa=1024320u