Author: Zhang Yanying (Old Z), operations and maintenance architect at the Shandong Branch of Telecom System Integration Company, cloud native enthusiast, currently focusing on cloud native operations and maintenance.

1. Introduction to this article

This article originates from a question about Etcd monitoring raised by @Jam in group 8 of the KubeSphere open source community, who hoped I could take a look. I had not enabled Etcd monitoring before, but since a friend trusted me enough to ask, it had to be arranged. Hence this article.

After some research, it turns out that the cluster status monitoring built into KubeSphere already includes an Etcd monitoring page. However, in KubeSphere 3.2.1, after enabling Etcd monitoring with the default configuration, the Etcd monitoring page in the cluster status shows no data at all. This article documents the troubleshooting journey taken to resolve this issue.

Knowledge points of this article

  • Rating: entry level
  • Prometheus-Operator
  • KubeSphere enables Etcd monitoring

Demo server configuration

Host name         IP            CPU  RAM (GB)  System disk (GB)  Data disk (GB)  Purpose
zdeops-master     192.168.9.9   2    4         40                200             Ansible operations control node
ks-k8s-master-0   192.168.9.91  8    32        40                200             KubeSphere/k8s-master/k8s-worker
ks-k8s-master-1   192.168.9.92  8    32        40                200             KubeSphere/k8s-master/k8s-worker
ks-k8s-master-2   192.168.9.93  8    32        40                200             KubeSphere/k8s-master/k8s-worker
glusterfs-node-0  192.168.9.95  4    8         40                200             GlusterFS/ElasticSearch
glusterfs-node-1  192.168.9.96  4    8         40                200             GlusterFS/ElasticSearch
glusterfs-node-2  192.168.9.97  4    8         40                200             GlusterFS/ElasticSearch

2. Enabling Etcd monitoring through the KubeSphere CRD

  1. Edit the YAML configuration of ks-installer in the ClusterConfiguration CRD.

    In the YAML file, search for etcd and change monitoring from false to true.

     etcd:
      endpointIps: '192.168.9.91,192.168.9.92,192.168.9.93'
      monitoring: true
      port: 2379
      tlsEnable: true
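    If you prefer the command line, the same change can also be made by editing the ks-installer ClusterConfiguration directly (a sketch; cc is the short name KubeSphere registers for clusterconfiguration — change monitoring under etcd to true and save):

     kubectl edit cc ks-installer -n kubesphere-system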
  2. After all configurations are complete, click OK in the lower right corner to save the configuration.
  3. Run the following command to watch the installation process.

     kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

    (The command output is omitted here.)

  4. Verify the installation results.

    Log in to the console, go to Platform Management -> Cluster Management -> Monitoring & Alerting -> Cluster Status, and check whether the etcd monitoring tab exists. If it does, monitoring has been enabled successfully.

  5. Although monitoring has been enabled by the configuration above, there is still no monitoring data at this point. In addition, checking the prometheus-k8s Pod reveals an error (shown in the troubleshooting section below).

  6. The next section explains why this happens and how to configure things correctly.

3. Record of the troubleshooting process

  1. Search the official forum with the keyword etcd. The following post looks relatively close, so open it and take a look.

    etcd uses a self-signed certificate, prometheus reports an error issued by an unknown authority #2.11

    However, the post does not describe the resolution in detail. Although still somewhat confused, I did extract several very important configuration steps from it.

  2. Following key point 1 obtained above, generate a Secret from the certificates of the external etcd.

    The following command generates a Secret from etcd's certificate files.

     # kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk.pem --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-i-ezjb7gsk-key.pem

    Before running it, first check whether the Secret already exists; if it does not, generate it with the command above.

     [root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system
    NAME                                         TYPE                                  DATA   AGE
    additional-scrape-configs                    Opaque                                1      9d
    alertmanager-main                            Opaque                                1      9d
    alertmanager-main-generated                  Opaque                                1      9d
    alertmanager-main-tls-assets                 Opaque                                0      9d
    alertmanager-main-token-7b9xc                kubernetes.io/service-account-token   3      9d
    default-token-tnxh7                          kubernetes.io/service-account-token   3      9d
    kube-etcd-client-certs                       Opaque                                3      9d
    kube-state-metrics-token-czbrg               kubernetes.io/service-account-token   3      9d
    node-exporter-token-qrhl7                    kubernetes.io/service-account-token   3      9d
    notification-manager-sa-token-lc6z4          kubernetes.io/service-account-token   3      9d
    notification-manager-webhook-server-cert     kubernetes.io/tls                     2      9d
    prometheus-k8s                               Opaque                                1      9d
    prometheus-k8s-tls-assets                    Opaque                                0      9d
    prometheus-k8s-token-7fk45                   kubernetes.io/service-account-token   3      9d
    prometheus-operator-token-wlmcf              kubernetes.io/service-account-token   3      9d
    sh.helm.release.v1.notification-manager.v1   helm.sh/release.v1                    1      9d

    The kube-etcd-client-certs Secret is actually already there.

    Looking at its contents, all the expected entries are present, no more and no fewer.

     [root@ks-k8s-master-0 ~]# kubectl get secrets -n kubesphere-monitoring-system kube-etcd-client-certs -o yaml
    apiVersion: v1
    data:
      etcd-client-ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM5VENDQWQyZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFTTVJBd0RnWURWUVFERXdkbGRHTmsKTFdOaE1CNFhEVEl5TURRd09URTBNekl5TjFvWERUTXlNRFF3TmpFME16SXlOMW93RWpFUU1BNEdBMVVFQXhNSApaWFJqWkMxallUQ0NBU0l3RFFZSktvWklodmNOQVFFQkJRQURnZ0VQQURDQ0FRb0NnZ0VCQU53SnpobDFPSVpyCkZYOUNsbER3czVVdnA5NkxHOHpxWkZGbmRGZVBlb1RrTXlFSVpESFRQM0lYSFhzaFFPNjF3VlpVd3VvMmJoeTcKdTBLbEFUcXZmZ1ZJTWE2MlpKTFVNcGwrendvMnFDcWpzbHd1b3RacHArTHVYaldYRTFOeWcwWi9MRmd3NDArOQpGSDV3Y2VWK0FhNjhETElKQWw4a0l6VktScVgraENjZGVTOFRWbDNVeS9PMWRkRFJGODExYzB6VTNteEF2Z0h5CmlxOFF0S2dBQ3E0L294N3RPRFRZUVNlVVdOa25tZTBLMituWmR6M1RveHpUamdIZ2FDVlFXVW5nNFNyMVlSYWwKV2owTGlET2tWb2l3TlFrSVd6ZnBrVXUrM2RJUGNPL29Wc0E3eEJLenhGdEp2dmthTGU1ZDd6a3p2d2xVdE1NYgp2NzNzNERqNU0yc0NBd0VBQWFOV01GUXdEZ1lEVlIwUEFRSC9CQVFEQWdLa01BOEdBMVVkRXdFQi93UUZNQU1CCkFmOHdIUVlEVlIwT0JCWUVGREh3WUNYcW90OG9oYWNZa1FBaHMrRjNSWW5tTUJJR0ExVWRFUVFMTUFtQ0IyVjAKWTJRdFkyRXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBS3l3SEJpVEkxYjExQjNrTDJNZFN0WGRaZ2ZNT05obApuZ1QyUjVuQWZISUVTZVRGNnpFbWh6QnBRb3ozMm1GbG1VdlRKMjdhdVk4UGh2cC9pT0pKbWZIZnY3RWcyYVpJCmlkK2w5YTJoQXFrMnVnNmV4NFpjUzgvOUxyTUV3SlhDOGZqeTA0OWdLQjIyMXFuSFh0Q3VyNE95MUFyMHBiUUwKaEQ4T0lpaExBbHpZNnIvQTlzVDYrNU12cy80OE5LeWN0Sy9KYzFhbVVQK0tnWXlPWDNWNXVsM096MFpIT2ptRAo5akIrdlNHUHM5REdrdnJEeFp4SDRIM0NhaTF5cHBlc29YVFZndS81UTFjcVlvdGNJalZpekx5eVNjZ1EzQ2ZqCmVvdnk3NW8vZUdiRmpYSmJQV0NncDhYV2RJWkVmcmNXMXZtWjZPZDVmcXIwblY5QVExekhueWs9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
      etcd-client.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQrVENDQXVHZ0F3SUJBZ0lJT2Y3Ky90T3NYa013RFFZSktvWklodmNOQVFFTEJRQXdFakVRTUE0R0ExVUUKQXhNSFpYUmpaQzFqWVRBZUZ3MHlNakEwTURreE5ETXlNamRhRncwek1qQTBNRFl4TkRNeU16QmFNQ1F4SWpBZwpCZ05WQkFNVEdXVjBZMlF0Ym05a1pTMXJjeTFyT0hNdGJXRnpkR1Z5TFRBd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVFDN0NvS1dWKzJKeXRVRTc2VnhvU3lOZzZXOU4yRUlxaTA5UkQ3TThTYUMKZzNHSFZJcXRjWUZzWEhNSHNGeGkyc0ltRWdTblRQMU1sS2Y2Q2xoZ1llSUJqbHJjdWVGNzNDUW45dkw3bXdqMwpJVzV0cUJ4Z1BwRmpvc1FQcGs5eU5XWmpEVGJsbHJTbkZjTXNKekFEOXNIZjdiRWUrQTZJcnJDUnhLZGJWaVY1CnFveFR5THhJenF4c2NDMlMwclJCYk5YbHAzZFU1QStldGZhOUYxUFNCeDQxdmk1MXcvTnBVRkNOa2ZuaWhyZnUKcUVoYW0zNUdCbFYrRzd4ZENSVGt6K3h3V3IwdnhMUitueGZ5MElHL2hyYlIxL0RLbHo5Y3BnbHhTWUg5S3ZvbgpzVXRpemhQYXVsRFZIN2NFdTJGOWZuTHZlK2hZemt3c3hhS1RsQTFlQ2VEeEFnTUJBQUdqZ2dFL01JSUJPekFPCkJnTlZIUThCQWY4RUJBTUNCYUF3SFFZRFZSMGxCQll3RkFZSUt3WUJCUVVIQXdFR0NDc0dBUVVGQndNQ01Bd0cKQTFVZEV3RUIvd1FDTUFBd0h3WURWUjBqQkJnd0ZvQVVNZkJnSmVxaTN5aUZweGlSQUNHejRYZEZpZVl3Z2RvRwpBMVVkRVFTQjBqQ0J6NElFWlhSalpJSVFaWFJqWkM1cmRXSmxMWE41YzNSbGJZSVVaWFJqWkM1cmRXSmxMWE41CmMzUmxiUzV6ZG1PQ0ltVjBZMlF1YTNWaVpTMXplWE4wWlcwdWMzWmpMbU5zZFhOMFpYSXViRzlqWVd5Q0QydHoKTFdzNGN5MXRZWE4wWlhJdE1JSVBhM010YXpoekxXMWhjM1JsY2kweGdnOXJjeTFyT0hNdGJXRnpkR1Z5TFRLQwpFMnhpTG10MVltVnpjR2hsY21VdWJHOWpZV3lDQ1d4dlkyRnNhRzl6ZEljRWZ3QUFBWWNRQUFBQUFBQUFBQUFBCkFBQUFBQUFBQVljRXdLZ0pXNGNFd0tnSlhJY0V3S2dKWFRBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQXZOR2gKdHdlTG1QS2F2YjVhOFoxU2sxQkFZdzZ6dEdHTnJGdzg2M1dKRVBEblFFa3duOFhJNGh4SU82UVV3eHJic1MweAp0YUg2ZmRKeFZZcEN5UXVrV3JldHpkZ05zMTVWYnlNdUlqVkJRMytGZnBRaDB5T25tUXlmRWc2UWZNdU5IWGpJCjZCdVp5M0p0S0tFZGZmUFh4U3VlMFV2TG5idlN6U0tVQkRIcy9nNVV0Q3cyeHVIVFU5bFdoQXY2dm1WQ08yQW4KZmc2MjAzMUpUNG9ya2F6c1hmdENOTlZqUmdIZ2pjQ0NDZkMwY1hSRVZTVFZqZUFaZU40ZUdtYWlRcFdEUWkxbApUVWZJMlE0dGRySlFsOXk0dDNKRDgrSmFLT0VJWkt3NWVWaTc3cUZobWR1MmFkRThkODc0aVBnN2ZEYmVFS2tWCkYxVWVKb3NKOFN3Z1psWTRpQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
      etcd-client.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3dJQkFBS0NBUUVBdXdxQ2xsZnRpY3JWQk8rbGNhRXNqWU9sdlRkaENLb3RQVVErelBFbWdvTnhoMVNLCnJYR0JiRnh6QjdCY1l0ckNKaElFcDB6OVRKU24rZ3BZWUdIaUFZNWEzTG5oZTl3a0ovYnkrNXNJOXlGdWJhZ2MKWUQ2Ulk2TEVENlpQY2pWbVl3MDI1WmEwcHhYRExDY3dBL2JCMysyeEh2Z09pSzZ3a2NTblcxWWxlYXFNVThpOApTTTZzYkhBdGt0SzBRV3pWNWFkM1ZPUVBuclgydlJkVDBnY2VOYjR1ZGNQemFWQlFqWkg1NG9hMzdxaElXcHQrClJnWlZmaHU4WFFrVTVNL3NjRnE5TDhTMGZwOFg4dENCdjRhMjBkZnd5cGMvWEtZSmNVbUIvU3I2SjdGTFlzNFQKMnJwUTFSKzNCTHRoZlg1eTczdm9XTTVNTE1XaWs1UU5YZ25nOFFJREFRQUJBb0lCQVFDamQ1c0x4SXNRMjFsegpOL0xUTFhhZnM0ZmRxQkhCSGVIdDRzQTBJeXB4OUdqN1NwTHM1UCtrOGVPQ3U4cnlocGdaNTdOemVDRUVsZ044Cnp4L1FGSndPbWhpbFFqdGtJZERqc0x0SjFJUndZQ0ovNmVYcTQ2UHpmV1IyL1BZQUxkVnZDalNKVVQ1UHJRQm4KalZRMGtxdDhodU0rMnJMeEdDT3ZNanpGNGJOYzhZZGFSOTI0c095Y1Q2UzI1Vzg3TklQWnVqY3VBUXIzaEE2bwpUbEdmVU44Q0hSM21jVnBIbEJ1NDhEeEpYaml2MkVKZTRHSmN2L0NWQTVqVGNNNlNoTjJuSGN3OGpHYVg0bGJtCjJYaktKemE0RStON3hGRXBRVEJRMUNqRGM1cndKY0tKUm9IQkxFUGtJVE5LWnNWSDlmK0tuNmpjQWtmOTZoWVkKKzY1TTMza1ZBb0dCQU5GMVdRNG4wcTE0YlpSY1FkbnFoWDdYT0pFbDBtOUZuYVhOTjNsb0M1SnNneGxkbXh5bgpRV1IvZkJVQnRaTUc5MmgzdTBheWUyaWdZdGtSc1pDV0wwL2VicmJGMWlmYXozR2Z1b3lSZWozMHVsRDJYY3phCmQzSEUwdVpTSVQrUkFSTTF1VjJUczVUSHJqUStIT3Z5cEpFQjFlSnY1L21LWmRpUTRtMzBGMDUzQW9HQkFPU1oKL21NWXd4V1Y4SFRtaENyNGsycDJQd1NLTHVrajhZaVJQZHhVSFpXWXdRTGFFRU1uSVVnUFJBSnFHc1VtWng5TApacDVjYXp3bW9ldDI0cXpGeVhkemFUMi96VGc1Rjg1d0FzRDl1WEZSWWYzc01OZ0VkazJkSmc1VGZmcWcrNlRQCjBla2VtWG9vSTYxTTc3VVFjWVdSVCtPWUtFd1V3dzZMcjJ3bGFKM1hBb0dBTzF3alVlU3RTeVllLy9XcFgrV2IKMFplUzIyZTVuSGxCTlRUVWJONjBzTmw1eWQyQ1VQdUJoOGF0VnBLMmI2V0F4aVZ3ZUplcWE3dFFhQzRnZ1ZaZQpzQ2JjZjRYUHJGblJnbVQvREVsS09IYTd1cWduYXgvYXkrNDR5cmNwM3dic0pCS01wdDF0L2xNY3BvZVgwTEppCk93b25JRllRaXVMUy9DNExUWmZvWnY4Q2dZQnIrMlhUajRYUE0zVlM4dlJwaStPdWZVNkZLWFRCUWU0OHNVYkUKUmFOMzM2RUVaTmNic1djaUw3dlRYQ1ZyRFJuWENYbmV3ZzhSYWJwQWpIYkVYK1VybklPUTNJSG0xZWt0NVhFWAprb0kvU2M3ODc4MmVySFRwY3ByZ1Y0WUJsbnRudlpjTkJCeEJQS2Fsbk5yNTcxdUFXVVNnWUdaZ2tjb1ZtOXZ3ClBMZHZId0tCZ0dYS2l5Y29zZzFuZHhkclQ0S05SSmdWZUd1M3ZqSjg4N0tQbThpbHB4alF3ekM2cjNRZDhYUWIKbGdWUnFBcG5mTnA1amM0WUZ5c2RvKzFhc2JrRTloczVUZk5sVUVtSWdvR3dxVnlmUkRiOEl0TklRQTBXZDZLdQpONy81UkZYRVlkUFR4YVhpNjl0cTZnRXp6cThTcnQyUUY5eEk5eG1EV0U5bGVEeDUwd1dZCi0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
    kind: Secret
    metadata:
      creationTimestamp: "2022-04-09T14:34:37Z"
      name: kube-etcd-client-certs
      namespace: kubesphere-monitoring-system
      resourceVersion: "856"
      uid: c74b122b-438d-4e40-8e1a-1b9445d4b3d5
    type: Opaque

    So far the Secret looks fine; at the very least the resource exists. Let's continue checking and come back to this if things still don't work.

    While this article was being written, Jam reported that his environment did not have this Secret. It can be generated with the command above; just take care to check the actual paths of the Etcd certificate and key files.
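    For the demo cluster in this article, the equivalent command would presumably point at the certificate files under /etc/ssl/etcd/ssl/ listed in section 4 (a sketch; substitute the admin certificate that matches your own node name):

     kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs \
       --from-file=etcd-client-ca.crt=/etc/ssl/etcd/ssl/ca.pem \
       --from-file=etcd-client.crt=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
       --from-file=etcd-client.key=/etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem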
  3. Following key point 2 obtained above, generate an Endpoints resource with the IP of each node of the external etcd.

    Let's take a look at what the prometheus-endpointsEtcd.yaml file contains.

    prometheus-endpointsEtcd.yaml

     apiVersion: v1
    kind: Endpoints
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    subsets:
    - addresses:
      - ip: 127.0.0.1
      ports:
      - name: metrics
        port: 2379
        protocol: TCP
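    If this Endpoints resource were missing in your cluster, you could create it from a manifest like the following sketch, which replaces the 127.0.0.1 placeholder with the three master IPs of this demo environment (apply it with kubectl apply -f):

     apiVersion: v1
     kind: Endpoints
     metadata:
       labels:
         k8s-app: etcd
       name: etcd
       namespace: kube-system
     subsets:
     - addresses:
       - ip: 192.168.9.91
       - ip: 192.168.9.92
       - ip: 192.168.9.93
       ports:
       - name: metrics
         port: 2379
         protocol: TCP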

    Now let's check whether an Endpoints resource exists in our Kubernetes cluster.

     [root@ks-k8s-master-0 ~]# kubectl get endpoints -n kubesphere-monitoring-system
    NAME                                      ENDPOINTS                                                                AGE
    alertmanager-main                         10.233.116.11:9093,10.233.117.10:9093,10.233.87.9:9093                   9d
    alertmanager-operated                     10.233.116.11:9094,10.233.117.10:9094,10.233.87.9:9094 + 6 more...       9d
    kube-state-metrics                        10.233.87.8:8443,10.233.87.8:9443                                        9d
    node-exporter                             192.168.9.91:9100,192.168.9.92:9100,192.168.9.93:9100                    9d
    notification-manager-controller-metrics   10.233.116.8:8443                                                        9d
    notification-manager-svc                  10.233.116.13:19093,10.233.116.14:19093                                  9d
    notification-manager-webhook              10.233.116.8:9443                                                        9d
    prometheus-k8s                            10.233.117.43:9090,10.233.87.160:9090                                    9d
    prometheus-operated                       10.233.117.43:9090,10.233.87.160:9090                                    9d
    prometheus-operator                       10.233.116.7:8443                                                        9d
    thanos-ruler-operated                     10.233.117.18:10902,10.233.87.17:10902,10.233.117.18:10901 + 1 more...   8d

    There is no Endpoints resource related to etcd here. Do we need to create one?

    Just as I was about to create it from the configuration file, I suddenly realized my own mistake: out of inertia, misled by the previous command, I had queried the wrong namespace. The namespace used in the example configuration file is kube-system.

    Querying kube-system instead, we find the resource we are looking for.

     [root@ks-k8s-master-0 ~]# kubectl get endpoints -n kube-system
    NAME                          ENDPOINTS                                                              AGE
    coredns                       10.233.117.2:53,10.233.117.3:53,10.233.117.2:53 + 3 more...            9d
    etcd                          192.168.9.91:2379,192.168.9.92:2379,192.168.9.93:2379                  3d20h
    kube-controller-manager-svc   192.168.9.91:10257,192.168.9.92:10257,192.168.9.93:10257               9d
    kube-scheduler-svc            192.168.9.91:10259,192.168.9.92:10259,192.168.9.93:10259               9d
    kubelet                       192.168.9.91:10250,192.168.9.92:10250,192.168.9.93:10250 + 6 more...   9d
    openebs.io-local              <none>                                                                 9d

    Take a look at the configuration file content.

     [root@ks-k8s-master-0 ~]# kubectl get endpoints etcd -n kube-system -o yaml
    apiVersion: v1
    kind: Endpoints
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"subsets":[{"addresses":[{"ip":"192.168.9.91"},{"ip":"192.168.9.92"},{"ip":"192.168.9.93"}],"ports":[{"name":"metrics","port":2379,"protocol":"TCP"}]}]}
      creationTimestamp: "2022-04-15T08:24:18Z"
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
      resourceVersion: "1559305"
      uid: c6d0ee2c-a228-4ea8-8ef1-73b387030950
    subsets:
    - addresses:
      - ip: 192.168.9.91
      - ip: 192.168.9.92
      - ip: 192.168.9.93
      ports:
      - name: metrics
        port: 2379
        protocol: TCP

    The configuration file looks correct, so let's continue to check.

  4. Following key point 3 obtained above, generate the etcd Service using the Endpoints above.

    Let's take a look at what the prometheus-serviceEtcd.yaml file contains.

    prometheus-serviceEtcd.yaml

     apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: metrics
        port: 2379
        targetPort: 2379
      selector: null

    Let's check whether the corresponding Service resource exists in our Kubernetes cluster.

     [root@ks-k8s-master-0 ~]# kubectl get service -n kube-system
    NAME                          TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
    coredns                       ClusterIP   10.233.0.3   <none>        53/UDP,53/TCP,9153/TCP         9d
    etcd                          ClusterIP   None         <none>        2379/TCP                       3d21h
    kube-controller-manager-svc   ClusterIP   None         <none>        10257/TCP                      9d
    kube-scheduler-svc            ClusterIP   None         <none>        10259/TCP                      9d
    kubelet                       ClusterIP   None         <none>        10250/TCP,10255/TCP,4194/TCP   9d

    See the resource configuration details.

     [root@ks-k8s-master-0 ~]# kubectl get service etcd -n kube-system -o yaml
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"etcd"},"name":"etcd","namespace":"kube-system"},"spec":{"clusterIP":"None","ports":[{"name":"metrics","port":2379,"targetPort":2379}],"selector":null}}
      creationTimestamp: "2022-04-15T08:24:18Z"
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kube-system
      resourceVersion: "1559307"
      uid: cfd92ee5-dbd1-4ee4-a4c4-d683ca7a41ea
    spec:
      clusterIP: None
      clusterIPs:
      - None
      ipFamilies:
      - IPv4
      - IPv6
      ipFamilyPolicy: RequireDualStack
      ports:
      - name: metrics
        port: 2379
        protocol: TCP
        targetPort: 2379
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}

    The configuration file looks correct, so let's continue to check.

  5. Following key point 4 obtained above, generate a ServiceMonitor for scraping Etcd data.

    Let's take a look at what the prometheus-serviceMonitorEtcd.yaml file contains.

    prometheus-serviceMonitorEtcd.yaml

     apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: etcd
      name: etcd
      namespace: kubesphere-monitoring-system
    spec:
      endpoints:
      - interval: 1m
        port: metrics
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
          certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
          keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
          serverName: etcd.kube-system.svc.cluster.local
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd

    Let's see if there is a ServiceMonitor resource in our Kubernetes.

     [root@ks-k8s-master-0 ~]# kubectl get servicemonitor -n kubesphere-monitoring-system
    NAME                      AGE
    alertmanager              9d
    coredns                   9d
    devops-jenkins            8d
    etcd                      3d21h
    kube-apiserver            9d
    kube-controller-manager   9d
    kube-scheduler            9d
    kube-state-metrics        9d
    kubelet                   9d
    node-exporter             9d
    prometheus                9d
    prometheus-operator       9d
    s2i-operator              8d

    See the resource configuration details.

     [root@ks-k8s-master-0 ~]# kubectl get servicemonitor etcd -n kubesphere-monitoring-system -o yaml
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"monitoring.coreos.com/v1","kind":"ServiceMonitor","metadata":{"annotations":{},"labels":{"app.kubernetes.io/vendor":"kubesphere","k8s-app":"etcd"},"name":"etcd","namespace":"kubesphere-monitoring-system"},"spec":{"endpoints":[{"interval":"1m","port":"metrics","scheme":"https","tlsConfig":{"caFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt","certFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt","keyFile":"/etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key"}}],"jobLabel":"k8s-app","namespaceSelector":{"matchNames":["kube-system"]},"selector":{"matchLabels":{"k8s-app":"etcd"}}}}
      creationTimestamp: "2022-04-15T08:24:18Z"
      generation: 1
      labels:
        app.kubernetes.io/vendor: kubesphere
        k8s-app: etcd
      name: etcd
      namespace: kubesphere-monitoring-system
      resourceVersion: "1559308"
      uid: 386f16c0-74cd-4dbf-aa35-cc227062c881
    spec:
      endpoints:
      - interval: 1m
        port: metrics
        scheme: https
        tlsConfig:
          caFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt
          certFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.crt
          keyFile: /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client.key
      jobLabel: k8s-app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          k8s-app: etcd

    The configuration file looks correct, so let's continue to check.

  6. At this point I had checked everything I could think of, and all the necessary configuration was in place. So why was there still a problem? The reference post offered no further explanation.
  7. Then I realized I had overlooked one thing: I had not yet read the Pod's logs, so I quickly went to check them.

    In Cluster Management -> Application Workloads -> Workloads -> StatefulSets, select the kubesphere-monitoring-system project and find prometheus-k8s.

    Click prometheus-k8s to enter its detail page, then click prometheus-k8s-0 in the Pods list.

    Click the Container Log button to open the container log page.

    At this point you will see a large number of error logs.

    The detailed error log:

     level=error ts=2022-04-19T06:49:08.169Z caller=manager.go:188 component="scrape manager" msg="error creating new scrape pool" err="error creating HTTP client: unable to load specified CA cert /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: open /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt: no such file or directory" scrape_pool=kubesphere-monitoring-system/etcd/0

    Seeing this, we have found the cause of the problem: the file /etc/prometheus/secrets/kube-etcd-client-certs/etcd-client-ca.crt cannot be found.

    Open a terminal into the Pod to verify.

    As it turns out, the entire directory does not exist.

    Take another look at the Pod's configuration to see whether the Secret is configured there.

    Sure enough, it is not. I think I have found the root of the problem, and a solution comes to mind: the kube-etcd-client-certs Secret is simply not mounted in the Pod, so let's try mounting it. Problem solved???

    In the console, find our StatefulSet prometheus-k8s and click More Actions -> Edit Settings.

    Under Storage, choose to mount a ConfigMap or Secret.

    Select Secret, mount kube-etcd-client-certs read-only at /etc/prometheus/secrets/kube-etcd-client-certs, and confirm.

    After clicking OK, the Pod starts to rebuild. I thought that was it and waited to see the result, but...

    After the Pod was rebuilt, I assumed everything was under control and there could not possibly be a problem. Instead, I found that the modified configuration had reverted to the original: the Secret we want is not mounted in the Pod at all, and the configuration is exactly the same as before.

    After repeating the operation three times, I broke down and finally realized that I was making the change in the wrong place. This workload is managed by prometheus-operator, so editing it directly will never take effect.

  8. prometheus-operator: I had never worked with it before and did not know its technical details. What should I do... keep searching on Baidu.
  9. Search Baidu.

    Keyword: prometheus operator etcd.

    The first result did not help after a quick look, so I won't show it; read it yourself if you are interested.
    Two minutes later I opened the second-ranked article. Its reasoning was relatively clear; I quickly scrolled down and found the method I wanted when I reached its third point.

  10. I did not know the underlying details, but our goal is to mount the Secret, and since this approach is mentioned there, let's give it a try.

     [root@ks-k8s-master-0 ~]# kubectl edit prometheuses -n kubesphere-monitoring-system
     # Please edit the object below. Lines beginning with a '#' will be ignored,
    # and an empty file will abort the edit. If an error occurs while saving this file will be
    # reopened with the relevant failures.
    #
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
    ....

    The file content is similar to the above. Searching for secret in vim returns the error E486: Pattern not found: secret.

    This means there is no secrets entry in the default configuration, so we add one ourselves, at around line 78 of the file.

     secrets:
    - kube-etcd-client-certs

    The final effect is similar (I added line numbers for clarity):

     71   securityContext:
    72     fsGroup: 0
    73     runAsNonRoot: false
    74     runAsUser: 0
    75   serviceAccountName: prometheus-k8s
    76   serviceMonitorNamespaceSelector: {}
    77   serviceMonitorSelector: {}
    78   secrets:
    79   - kube-etcd-client-certs
    80   storage:
    81     volumeClaimTemplate:
    82       spec:
    83         resources:
    84           requests:
    85             storage: 20Gi
    86   tolerations:
    87   - effect: NoSchedule
    88     key: dedicated
    89     operator: Equal
    90     value: monitoring
    91   version: v2.26.0

    Save and exit.
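    If you prefer a non-interactive way of making the same change, a merge patch should also work (a sketch, assuming the Prometheus object is named k8s, which matches the prometheus-k8s Pod names in this cluster):

     kubectl -n kubesphere-monitoring-system patch prometheus k8s --type merge \
       -p '{"spec":{"secrets":["kube-etcd-client-certs"]}}'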

    When we check the configuration of the prometheus-k8s StatefulSet again, we find that an additional Secret volume has appeared.

    Looking at the Pod's detailed configuration, we find that it now also includes the Secret mount.

    Looking at the Pod's logs again, the error is gone.

    It seems the problem has been solved, so let's see whether the monitoring page finally shows graphs (with a little anticipation).
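    Before going back to the console, you can also confirm from the Prometheus side that the etcd targets are being scraped (a sketch using a temporary port-forward):

     # In one terminal: temporarily expose Prometheus locally
     kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090:9090

     # In another terminal: up{job="etcd"} should return the value 1 for every etcd member
     curl -sG http://127.0.0.1:9090/api/v1/query --data-urlencode 'query=up{job="etcd"}'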

  11. The moment when the final answer is revealed.

    First, an overall screenshot.

    And a few close-up screenshots (added later; I did not capture them at first).

  12. At this point the problem is essentially solved, but there are still many details that we need to study in depth in order to understand the underlying mechanisms.

4. Key technical points of monitoring Etcd with Prometheus-Operator

Key technical points

  1. How Etcd is installed

    The Etcd installed by KubeSphere runs as a binary directly on the host, which can be verified as follows.

      ## Check the process to confirm etcd runs as a binary
    [root@ks-k8s-master-0 ~]# ps -ef | grep etcd
    root      1158 56409  0 15:43 pts/0    00:00:00 grep --color=auto etcd
    root     15301     1  6 Apr09 ?        15:35:08 /usr/local/bin/etcd
    root     17247 17219 13 Apr09 ?        1-06:55:24 kube-apiserver --advertise-address=192.168.9.91 --allow-privileged=true --audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100 --authorization-mode=Node,RBAC --bind-address=0.0.0.0 --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0.pem --etcd-keyfile=/etc/ssl/etcd/ssl/node-ks-k8s-master-0-key.pem --etcd-servers=https://192.168.9.91:2379,https://192.168.9.92:2379,https://192.168.9.93:2379 --feature-gates=CSIStorageCapacity=true,RotateKubeletServerCertificate=true,TTLAfterFinished=true,ExpandCSIVolumes=true --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-account-signing-key-file=/etc/kubernetes/pki/sa.key --service-cluster-ip-range=10.233.0.0/18 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    
     ## See which SSL certificate and key files exist
    [root@ks-k8s-master-0 ~]# ll /etc/ssl/etcd/ssl/
    total 80
    -rw------- 1 root root 1675 Apr  9 22:32 admin-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-0.pem
    -rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-1.pem
    -rw------- 1 root root 1679 Apr  9 22:32 admin-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 admin-ks-k8s-master-2.pem
    -rw------- 1 root root 1675 Apr  9 22:32 ca-key.pem
    -rw-r--r-- 1 root root 1086 Apr  9 22:32 ca.pem
    -rw------- 1 root root 1679 Apr  9 22:32 member-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-0.pem
    -rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-1.pem
    -rw------- 1 root root 1675 Apr  9 22:32 member-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 member-ks-k8s-master-2.pem
    -rw------- 1 root root 1675 Apr  9 22:32 node-ks-k8s-master-0-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-0.pem
    -rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-1-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-1.pem
    -rw------- 1 root root 1679 Apr  9 22:32 node-ks-k8s-master-2-key.pem
    -rw-r--r-- 1 root root 1440 Apr  9 22:32 node-ks-k8s-master-2.pem
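    With these certificate files you can also manually confirm that the Etcd metrics endpoint Prometheus scrapes is reachable (a sketch; run it on a master node and adjust the certificate file names and IP to your own host):

     curl --cacert /etc/ssl/etcd/ssl/ca.pem \
       --cert /etc/ssl/etcd/ssl/admin-ks-k8s-master-0.pem \
       --key /etc/ssl/etcd/ssl/admin-ks-k8s-master-0-key.pem \
       https://192.168.9.91:2379/metrics | head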
  2. Prometheus-Operator's configuration for monitoring Etcd
  • Generate a Secret from the certificates of the external Etcd
  • Generate an Endpoints resource with the IP of each node of the external Etcd
  • Generate the etcd Service using the Endpoints
  • Generate a ServiceMonitor for scraping Etcd data

Topics that need in-depth study (placeholder, to be added)

  1. The implementation principle and technical details of Prometheus-Operator.
  2. The configuration process of KubeSphere for Prometheus-Operator.

5. Summary

Driven by an actual operations and maintenance need, this article has introduced the correct way to enable Etcd monitoring and documented the troubleshooting process in detail. If you need to enable the Etcd monitoring function in KubeSphere 3.2.1, you can refer to this article for configuration.
