
Hello everyone, I'm Yan Zhenfan, a researcher in this issue of the Microsoft MVP Lab. In this article, I will introduce the basics of monitoring in Kubernetes and show how to use the mainstream Prometheus + Grafana solution to monitor the entire cluster and visualize its data.


Kubernetes monitoring

When your application is deployed to Kubernetes, it is hard to see what is happening inside the containers. Once a container dies, the data in it may never be recovered, and you may not even be able to read its logs to locate the problem. On top of that, an application usually has many instances, and you cannot tell which container handled a particular user request, all of which makes troubleshooting an application in Kubernetes more complicated. Beyond the applications themselves, Kubernetes is the infrastructure that controls the life and death of everything in the cluster, so any failure of Kubernetes will inevitably affect the application services; monitoring the running state of Kubernetes itself is therefore also crucial.
When your application is cloud native, you have to pay attention to the running state of every server, of the infrastructure and middleware, of every component and resource object in Kubernetes, and of every application. Of course, "running state" is a vague notion: depending on what we focus on, the state each monitored object should express is different. For us to monitor an object of interest, the object has to cooperate by exposing suitable information about its running state for collection and analysis; this property is what we call observability.

In cloud native, observability is generally divided into three scopes:

  • Metrics
  • Tracing
  • Logging

You can learn how to monitor, debug, and process logs in the Kubernetes documentation. The monitoring discussed in this article covers only Metrics.

Metrics, Tracing, and Logging are not completely independent of each other; Metrics may also carry Logging and Tracing information.

▌Monitoring objects

The monitoring data to be collected comes from the monitored objects, and in a Kubernetes cluster we can divide these objects into three parts:

  • Machines: all node machines in the cluster, with indicators such as CPU and memory usage, network and disk I/O rates, etc.;
  • Kubernetes object state: the status and metrics of Deployments, Pods, DaemonSets, StatefulSets, and other objects;
  • Applications: the status or metrics of each container in a Pod, possibly including a /metrics endpoint provided by the container itself.

▌Prometheus

In a basic environment, a complete monitoring setup consists of several parts, such as collecting data, storing data, analyzing data, displaying data, and alerting, and each part has related tools or technologies to handle the diverse needs and complexity of cloud-native environments.

Since we want to do monitoring, we need monitoring tools. Monitoring tools can take all the important metrics and logs (metrics can also contain some log-like information) and store them in a secure, centralized location so they can be accessed at any time to work out solutions to problems. Since cloud-native applications are deployed in Kubernetes clusters, monitoring Kubernetes gives you insight into cluster health and performance metrics, resource counts, and a top-level overview of what is going on inside the cluster. When an error occurs, the monitoring tool alerts you (the alerting function) so that you can quickly roll out a fix.

Prometheus is a CNCF project that can natively monitor Kubernetes, nodes, and Prometheus itself. The official Kubernetes documentation currently mainly recommends Prometheus, which provides out-of-the-box monitoring capabilities for the Kubernetes container orchestration platform. The monitoring solution in this article is therefore designed around Prometheus.

The following is an introduction to some components of Prometheus:

  1. Metric Collection: Prometheus uses a pull model to retrieve metrics over HTTP. When Prometheus cannot scrape a target directly, you can push metrics to a Pushgateway, which Prometheus then scrapes.
  2. Metric Endpoint: systems that want to be monitored by Prometheus should expose their metrics on a /metrics endpoint, which Prometheus pulls from at regular intervals.
  3. PromQL: Prometheus ships with PromQL, a very flexible query language that can be used to query metrics in the Prometheus dashboard. The Prometheus UI and Grafana also use PromQL queries to visualize metrics.
  4. Prometheus Exporters: there are many libraries and servers that help export existing metrics from third-party systems as Prometheus metrics. This is for situations in which a given system cannot be instrumented with Prometheus metrics directly.
  5. TSDB (time-series database): Prometheus uses a TSDB to store all of its data efficiently. By default, all data is stored locally; to avoid a single point of failure, the Prometheus TSDB can optionally be integrated with remote storage (a minimal configuration sketch follows).
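To make items 1 and 5 concrete, here is a minimal sketch of a prometheus.yml fragment that sets the global pull interval and, optionally, forwards samples to a remote storage backend. The remote URL is a placeholder for illustration, not something provided by this article:

global:
  scrape_interval: 15s        # how often Prometheus pulls each /metrics endpoint
  evaluation_interval: 15s    # how often alerting/recording rules are evaluated

# Optional: avoid a local single point of failure by shipping samples
# to a remote storage backend (placeholder URL).
remote_write:
  - url: "http://remote-storage.example.com/api/v1/write"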
The structure of the Prometheus monitoring solution in Kubernetes is as follows:

[Figure: Prometheus monitoring architecture in Kubernetes. Image source: https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/ ]

▌Metrics

There are many kinds of objects to be monitored. We call objects of the same kind an entity, and the data each entity produces while running is varied. To collect this data, Prometheus classifies the values an entity produces into four metric types: Counter, Gauge, Histogram, and Summary. For example, a container's cumulative CPU usage is recorded under the metric name container_cpu_usage_seconds_total.

The general format of each metric is:

 metric_name{label=value} metric_value

Every object produces data continuously. To distinguish which object a given metric value belongs to, a metric can carry a large amount of label metadata in addition to the value itself, as shown in the following example.

 container_cpu_usage_seconds_total{
    beta_kubernetes_io_arch = "amd64",
    beta_kubernetes_io_os = "linux", 
    container = "POD", 
    cpu = "total", 
    id = "...", 
    image = "k8s.gcr.io/pause:3.5", 
    instance = "slave1", 
    job = "kubernetes-cadvisor", 
    kubernetes_io_arch = "amd64", 
    kubernetes_io_hostname = "slave1",
    kubernetes_io_os = "linux", 
    name = "k8s_POD_pvcpod_default_02ed547b-6279-4346-8918-551b87877e91_0", 
    namespace = "default", 
    pod = "pvcpod"
}

Once an object produces text in this format, it can expose it on a metrics endpoint for Prometheus to scrape automatically, or push it to Prometheus through the Pushgateway.

Next, we will build a complete Prometheus monitoring system in Kubernetes.

Kubernetes documentation:
https://v1-20.docs.kubernetes.io/docs/tasks/debug-application-cluster/

Practice

▌Node monitoring

The node exporter is written in Golang and is used on Linux systems to collect all hardware and OS-level metrics exposed by the kernel, including CPU, memory, network card traffic, system load, sockets, machine configuration, and so on.

Readers can refer to https://github.com/prometheus/node_exporter for the full list of collectors that are enabled or disabled by default.

Since every node in the cluster needs to be monitored, a node exporter instance must run on each node, and when a new node joins the cluster a node exporter should automatically be scheduled onto it; therefore the node exporter has to be deployed in DaemonSet mode.
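For reference, an abridged sketch of such a DaemonSet is shown below; the image tag and flags are assumptions for illustration, and the repository cloned in the next step contains the complete definition:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: node-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: prom/node-exporter:v1.3.1   # tag is an assumption
        args:
        - --path.rootfs=/host              # read host metrics from the mounted root filesystem
        ports:
        - containerPort: 9100              # default node exporter port
        volumeMounts:
        - name: rootfs
          mountPath: /host
          readOnly: true
      volumes:
      - name: rootfs
        hostPath:
          path: /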

View all nodes in the cluster:

 root@master:~# kubectl get nodes
NAME     STATUS                     ROLES                  AGE     VERSION
master   Ready,SchedulingDisabled   control-plane,master   98d     v1.22.2
salve2   Ready                      <none>                 3h50m   v1.23.3
slave1   Ready                      <none>                 98d

Bibin Wilson has packaged the node exporter YAML files for Kubernetes, so we can download them directly:

 git clone https://github.com/bibinwilson/kubernetes-node-exporter

Open the daemonset.yaml file in the repository to get an overview of the information in it.

In the YAML file, you can see that the node exporter will be deployed in the monitoring namespace and that its Pods carry two labels:

 labels:
     app.kubernetes.io/component: exporter
     app.kubernetes.io/name: node-exporter

For the node exporter to also be scheduled onto the master node, we need to add toleration attributes to the Pod:

 template:
    metadata:
      labels:
        app.kubernetes.io/component: exporter
        app.kubernetes.io/name: node-exporter
    spec:
    # Copy this section into the corresponding position
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node.kubernetes.io/unschedulable"
        operator: "Exists"
        effect: "NoSchedule"

To deploy the node exporter, we first create the namespace:

 kubectl create namespace monitoring

Execute the command to deploy node exporter:

 kubectl create -f daemonset.yaml

View the node exporter instances:

 root@master:~# kubectl get daemonset -n monitoring
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   3         3         3       3            3

Since the node exporter Pods are scattered across the nodes, to make it easy for Prometheus to collect their Pod IPs we need a unified set of Endpoints. This is achieved by creating a Service, which generates the Endpoints automatically.

Look at the service.yaml file under the repository, which is defined as follows:

 kind: Service
apiVersion: v1
metadata:
  name: node-exporter
  namespace: monitoring
  annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/port:   '9100'
spec:
  selector:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
  ports:
  - name: node-exporter
    protocol: TCP
    port: 9100
    targetPort: 9100

The selector for this Service is as follows:

 selector:
   app.kubernetes.io/component: exporter
   app.kubernetes.io/name: node-exporter

Create Service:

 kubectl create -f service.yaml

View the node exporter Pod IPs collected by the Endpoints:

 root@master:~# kubectl get endpoints -n monitoring 
NAME                    ENDPOINTS                                       AGE
node-exporter           10.32.0.27:9100,10.36.0.4:9100,10.44.0.3:9100

The node exporter does nothing more than collect these metrics and expose them for scraping.

References for this chapter:
https://devopscube.com/node-exporter-kubernetes/

▌Deploy Prometheus

Now that the node exporter is in place, node metrics can be collected; the next step is to collect metrics for the Kubernetes infrastructure itself.

Kubernetes itself exposes a great deal of metrics data through three kubelet endpoints: /metrics/cadvisor, /metrics/resource, and /metrics/probes.

Take /metrics/cadvisor as an example: cAdvisor analyzes the memory, CPU, file, and network usage of all containers running on a given node. You can refer to the link below for all the metrics cAdvisor exposes.
https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md

In this section, the Prometheus we deploy will scrape metrics from the following Kubernetes targets:

  1. kubernetes-apiservers: gets all the metrics exposed by the API servers;
  2. kubernetes-nodes: collects all Kubernetes node metrics;
  3. kubernetes-pods: Pods whose metadata is annotated with prometheus.io/scrape and prometheus.io/port are discovered and their metrics collected (see the sketch after this list);
  4. kubernetes-cadvisor: collects all cAdvisor metrics, which relate to containers;
  5. kubernetes-service-endpoints: if a Service's metadata is annotated with prometheus.io/scrape and prometheus.io/port, all of its endpoints are scraped.
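As mentioned in item 3, Pods opt in to scraping through annotations. A minimal sketch of a Pod template carrying them looks like this (the application name, image, and port are placeholders, not part of this tutorial's repositories):

template:
  metadata:
    labels:
      app: my-app                      # hypothetical application label
    annotations:
      prometheus.io/scrape: "true"     # ask Prometheus to scrape this Pod
      prometheus.io/port: "8080"       # port where the app exposes /metrics
  spec:
    containers:
    - name: my-app
      image: my-app:latest             # placeholder image
      ports:
      - containerPort: 8080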

Bibin Wilson has also packaged the relevant deployment definition files, which we can download directly:

 git clone https://github.com/bibinwilson/kubernetes-prometheus

Prometheus uses the Kubernetes API Server to obtain metrics about nodes, Pods, Deployments, and so on. Therefore, we need to create an RBAC policy with read-only access to the required API groups and bind it to the service account in the monitoring namespace, restricting the Prometheus Pod to read-only access to the API.

View the clusterRole.yaml file for a list of resource objects to monitor:

 - apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
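The cluster role then has to be bound to the service account that the Prometheus Pod runs under. A minimal sketch of such a binding is shown below; the role and service-account names are assumptions based on the repository layout, so check them against clusterRole.yaml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default                # service account the Prometheus Pod runs under
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: prometheus             # the ClusterRole defined above
  apiGroup: rbac.authorization.k8s.io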

Create roles and role bindings in the cluster:

 kubectl create -f clusterRole.yaml

Prometheus can be configured via command-line flags and a configuration file. Command-line flags configure immutable system parameters (such as the storage location and how much data to keep on disk and in memory), while the configuration file defines everything related to scrape jobs and their instances, as well as which rule files are loaded; so deploying Prometheus requires configuring this file.

The Prometheus configuration file is written in YAML. For the detailed rules, please refer to the link below.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/

To conveniently map the configuration file into the Prometheus Pod, we put the configuration in a ConfigMap and then mount it into the Pod. The configuration content can be viewed in config-map.yaml, which defines many data-collection rules, such as scraping the Kubernetes cluster itself and the node exporter. For example, here is the job that scrapes the node exporter:

 scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep
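The same file also configures scraping of the kubelet's cAdvisor endpoint through the API server proxy. A representative job, mirroring the standard Prometheus example configuration (so the exact content in the repository may differ slightly), looks like this:

- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt   # in-cluster CA
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node                                   # discover every node in the cluster
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)     # copy node labels onto the metrics
  - target_label: __address__
    replacement: kubernetes.default.svc:443      # scrape via the API server proxy
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor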

You can preview this file online by opening the link below:
https://raw.githubusercontent.com/bibinwilson/kubernetes-prometheus/master/config-map.yaml

Create the ConfigMap:

 kubectl create -f config-map.yaml

This configuration is very important and needs to be adapted to your actual environment; it is usually maintained by the operations team, so we will not go deeper into it here.
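For reference, the Prometheus Deployment consumes this ConfigMap by mounting it as a volume. The snippet below is a sketch of that wiring; the ConfigMap name prometheus-server-conf and the mount path follow the repository's layout, so verify them against prometheus-deployment.yaml:

spec:
  containers:
  - name: prometheus
    image: prom/prometheus                               # image tag omitted for brevity
    args:
    - "--config.file=/etc/prometheus/prometheus.yml"     # points at the mounted config
    volumeMounts:
    - name: prometheus-config-volume
      mountPath: /etc/prometheus/
  volumes:
  - name: prometheus-config-volume
    configMap:
      name: prometheus-server-conf                       # the ConfigMap created above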

Next we deploy Prometheus itself. Since the sample file uses an emptyDir volume to store Prometheus data, the data will be lost once the Pod restarts, so it can be changed to a hostPath volume here.

Open the prometheus-deployment.yaml file and change

 emptyDir: {}

to

 hostPath:
        path: /data/prometheus
        type: Directory

This change is optional; you can keep the emptyDir if you prefer.

If you do change it, you need to create the /data/prometheus directory in advance on the node where the Pod is scheduled.

Deploy Prometheus:

 kubectl create  -f prometheus-deployment.yaml

Check deployment status:

 root@master:~# kubectl get deployments --namespace=monitoring
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-deployment   1/1     1            1           23h

To access Prometheus from outside the cluster, you need to create a Service:

 apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
  annotations:
      prometheus.io/scrape: 'true'
      prometheus.io/port:   '9090'

spec:
  selector: 
    app: prometheus-server
  type: NodePort  
  ports:
    - port: 8080
      targetPort: 9090 
      nodePort: 30000
Create the Service:

 kubectl create -f prometheus-service.yaml

Next you can access the Prometheus UI panel through the NodePort (30000, as defined in the Service above).

Click Graph, click the 🌏 icon, select the metric to display, and then click Execute to run the query and show the result.

You can also view metrics data sources collected by Prometheus in Service Discovery.

If your cluster does not have kube-state-metrics installed, this data source will be marked in red. In the next section, we continue by deploying this component.

At this point, our monitoring structure looks like this:
[Architecture diagram]

▌Deploy Kube State Metrics

Kube State Metrics is a service that talks to the Kubernetes API Server to get the details of all API objects, such as Deployments, Pods, and so on.

Kube State Metrics provides metrics for Kubernetes objects and resources that cannot be obtained directly from the native Kubernetes monitoring components, because the metrics Kubernetes provides by itself are not comprehensive; Kube State Metrics is therefore needed to obtain all metrics related to Kubernetes objects.

Here are some important metrics that can be obtained from Kube State metrics:

  1. Node status, node capacity (CPU and memory)
  2. Replica-set compliance (desired/available/unavailable/updated status of replicas per deployment)
  3. Pod status (waiting, running, ready, etc)
  4. Ingress metrics
  5. PV, PVC metrics
  6. Daemonset & Statefulset metrics.
  7. Resource requests and limits.
  8. Job & Cronjob metrics

The detailed list of supported metrics can be viewed in the kube-state-metrics documentation; an example of using one of them in an alert rule follows.
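As an illustration of how these metrics are consumed, here is a hedged sketch of an alert rule (in the same prometheus.rules format used later in this article) that flags Deployments with unavailable replicas; the threshold and rule name are illustrative, not taken from any repository:

- alert: DeploymentReplicasUnavailable
  expr: kube_deployment_status_replicas_unavailable > 0   # metric exported by kube-state-metrics
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has unavailable replicas"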


Bibin Wilson has also packaged the relevant deployment definition files, which we can download directly:

 git clone https://github.com/devopscube/kube-state-metrics-configs.git

Directly apply all the YAML files to create the corresponding resources:

 kubectl apply -f kube-state-metrics-configs/

The directory contains the following files:

 ├── cluster-role-binding.yaml
 ├── cluster-role.yaml
 ├── deployment.yaml
 ├── service-account.yaml
 └── service.yaml

The resources created above include the following parts, which will not be explained in this section.

  1. Service Account
  2. Cluster Role
  3. Cluster Role Binding
  4. Kube State Metrics Deployment
  5. Service

Check the deployment status with the following command:

 kubectl get deployments kube-state-metrics -n kube-system

Then refresh the Prometheus Service Discovery page; you will see that the red entry has turned blue. Click this data source and you can see the following information:

 - job_name: 'kube-state-metrics'
  static_configs:
    - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']

This configuration is the access address of kube-state-metrics.
At this point, the Prometheus structure we have deployed is as follows:

[Architecture diagram]

▌Deploy Grafana

After the deployments in the previous sections, data collection and data storage are in place. Next we deploy Grafana and use it to analyze and visualize the metric data.

Bibin Wilson has also packaged the relevant deployment definition files, which we can download directly:

 git clone https://github.com/bibinwilson/kubernetes-grafana.git

First look at the grafana-datasource-config.yaml file; this configuration lets Grafana automatically configure Prometheus as a data source.

There is also a very important address in it:

 "url": "http://prometheus-service.monitoring.svc:8080",

At this point, confirm that CoreDNS in your cluster is working properly; you can refer to the DNS debugging steps in the link below to verify that Pods in your cluster can reach Services by DNS name.

https://kubernetes.io/en/docs/tasks/administer-cluster/dns-debugging-resolution/

The easiest way is to start a Pod and test with the command curl http://prometheus-deployment.monitoring.svc:8080 to see whether you get response data. If instead you see:

 root@master:~/jk/kubernetes-prometheus# curl http://prometheus-deployment.monitoring.svc:8080
curl: (6) Could not resolve host: prometheus-deployment.monitoring.svc
root@master:~/jk/kubernetes-prometheus# curl http://prometheus-deployment.monitoring.svc.cluster.local:8080
curl: (6) Could not resolve host: prometheus-deployment.monitoring.svc.cluster.local

then it may be that CoreDNS is not installed, or some other issue prevents access to Prometheus through this address. To avoid extra troubleshooting, you can use the IP instead of the domain name.

View Prometheus's Service IP:

 root@master:~/jk/kubernetes-prometheus# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-deployment   NodePort    10.105.95.8     <none>        9090:32330/TCP   23h

Test whether access through the Service IP works:

 root@master:~/jk/kubernetes-prometheus# curl 10.105.95.8:9090
<a href="/graph">Found</a>.

Change prometheus-deployment.monitoring.svc.cluster.local:8080 in grafana-datasource-config.yaml to the corresponding Service IP, and change the port to 9090.

Create the configuration:

 kubectl create -f grafana-datasource-config.yaml

Open deployment.yaml to view the definition. In this template, Grafana's data storage also uses an emptyDir volume, which carries the risk of data loss, so it can be changed to a hostPath or another type of volume. You can refer to the author's configuration:

 volumes:
        - name: grafana-storage
          hostPath:
            path: /data/grafana
            type: Directory

Deploy Grafana:

 kubectl create -f deployment.yaml

Then create the Service:

 kubectl create -f service.yaml

Grafana can then be accessed through port 32000.

The default account and password are both admin.

So far, the Prometheus monitoring structure we have deployed is as follows:
[Architecture diagram]

When you first log in, the dashboards are empty; we need to import dashboard templates in order to display the data nicely.

On the official Grafana website, there are many free dashboard templates made by the community.

First open https://grafana.com/grafana/dashboards/8588 to download this template, then upload the template file and bind the corresponding Prometheus data source.


Next, you can see the corresponding monitoring interface.

You can open Browse and continue importing more templates to see which monitoring dashboards suit you best.

▌How applications connect to Prometheus and Grafana

The sections above covered monitoring the infrastructure. We can also generate and collect metric data for middleware such as TiDB and MySQL, and we can define custom metrics in our own programs and then build our own Grafana dashboards. If you are a .NET developer, you can refer to another article by the author to walk through these steps one by one.

another article
https://www.cnblogs.com/whuanle/p/14969982.html
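Whatever the language, the most common way for an application to be picked up is the same annotation-based discovery configured earlier: the application exposes a /metrics endpoint and its Service is annotated so that the kubernetes-service-endpoints job scrapes it. A hedged sketch, where the service name and port are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-dotnet-app            # hypothetical application service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "5000"   # port where the app serves /metrics
spec:
  selector:
    app: my-dotnet-app
  ports:
  - port: 5000
    targetPort: 5000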

▌Alerting

In a monitoring system, alerting is of the utmost importance. In general, alert handling and notification components have to be developed according to each company's actual situation.
It is recommended to read "My Philosophy on Alerting", based on Rob Ewaschuk's observations at Google.

The config-map.yaml deployed earlier already contains a sample alert rule:

 prometheus.rules: |-
    groups:
    - name: devopscube demo alert
      rules:
      - alert: High Pod Memory
        expr: sum(container_memory_usage_bytes) > 1
        for: 1m
        labels:
          severity: slack
        annotations:
          summary: High Memory Usage

An alert rule consists mainly of the following parts (a further example rule follows this list):

  • alert: the name of the alert rule.
  • expr: the trigger condition, a PromQL expression used to compute whether the time series meets the condition.
  • for: the evaluation wait time, an optional parameter indicating that the alert is only sent when the trigger condition persists for this long; newly triggered alerts are in the pending state during the waiting period.
  • labels: custom labels, allowing the user to attach an additional set of labels to the alert.
  • annotations: an additional set of information, such as text describing the alert in detail; the content of annotations is sent to Alertmanager as parameters when the alert fires.
  • You can refer to:
    https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule
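Putting these fields together, here is a hedged sketch (not taken from the repository) of another common rule, which fires when a scrape target stops responding:

- alert: TargetDown
  expr: up == 0                # "up" is 0 when Prometheus fails to scrape a target
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Target {{ $labels.instance }} is down"
    description: "Job {{ $labels.job }} target {{ $labels.instance }} has been unreachable for more than 5 minutes."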


The High Pod Memory demo rule can also be seen in Grafana.

Next, we configure alert notifications in Grafana.

First, create a contact point for alerts; the author uses a DingTalk webhook.

Then find Alert Rules and add a new alert rule.

Then open Notification policies and bind the alert rules to the contact point; matching alert messages will then be pushed to the specified contact point.


You can see the push records of alert messages under Alert Rules. Because the author's server is located overseas, it may not be able to reach DingTalk's webhook, so the alerts here have stayed in Pending; the author will not experiment further, but readers should now understand the general steps.


Here is the effect of receiving the alert through a Telegram bot webhook:

[Screenshot: alert message delivered by the Telegram bot]


Microsoft Most Valuable Professional (MVP)

The Microsoft Most Valuable Professional (MVP) is a global award given by Microsoft to outstanding third-party technology professionals. For 29 years, technology community leaders around the world have received this award for sharing their expertise and experience in online and offline technology communities.

MVPs are a carefully selected group of experts representing the most skilled and insightful minds: passionate, helpful experts who are deeply invested in the community. MVPs are committed to helping others, and to helping users in the Microsoft technical community make the most of Microsoft technologies, through speaking, forum Q&A, building websites, writing blogs, sharing videos, open-source projects, organizing conferences, and more.

For more details, please visit the official website:
https://mvp.microsoft.com/en-us



