Monitoring requirements

After adopting k8s, we generally need to monitor the following four broad categories of metrics.

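| Metric type | Collection source | Example use | Discovery type |
| --- | --- | --- | --- |
| Container base resource metrics | cAdvisor metrics endpoint built into kubelet | Check container CPU and memory utilization | k8s_sd at node level, accessing node_ip directly |
| k8s resource metrics | kube-state-metrics (ksm for short); for details see "从容器监控kube-stats-metrics看k8s众多组件" | Inspect pod status, e.g. why a pod is in the waiting state; count objects, e.g. the distribution of nodes and pods by namespace | Access the Service domain name through coredns |
| k8s component metrics | Metrics endpoint of each component | Check request latency of apiserver, scheduler, etcd, coredns, etc. | k8s_sd at endpoint level |
| Application metrics instrumented in pods | Metrics endpoint of the pod | Depends on the business metrics scenario | k8s_sd at pod level, accessing the metrics path on the pod IP |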

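Implementation in Prometheus

Authentication and certificates

Prometheus scrape jobs often contain the token and certificate settings shown below. The main reasons are:

  • The token authenticates requests to the metrics endpoint
  • The apiserver can use mutual TLS authentication, so a certificate has to be provided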

    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true

    Prometheus relies on a ServiceAccount (sa) and a ClusterRoleBinding to get the token and certificate mounted.

    ServiceAccount-related configuration (the Prometheus yaml has to reference the sa through serviceAccountName):

    apiVersion: rbac.authorization.k8s.io/v1 # API version
    kind: ClusterRole # resource kind
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources: # resources
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups:
      - extensions
      resources:
      - ingresses
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus # custom name
      namespace: kube-system # namespace
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef: # the role to bind
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-admin
    subjects: # the subjects the role is bound to
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system

    Once this is configured, k8s mounts the corresponding files into the pod:

    / # ls /var/run/secrets/kubernetes.io/serviceaccount/ -l
    total 0
    lrwxrwxrwx 1 root root 13 Jan 7 20:54 ca.crt -> ..data/ca.crt
    lrwxrwxrwx 1 root root 16 Jan 7 20:54 namespace -> ..data/namespace
    lrwxrwxrwx 1 root root 12 Jan 7 20:54 token -> ..data/token
    / # df -h |grep service
    tmpfs 7.8G 12.0K 7.8G 0% /var/run/secrets/kubernetes.io/serviceaccount
    / #

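  • Set TOKEN

    TOKEN=$(kubectl -n kube-system  get secret $(kubectl -n kube-system  get serviceaccount prometheus -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode )
  • Access the corresponding metrics endpoint. The example below queries port 10259 (the default kube-scheduler secure port); the apiserver endpoint is accessed the same way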

     curl  https://localhost:10259/metrics --header "Authorization: Bearer $TOKEN" --insecure     |head
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
      # HELP apiserver_audit_event_total [ALPHA] Counter of audit events generated and sent to the audit backend.
      # TYPE apiserver_audit_event_total counter
      apiserver_audit_event_total 0
      # HELP apiserver_audit_requests_rejected_total [ALPHA] Counter of apiserver requests rejected due to an error in audit logging backend.
      # TYPE apiserver_audit_requests_rejected_total counter
      apiserver_audit_requests_rejected_total 0
      # HELP apiserver_client_certificate_expiration_seconds [ALPHA] Distribution of the remaining lifetime on the certificate used to authenticate a request.
      # TYPE apiserver_client_certificate_expiration_seconds histogram
      apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
      apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
      100 36590    0 36590    0     0   194k      0 --:--:-- --:--:-- --:--:--  195k
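The curl call above boils down to sending the ServiceAccount token in an Authorization header. As a minimal sketch in Python (the function name build_metrics_request and the token value are hypothetical, used only for illustration), the request can be prepared like this. Actually sending it, and handling TLS verification, is deliberately left out.

```python
import urllib.request


def build_metrics_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the token as a Bearer header."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {token}")
    return req


if __name__ == "__main__":
    # Hypothetical token value. Inside a pod it would be read from
    # /var/run/secrets/kubernetes.io/serviceaccount/token.
    req = build_metrics_request("https://localhost:10259/metrics", "example-token")
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

This is the same information Prometheus derives from bearer_token_file; the request is only constructed here, not sent.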
    

Now let's look at how each category is scraped.

Container base metrics
Prometheus scrape configuration
- job_name: kubernetes-nodes-cadvisor
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
 
  kubernetes_sd_configs:
  - role: node
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - separator: ;
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
    action: labelmap
  - separator: ;
    regex: (.*)
    target_label: __metrics_path__
    replacement: /metrics/cadvisor
    action: replace
Let's walk through it:
  • Use k8s service discovery with the node role

      kubernetes_sd_configs:
      - role: node
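  • For every discovered node label whose key starts with __meta_kubernetes_node_label_, the prefix is stripped and the remainder becomes the new label name. For example, the key-value pair __meta_kubernetes_node_label_kubernetes_io_arch="amd64" is mapped to kubernetes_io_arch="amd64"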

      relabel_configs:
       - separator: ;
         regex: __meta_kubernetes_node_label_(.+)
         replacement: $1
         action: labelmap
  • The scrape path /metrics is replaced with /metrics/cadvisor

      - separator: ;
        regex: (.*)
        target_label: __metrics_path__
        replacement: /metrics/cadvisor
        action: replace
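To make the two relabel rules above concrete, here is a minimal Python sketch that mimics the labelmap and replace actions on a plain dict of labels. The helper names (expand, labelmap, replace) are hypothetical, and Python's re module stands in for the RE2 engine Prometheus uses, so this illustrates the semantics and is not the actual implementation.

```python
import re


def expand(match, replacement):
    """Substitute Prometheus-style $1, $2 ... with the matched groups."""
    return re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))) or "", replacement)


def labelmap(labels, regex, replacement="$1"):
    """Copy each label whose name fully matches regex to a name built from replacement."""
    out = dict(labels)
    for name, value in labels.items():
        m = re.fullmatch(regex, name)
        if m:
            out[expand(m, replacement)] = value
    return out


def replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Join the source label values, match them against regex, and set target_label."""
    value = separator.join(labels.get(n, "") for n in source_labels)
    out = dict(labels)
    m = re.fullmatch(regex, value)
    if m:
        out[target_label] = expand(m, replacement)
    return out


if __name__ == "__main__":
    labels = {
        "__meta_kubernetes_node_label_kubernetes_io_arch": "amd64",
        "__metrics_path__": "/metrics",
    }
    labels = labelmap(labels, r"__meta_kubernetes_node_label_(.+)")
    labels = replace(labels, [], r"(.*)", "__metrics_path__", "/metrics/cadvisor")
    print(labels["kubernetes_io_arch"], labels["__metrics_path__"])
```

With no source_labels, the joined value is an empty string, which (.*) still matches, so the metrics path is always overwritten.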
k8s resource metrics
Prometheus scrape configuration
- job_name: kube-state-metrics
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - kube-state-metrics:8080
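Let's walk through it:
  • The endpoint is plain http with no authentication, so no token or certificate needs to be configured
  • The target configured here is kube-state-metrics:8080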
  - targets:
    - kube-state-metrics:8080
  • This works because kube-state-metrics comes with a Service once it is deployed
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: v1.9.7
  name: kube-state-metrics
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
  • k8s creates a coredns record for each Service. The domain name has the form ${service_name}.${namespace}.svc.cluster.local, so the FQDN of ksm is kube-state-metrics.kube-system.svc.cluster.local
  • The pod DNS configuration has three search domains, so kube-state-metrics.kube-system:8080 also works, as do kube-state-metrics.kube-system.svc:8080 and kube-state-metrics.kube-system.svc.cluster.local:8080
/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search kube-admin.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
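The effect of the search list and ndots can be sketched as follows. This is a simplified model of how a typical resolver expands a name, written in Python; the function candidate_names is hypothetical and is not part of any DNS library.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Return the absolute names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute, the search list is not used
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    # Fewer dots than ndots: try the search domains first, then the name as is.
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded


if __name__ == "__main__":
    search = ["kube-admin.svc.cluster.local", "svc.cluster.local", "cluster.local"]
    for candidate in candidate_names("kube-state-metrics.kube-system", search):
        print(candidate)
```

For kube-state-metrics.kube-system, the second candidate is kube-state-metrics.kube-system.svc.cluster.local., which is the FQDN of the Service.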
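k8s component metrics
Prometheus scrape configuration for the apiserver (the etcd metrics are exposed through the apiserver)
kube-controller-manager, coredns, kube-scheduler, etc. are scraped the same way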
- job_name: kubernetes-apiservers
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;kubernetes;https
    replacement: $1
    action: keep
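Let's walk through it:
  • An endpoints resource is the list of IP addresses and ports that expose a service
  • Use k8s service discovery with the endpoints role. There are a lot of endpoints, so they have to be filtered down to the apiserver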
kubernetes_sd_configs:
- role: endpoints
  • The filter keeps (action: keep) only targets whose __meta_kubernetes_namespace label matches default, whose __meta_kubernetes_service_name matches kubernetes, and whose __meta_kubernetes_endpoint_port_name matches https
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  separator: ;
  regex: default;kubernetes;https
  replacement: $1
  action: keep
  • k8s creates the apiserver Service in the default namespace

    $ kubectl get svc -A |grep  443
    default         kubernetes                                     ClusterIP   10.96.0.1       <none>        443/TCP                  9d
  • The resulting endpoint is turned into the scrape URL https://masterip:6443/metrics
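The keep rule can be illustrated with a small Python sketch that filters a list of discovered targets. The function keep and the sample targets are hypothetical and only model the behaviour described above: the source label values are joined with the separator and the whole string has to match the regex.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Return only the targets whose joined source label values fully match regex."""
    kept = []
    for labels in targets:
        value = separator.join(labels.get(n, "") for n in source_labels)
        if re.fullmatch(regex, value):
            kept.append(labels)
    return kept


if __name__ == "__main__":
    # Hypothetical discovered endpoints, for illustration only.
    targets = [
        {
            "__meta_kubernetes_namespace": "default",
            "__meta_kubernetes_service_name": "kubernetes",
            "__meta_kubernetes_endpoint_port_name": "https",
            "__address__": "192.168.0.10:6443",
        },
        {
            "__meta_kubernetes_namespace": "kube-system",
            "__meta_kubernetes_service_name": "kube-dns",
            "__meta_kubernetes_endpoint_port_name": "metrics",
            "__address__": "10.244.0.2:9153",
        },
    ]
    sources = [
        "__meta_kubernetes_namespace",
        "__meta_kubernetes_service_name",
        "__meta_kubernetes_endpoint_port_name",
    ]
    for labels in keep(targets, sources, r"default;kubernetes;https"):
        print(labels["__address__"])
```

Only the apiserver endpoint survives; every other discovered endpoint is dropped before scraping.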
Pod application metrics
Scrape configuration
- job_name: kubernetes-pods
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_pod_name
    replacement: $1
    action: replace
Let's walk through it:
  • The pod yaml has to declare, under spec.template.metadata.annotations, whether the pod should be scraped, plus the port and the path
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'
        prometheus.io/path: 'metrics'
  • The scrape configuration filters and rewrites labels accordingly:
  • Keep only targets with __meta_kubernetes_pod_annotation_prometheus_io_scrape = true
  • Write the value of __meta_kubernetes_pod_annotation_prometheus_io_path into __metrics_path__ (the label Prometheus uses for the scrape path)
  • Combine __address__ and __meta_kubernetes_pod_annotation_prometheus_io_port into a new __address__
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
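The address rewrite is the least obvious of these rules, so here is a minimal Python sketch of it. It assumes the intended regex is ([^:]+)(?::\d+)?;(\d+) with the replacement $1:$2; the function rewrite_address and the sample addresses are hypothetical.

```python
import re

# Host, an optional existing port that is discarded, then the annotated port.
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Mimic the replace rule: join with ';', match, and emit host:port."""
    joined = f"{address};{port_annotation}"
    m = ADDRESS_RE.fullmatch(joined)
    if m is None:
        return address  # no match, so __address__ is left unchanged
    return f"{m.group(1)}:{m.group(2)}"


if __name__ == "__main__":
    print(rewrite_address("10.244.1.5:8080", "9102"))
    print(rewrite_address("10.244.1.5", "9102"))
    print(rewrite_address("10.244.1.5:8080", ""))
```

A pod without the prometheus.io/port annotation yields an empty second value, the regex does not match, and the discovered address is used as is.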

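Metrics summary

cAdvisor metrics

CPU metrics

| Nightingale (夜莺) metric name | Meaning | Prometheus metric or calculation | Notes |
| --- | --- | --- | --- |
| cpu.util | Container CPU usage as a percentage of its allocation | sum (rate (container_cpu_usage_seconds_total[1m])) by (container) / (sum (container_spec_cpu_quota) by (container) / 100000) * 100 | Range 0-100 |
| cpu.idle | Container idle CPU as a percentage of its allocation | 100 - cpu.util | Range 0-100 |
| cpu.user | Container user-mode CPU usage as a percentage of its allocation | sum (rate (container_cpu_user_seconds_total[1m])) by (container) / (sum (container_spec_cpu_quota) by (container) / 100000) * 100 | Range 0-100 |
| cpu.sys | Container kernel-mode CPU usage as a percentage of its allocation | sum (rate (container_cpu_system_seconds_total[1m])) by (container) / (sum (container_spec_cpu_quota) by (container) / 100000) * 100 | Range 0-100 |
| cpu.cores.occupy | Number of machine cores the container is using | rate(container_cpu_usage_seconds_total[1m]) | From 0 up to the number of cores on the machine; a value of 1 means one full core |
| cpu.spec.quota | CPU quota of the container | container_spec_cpu_quota | Number of CPUs assigned to the container * 100000 |
| cpu.throttled.util | Percentage of CPU periods in which the container was throttled | sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_throttled_periods_total{container_name!=""}[5m])) / sum by(container_name, pod_name, namespace) (increase(container_cpu_cfs_periods_total[5m])) * 100 | Range 0-100 |
| cpu.periods | Total number of CPU periods elapsed over the container's lifetime | Counter, no calculation needed | View with rate/increase |
| cpu.throttled.periods | Total number of throttled CPU periods over the container's lifetime | Counter, no calculation needed | View with rate/increase |
| cpu.throttled.time | Total time the container has been throttled | Counter, no calculation needed | Unit: nanoseconds |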
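As a worked example of the cpu.util calculation, the following Python sketch plugs numbers into the same arithmetic. The function name and the sample values are hypothetical; 100000 is the divisor taken from the formula above.

```python
CFS_PERIOD_US = 100000  # divisor used in the cpu.util formula


def cpu_util_percent(usage_cores: float, quota_us: float, period_us: float = CFS_PERIOD_US) -> float:
    """usage_cores is the per-second rate of container_cpu_usage_seconds_total."""
    limit_cores = quota_us / period_us  # e.g. a quota of 200000 means 2 cores
    return usage_cores / limit_cores * 100


if __name__ == "__main__":
    # A container using half a core against a quota of 200000 (2 cores).
    print(cpu_util_percent(0.5, 200000))
```

A container that uses 0.5 cores with a quota of 200000 therefore sits at 25 percent of its allocation.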

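Memory metrics

| Nightingale (夜莺) metric name | Meaning | Prometheus metric or calculation | Notes |
| --- | --- | --- | --- |
| mem.bytes.total | Memory limit of the container | No calculation needed | Unit: bytes; corresponds to resources.limits.memory in the pod yaml |
| mem.bytes.used | Current memory usage, including all memory regardless of when it was accessed | container_memory_rss + container_memory_cache + kernel memory | Unit: bytes |
| mem.bytes.used.percent | Container memory utilization | container_memory_usage_bytes / container_spec_memory_limit_bytes * 100 | Range 0-100 |
| mem.bytes.workingset | Memory the container is really using; also the basis for the OOM decision when a limit is set | container_memory_max_usage_bytes > container_memory_usage_bytes >= container_memory_working_set_bytes > container_memory_rss | Unit: bytes |
| mem.bytes.workingset.percent | Memory the container is really using, as a percentage | container_memory_working_set_bytes / container_spec_memory_limit_bytes * 100 | Range 0-100 |
| mem.bytes.cached | Container cache memory | container_memory_cache | Unit: bytes |
| mem.bytes.rss | Container RSS memory | container_memory_rss | Unit: bytes |
| mem.bytes.swap | Container swap usage | container_memory_swap | Unit: bytes |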

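Filesystem and disk I/O metrics

| Nightingale (夜莺) metric name | Meaning | Prometheus metric or calculation | Notes |
| --- | --- | --- | --- |
| disk.bytes.total | Total filesystem space available to the container | container_fs_limit_bytes | Unit: bytes |
| disk.bytes.used | Filesystem space used by the container | container_fs_usage_bytes | Unit: bytes |
| disk.bytes.used.percent | Container filesystem usage percentage | container_fs_usage_bytes / container_fs_limit_bytes * 100 | Range 0-100 |
| disk.io.read.bytes | Container disk read throughput | rate(container_fs_reads_bytes_total[1m]) | Unit: bytes/s |
| disk.io.write.bytes | Container disk write throughput | rate(container_fs_writes_bytes_total[1m]) | Unit: bytes/s |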

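Network metrics

| Nightingale (夜莺) metric name | Meaning | Prometheus metric or calculation | Notes |
| --- | --- | --- | --- |
| net.in.bytes | Bytes received by the container per second | rate(container_network_receive_bytes_total[1m]) | Unit: bytes/s |
| net.out.bytes | Bytes transmitted by the container per second | rate(container_network_transmit_bytes_total[1m]) | Unit: bytes/s |
| net.in.pps | Packets received by the container per second | rate(container_network_receive_packets_total[1m]) | Unit: packets/s |
| net.out.pps | Packets transmitted by the container per second | rate(container_network_transmit_packets_total[1m]) | Unit: packets/s |
| net.in.errs | Receive errors per second | rate(container_network_receive_errors_total[1m]) | Unit: errors/s |
| net.out.errs | Transmit errors per second | rate(container_network_transmit_errors_total[1m]) | Unit: errors/s |
| net.in.dropped | Received packets dropped per second | rate(container_network_receive_packets_dropped_total[1m]) | Unit: packets/s |
| net.out.dropped | Transmitted packets dropped per second | rate(container_network_transmit_packets_dropped_total[1m]) | Unit: packets/s |

container_network_{tcp,udp}_usage_total is not collected by default because of --disable_metrics=tcp,udp; enabling these metrics puts heavy load on the CPU.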
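Many of the expressions above apply rate() to a counter over a 1m window. The Python sketch below shows the basic idea on a handful of (timestamp, value) samples. It is a deliberately simplified model: the real PromQL rate() also extrapolates to the window boundaries, which is omitted here, and the function name and sample values are hypothetical.

```python
def simple_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), ordered by time."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so the new value is the increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # Hypothetical samples of container_network_receive_bytes_total, 30s apart.
    print(simple_rate([(0, 1000), (30, 4000), (60, 7000)]))
```

Because counters only ever grow (apart from resets), the raw value is rarely useful on its own; the per-second increase is what ends up on a dashboard.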

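System metrics

| Nightingale (夜莺) metric name | Meaning | Prometheus metric or calculation | Notes |
| --- | --- | --- | --- |
| sys.ps.process.count | Number of running processes in the container | container_processes | Unit: count |
| sys.ps.thread.count | Number of running threads in the container | container_threads | Unit: count |
| sys.fd.count.used | Number of open file descriptors in the container | container_file_descriptors | Unit: count |
| sys.fd.soft.ulimits | Soft ulimit of the container's root process | container_ulimits_soft | Unit: count |
| sys.socket.count.used | Number of open sockets in the container | container_sockets | Unit: count |
| sys.task.state | Distribution of task states in the container | container_tasks_state | Unit: count |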

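kube-apiserver metrics

| Metric name | Type | Meaning | Notes |
| --- | --- | --- | --- |
| apiserver_request_total | counter | Total number of requests | Use with sum by: by status code (2xx, 3xx, 4xx, 5xx, etc.); by verb (list, get, watch, post, delete, etc.); by resource (pod, node, endpoint, etc.) |
| apiserver_request_duration_seconds_sum | histogram | Sum of observed request latencies | By verb (list, get, watch, post, delete, etc.); by resource (pod, node, endpoint, etc.) |
| apiserver_request_duration_seconds_count | histogram | Number of observed request latencies | Average latency: apiserver_request_duration_seconds_sum / apiserver_request_duration_seconds_count |
| apiserver_response_sizes_sum | histogram | Sum of observed response sizes | |
| apiserver_response_sizes_count | histogram | Number of observed response sizes | |
| authentication_attempts | counter | Number of authentication attempts | |
| authentication_duration_seconds_sum | histogram | Sum of observed authentication durations | |
| authentication_duration_seconds_count | histogram | Number of observed authentication durations | |
| apiserver_tls_handshake_errors_total | counter | Number of failed TLS handshakes | |
| apiserver_client_certificate_expiration_seconds_sum | histogram | Sum of observed remaining client certificate lifetimes | |
| apiserver_client_certificate_expiration_seconds_count | histogram | Number of observed remaining client certificate lifetimes | |
| apiserver_client_certificate_expiration_seconds_bucket | histogram | Distribution of remaining client certificate lifetimes | |
| apiserver_current_inflight_requests | gauge | High-water mark of in-flight requests over the last window | apiserver request throttling |
| apiserver_current_inqueue_requests | gauge | Gauge vector recording the high-water mark of recently queued requests | apiserver request throttling |
| apiserver_flowcontrol_current_executing_requests | gauge | Instantaneous number of requests being executed (not waiting in a queue) | APF (APIPriorityAndFairness), QoS for the API |
| apiserver_flowcontrol_current_inqueue_requests | gauge | Instantaneous number of queued (not yet executing) requests | APF (APIPriorityAndFairness), QoS for the API |
| workqueue_adds_total | counter | Number of items added to the work queue | |
| workqueue_retries_total | counter | Number of work queue retries | |
| workqueue_longest_running_processor_seconds | gauge | Longest running time of a work queue processor | |
| workqueue_queue_duration_seconds_sum | histogram | Sum of observed queue wait times | |
| workqueue_queue_duration_seconds_count | histogram | Number of observed queue wait times | |
| workqueue_work_duration_seconds_sum | histogram | Sum of observed processing times | |
| workqueue_work_duration_seconds_count | histogram | Number of observed processing times | |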
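The average latency mentioned for apiserver_request_duration_seconds is the sum series divided by the count series. Both are cumulative, so in practice the division is done on the increase between two scrapes. A minimal Python sketch with hypothetical numbers and a hypothetical function name:

```python
def average_latency(sum_start, count_start, sum_end, count_end):
    """Average seconds per request between two scrapes of _sum and _count."""
    requests = count_end - count_start
    if requests <= 0:
        return None  # no new requests in the interval
    return (sum_end - sum_start) / requests


if __name__ == "__main__":
    # Hypothetical values: 200 new requests took 30 seconds in total.
    print(average_latency(120.0, 1000, 150.0, 1200))
```

The same pattern applies to every _sum and _count pair in these tables, such as the etcd, scheduler and coredns latency metrics.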

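etcd metrics

| Metric name | Type | Meaning | Notes |
| --- | --- | --- | --- |
| etcd_db_total_size_in_bytes | gauge | Physical size of the db file | |
| etcd_object_counts | gauge | Number of etcd objects by kind | |
| etcd_request_duration_seconds_sum | histogram | Sum of observed etcd request latencies | |
| etcd_request_duration_seconds_count | histogram | Number of observed etcd request latencies | |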

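kube-scheduler

| Metric name | Type | Meaning | Notes |
| --- | --- | --- | --- |
| scheduler_e2e_scheduling_duration_seconds_sum | histogram | Sum of observed end-to-end scheduling latencies | |
| scheduler_e2e_scheduling_duration_seconds_count | histogram | Number of observed end-to-end scheduling latencies | |
| scheduler_pod_scheduling_duration_seconds_sum | histogram | Sum of observed pod scheduling latencies | Can be analyzed by number of attempts |
| scheduler_pod_scheduling_duration_seconds_count | histogram | Number of observed pod scheduling latencies | |
| scheduler_pending_pods | gauge | Number of pending pods in the scheduling queue | |
| scheduler_queue_incoming_pods_total | counter | Number of pods added to the scheduling queue | |
| scheduler_scheduling_algorithm_duration_seconds_sum | histogram | Sum of observed scheduling algorithm latencies | |
| scheduler_scheduling_algorithm_duration_seconds_count | histogram | Number of observed scheduling algorithm latencies | |
| scheduler_pod_scheduling_attempts_sum | histogram | Sum of the attempts needed to schedule a pod successfully | |
| scheduler_pod_scheduling_attempts_count | histogram | Number of observations of attempts needed to schedule a pod successfully | |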

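coredns

| Metric name | Type | Meaning | Notes |
| --- | --- | --- | --- |
| coredns_dns_requests_total | counter | Number of DNS requests | A records, AAAA records, other records |
| coredns_dns_responses_total | counter | Number of DNS responses | NOERROR, NXDOMAIN, REFUSED |
| coredns_cache_entries | gauge | Number of cache entries | Success or failure |
| coredns_cache_hits_total | counter | Number of cache hits | Success or failure |
| coredns_cache_misses_total | counter | Number of cache misses | Success or failure |
| coredns_dns_request_duration_seconds_sum | histogram | Sum of observed resolution latencies | |
| coredns_dns_request_duration_seconds_count | histogram | Number of observed resolution latencies | |
| coredns_dns_response_size_bytes_sum | histogram | Sum of observed response sizes | |
| coredns_dns_response_size_bytes_count | histogram | Number of observed response sizes | |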

kube-state-metrics

Pod metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_pod_status_phase | gauge | Pod phase: Pending, Succeeded, Failed, Running, Unknown |
| kube_pod_container_status_waiting | gauge | Whether the container is in the waiting state; 1 means waiting |
| kube_pod_container_status_waiting_reason | gauge | Reason the container is waiting: ContainerCreating, CrashLoopBackOff (the container crashes on start, is restarted, and crashes again), CreateContainerConfigError, ErrImagePull, ImagePullBackOff, CreateContainerError, InvalidImageName |
| kube_pod_container_status_terminated | gauge | Whether the container is in the terminated state; 1 means terminated |
| kube_pod_container_status_terminated_reason | gauge | Reason the container terminated: OOMKilled, Completed, Error, ContainerCannotRun, DeadlineExceeded, Evicted |
| kube_pod_container_status_restarts_total | counter | Number of container restarts in the pod |
| kube_pod_container_resource_requests_cpu_cores | gauge | CPU request of the pod container |
| kube_pod_container_resource_requests_memory_bytes | gauge | Memory request of the pod container (unit: bytes) |
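State metrics such as kube_pod_status_phase are mostly used for counting. The Python sketch below counts pods per phase from a snippet of exposition text, which is roughly what a query like sum by (phase) (kube_pod_status_phase) does. The sample lines, pod names and the function pods_per_phase are hypothetical, and the regex-based parsing is intentionally naive.

```python
import re
from collections import Counter

SAMPLE_RE = re.compile(r"^(\w+)\{(.*)\}\s+(\S+)$")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pods_per_phase(text: str) -> Counter:
    """Count series of kube_pod_status_phase whose value is 1, grouped by phase."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group(1) != "kube_pod_status_phase":
            continue
        labels = dict(LABEL_RE.findall(m.group(2)))
        if float(m.group(3)) == 1:
            counts[labels.get("phase", "")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = """
    # TYPE kube_pod_status_phase gauge
    kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
    kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
    kube_pod_status_phase{namespace="default",pod="web-2",phase="Running"} 1
    kube_pod_status_phase{namespace="kube-system",pod="job-1",phase="Failed"} 1
    """
    print(dict(pods_per_phase(sample)))
```

Each pod exposes one series per phase, and only the series for its current phase has the value 1, which is why summing works as a count.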

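Deployment metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_deployment_status_replicas | gauge | Number of pods in the deployment |
| kube_deployment_status_replicas_available | gauge | Number of available pods in the deployment |
| kube_deployment_status_replicas_unavailable | gauge | Number of unavailable pods in the deployment |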

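DaemonSet metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_daemonset_status_number_available | gauge | Number of available ds pods |
| kube_daemonset_status_number_unavailable | gauge | Number of unavailable ds pods |
| kube_daemonset_status_number_ready | gauge | Number of ready ds pods |
| kube_daemonset_status_number_misscheduled | gauge | Number of nodes running a ds pod that are not supposed to run one |
| kube_daemonset_status_current_number_scheduled | gauge | Number of nodes currently running the ds |
| kube_daemonset_status_desired_number_scheduled | gauge | Number of nodes that should be running the ds |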

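StatefulSet metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_statefulset_status_replicas | gauge | Total number of ss replicas |
| kube_statefulset_status_replicas_current | gauge | Number of current ss replicas |
| kube_statefulset_status_replicas_updated | gauge | Number of updated ss replicas |
| kube_statefulset_replicas | gauge | Desired number of ss replicas |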

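Job metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_job_status_active | gauge | Number of running pods of the job |
| kube_job_status_succeeded | gauge | Number of succeeded pods of the job |
| kube_job_status_failed | gauge | Number of failed pods of the job |
| kube_job_complete | gauge | Whether the job has completed |
| kube_job_failed | gauge | Whether the job has failed |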

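CronJob metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_cronjob_status_active | gauge | Number of currently running jobs of the cronjob |
| kube_cronjob_spec_suspend | gauge | 1 means the cronjob is suspended |
| kube_cronjob_next_schedule_time | gauge | Next scheduled time of the cronjob |
| kube_cronjob_status_last_schedule_time | gauge | Last scheduled time of the cronjob |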

PersistentVolume metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_persistentvolume_capacity_bytes | gauge | Capacity of the pv |
| kube_persistentvolume_status_phase | gauge | Phase of the pv: Pending, Available, Bound, Released, Failed |

PersistentVolumeClaim metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_persistentvolumeclaim_resource_requests_storage_bytes | gauge | Storage size requested by the pvc |
| kube_persistentvolumeclaim_status_phase | gauge | Phase of the pvc: Lost, Bound, Pending |

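Node metrics

| Metric name | Type | Meaning |
| --- | --- | --- |
| kube_node_status_condition | gauge | Node condition: NetworkUnavailable, MemoryPressure, DiskPressure, PIDPressure, Ready |
| kube_node_status_allocatable_cpu_cores | gauge | Number of CPU cores allocatable on the node |
| kube_node_status_allocatable_memory_bytes | gauge | Amount of memory allocatable on the node (unit: bytes) |
| kube_node_spec_taint | gauge | Taints on the node |
| kube_node_status_capacity_memory_bytes | gauge | Total memory of the node (unit: bytes) |
| kube_node_status_capacity_cpu_cores | gauge | Number of CPU cores of the node |
| kube_node_status_capacity_pods | gauge | Total number of pods the node can run |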

ning1875