A tour of Kubernetes components through kube-state-metrics container monitoring

Kubernetes monitoring architecture

[Figure: k8s_monitor.jpg]

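Metric categories

  • System metrics
    Cover node and container resource usage, and resources that run as DaemonSets
  • Service metrics
    Produced either by Kubernetes infrastructure components or by application pods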

kube-state-metrics

Prometheus scrape job that collects it:

- job_name: kube-state-metrics
  honor_timestamps: false
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - kube-state-metrics.kube-admin:8080

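What is the Kubernetes API server

The Kubernetes API server exposes HTTP REST interfaces for creating, deleting, updating, querying, and watching every kind of resource object (Pod, ReplicationController, Service, and so on). It is the data bus and data hub of the whole system.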
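
As a rough illustration of those REST interfaces, the Go sketch below builds the URL path for a resource collection. Core resources such as pods are served under /api/v1, resources in named API groups such as apps are served under /apis/<group>/<version>, and adding watch=true turns a list request into a watch. The helper resourcePath is hypothetical; real clients should use client-go rather than assemble paths by hand.

```go
package main

import "fmt"

// resourcePath returns the API server path for a resource collection.
// An empty group means the core group; an empty namespace means all
// namespaces (or a cluster-scoped resource).
func resourcePath(group, version, namespace, resource string, watch bool) string {
	p := "/apis/" + group + "/" + version
	if group == "" {
		p = "/api/" + version
	}
	if namespace != "" {
		p += "/namespaces/" + namespace
	}
	p += "/" + resource
	if watch {
		p += "?watch=true"
	}
	return p
}

func main() {
	// List pods in one namespace (HTTP GET).
	fmt.Println(resourcePath("", "v1", "kube-system", "pods", false))
	// Watch deployments across all namespaces (HTTP GET, streaming response).
	fmt.Println(resourcePath("apps", "v1", "", "deployments", true))
}
```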

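How collection works

kube-state-metrics talks to the Kubernetes cluster through client-go. It lists each resource type from the API server and then keeps a watch open for changes, rather than polling on a timer.

  • Initialize a metric store for each resource type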
// E:\go_path\src\k8s.io\kube-state-metrics\internal\store\builder.go
var availableStores = map[string]func(f *Builder) cache.Store{
    "certificatesigningrequests":      func(b *Builder) cache.Store { return b.buildCsrStore() },
    "configmaps":                      func(b *Builder) cache.Store { return b.buildConfigMapStore() },
    "cronjobs":                        func(b *Builder) cache.Store { return b.buildCronJobStore() },
    "daemonsets":                      func(b *Builder) cache.Store { return b.buildDaemonSetStore() },
    "deployments":                     func(b *Builder) cache.Store { return b.buildDeploymentStore() },
    "endpoints":                       func(b *Builder) cache.Store { return b.buildEndpointsStore() },
    "horizontalpodautoscalers":        func(b *Builder) cache.Store { return b.buildHPAStore() },
    "ingresses":                       func(b *Builder) cache.Store { return b.buildIngressStore() },
    "jobs":                            func(b *Builder) cache.Store { return b.buildJobStore() },
    "leases":                          func(b *Builder) cache.Store { return b.buildLeases() },
    "limitranges":                     func(b *Builder) cache.Store { return b.buildLimitRangeStore() },
    "mutatingwebhookconfigurations":   func(b *Builder) cache.Store { return b.buildMutatingWebhookConfigurationStore() },
    "namespaces":                      func(b *Builder) cache.Store { return b.buildNamespaceStore() },
    "networkpolicies":                 func(b *Builder) cache.Store { return b.buildNetworkPolicyStore() },
    "nodes":                           func(b *Builder) cache.Store { return b.buildNodeStore() },
    "persistentvolumeclaims":          func(b *Builder) cache.Store { return b.buildPersistentVolumeClaimStore() },
    "persistentvolumes":               func(b *Builder) cache.Store { return b.buildPersistentVolumeStore() },
    "poddisruptionbudgets":            func(b *Builder) cache.Store { return b.buildPodDisruptionBudgetStore() },
    "pods":                            func(b *Builder) cache.Store { return b.buildPodStore() },
    "replicasets":                     func(b *Builder) cache.Store { return b.buildReplicaSetStore() },
    "replicationcontrollers":          func(b *Builder) cache.Store { return b.buildReplicationControllerStore() },
    "resourcequotas":                  func(b *Builder) cache.Store { return b.buildResourceQuotaStore() },
    "secrets":                         func(b *Builder) cache.Store { return b.buildSecretStore() },
    "services":                        func(b *Builder) cache.Store { return b.buildServiceStore() },
    "statefulsets":                    func(b *Builder) cache.Store { return b.buildStatefulSetStore() },
    "storageclasses":                  func(b *Builder) cache.Store { return b.buildStorageClassStore() },
    "validatingwebhookconfigurations": func(b *Builder) cache.Store { return b.buildValidatingWebhookConfigurationStore() },
    "volumeattachments":               func(b *Builder) cache.Store { return b.buildVolumeAttachmentStore() },
    "verticalpodautoscalers":          func(b *Builder) cache.Store { return b.buildVPAStore() },
}
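
The map above is a registry of constructors keyed by resource name. A minimal, self-contained sketch of how such a registry could be consumed follows. The Store interface, namedStore type, and buildStores function are hypothetical stand-ins (the real code uses cache.Store from client-go and a Builder), and only two entries are reproduced.

```go
package main

import (
	"fmt"
	"sort"
)

// Store is a stand-in for cache.Store.
type Store interface{ Resource() string }

type namedStore struct{ resource string }

func (s namedStore) Resource() string { return s.resource }

// availableStores mirrors the shape of the map above with two entries.
var availableStores = map[string]func() Store{
	"nodes": func() Store { return namedStore{"nodes"} },
	"pods":  func() Store { return namedStore{"pods"} },
}

// buildStores calls the constructor of every enabled resource and
// rejects names that are not registered.
func buildStores(enabled []string) ([]Store, error) {
	names := append([]string(nil), enabled...) // copy, do not reorder the caller's slice
	sort.Strings(names)                        // deterministic build order
	stores := make([]Store, 0, len(names))
	for _, name := range names {
		construct, ok := availableStores[name]
		if !ok {
			return nil, fmt.Errorf("resource %q is not registered", name)
		}
		stores = append(stores, construct())
	}
	return stores, nil
}

func main() {
	stores, err := buildStores([]string{"pods", "nodes"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stores {
		fmt.Println("built store for", s.Resource())
	}
}
```

Keeping constructors in a map means that enabling or disabling a resource type is a matter of choosing keys, with no switch statement to maintain.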
  • Initialize the list/watch function that receives the results
// E:\go_path\src\k8s.io\kube-state-metrics\internal\store\builder.go
// reflectorPerNamespace creates a Kubernetes client-go reflector with the given
// listWatchFunc for each given namespace and registers it with the given store.
func (b *Builder) reflectorPerNamespace(
    expectedType interface{},
    store cache.Store,
    listWatchFunc func(kubeClient clientset.Interface, ns string) cache.ListerWatcher,
) {
    lwf := func(ns string) cache.ListerWatcher { return listWatchFunc(b.kubeClient, ns) }
    lw := listwatch.MultiNamespaceListerWatcher(b.namespaces, nil, lwf)
    instrumentedListWatch := watch.NewInstrumentedListerWatcher(lw, b.metrics, reflect.TypeOf(expectedType).String())
    reflector := cache.NewReflector(sharding.NewShardedListWatch(b.shard, b.totalShards, instrumentedListWatch), expectedType, store, 0)
    go reflector.Run(b.ctx.Done())
}
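
The reflector above follows a list-then-watch pattern: one full list seeds the store, then watch events keep it current. The final 0 passed to cache.NewReflector is the resync period, so no periodic resync is scheduled. The sketch below imitates that pattern with plain channels and maps. Event, Store, and listAndWatch are simplified stand-ins, not client-go types, and sharding and error handling are left out.

```go
package main

import "fmt"

type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Event is a simplified watch event; Obj stands in for a Kubernetes object.
type Event struct {
	Type EventType
	Key  string // "namespace/name"
	Obj  string
}

// Store keeps the latest known state per object key.
type Store struct{ items map[string]string }

func NewStore() *Store { return &Store{items: map[string]string{}} }

// Replace swaps the whole contents, as after the initial list.
func (s *Store) Replace(objs map[string]string) {
	s.items = make(map[string]string, len(objs))
	for k, v := range objs {
		s.items[k] = v
	}
}

// Apply folds a single watch event into the store.
func (s *Store) Apply(ev Event) {
	switch ev.Type {
	case Added, Modified:
		s.items[ev.Key] = ev.Obj
	case Deleted:
		delete(s.items, ev.Key)
	}
}

// listAndWatch lists once, then applies events until the channel closes.
func listAndWatch(list func() map[string]string, watch <-chan Event, s *Store) {
	s.Replace(list())
	for ev := range watch {
		s.Apply(ev)
	}
}

func main() {
	events := make(chan Event, 3)
	events <- Event{Type: Added, Key: "kube-system/coredns-1", Obj: "Running"}
	events <- Event{Type: Modified, Key: "default/nginx-0", Obj: "Running"}
	events <- Event{Type: Deleted, Key: "default/old-job"}
	close(events)

	s := NewStore()
	listAndWatch(func() map[string]string {
		return map[string]string{"default/nginx-0": "Pending", "default/old-job": "Succeeded"}
	}, events, s)
	fmt.Println(len(s.items), s.items["default/nginx-0"]) // prints "2 Running"
}
```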

Example metrics

e.g. ConfigMap info
kube_configmap_info{configmap="xxx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}
e.g. next scheduled time of a CronJob
kube_cronjob_next_schedule_time{cronjob="abc",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="abc"}    1594306800
e.g. number of ready DaemonSet pods
kube_daemonset_status_number_ready{daemonset="npd",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-admin"}    6
e.g. unhealthy pods (unavailable replicas of a Deployment)
kube_deployment_status_replicas_unavailable{deployment="coredns",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-system"}
  • Endpoints metrics: the IP addresses of the objects a Service sends traffic to

e.g. available endpoint addresses for nginx
kube_endpoint_address_available{endpoint="nginx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}    
e.g. the metric_name that a third-party HPA scales on
kube_horizontalpodautoscaler_spec_target_metric{metric_name="xxxx"}
e.g. Ingress info
kube_ingress_info{ingress="xxxx",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="xxx"}    
e.g. namespace phase
kube_namespace_status_phase{instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-system",phase="Active"}    

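Node conditions that signal an unhealthy node: MemoryPressure, DiskPressure, PIDPressure, KernelDeadlock, ReadonlyFilesystem (the last two are reported by node-problem-detector)

e.g. unhealthy node (the Ready condition is unknown)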
kube_node_status_condition{condition="Ready",instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",node="xxxx.xxx.xxx.xx",status="unknown"}
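
The rule for reading a kube_node_status_condition series can be sketched in Go. It assumes the series has value 1, meaning the condition currently has that status. The function unhealthy is a hypothetical helper, and treating an unknown status on a pressure condition as not alarming is an assumption of this sketch, not something kube-state-metrics prescribes.

```go
package main

import "fmt"

// unhealthy interprets one (condition, status) label pair.
// Ready is good only when "true"; every other condition
// (MemoryPressure, DiskPressure, ...) is bad when "true".
func unhealthy(condition, status string) bool {
	if condition == "Ready" {
		return status != "true" // "false" and "unknown" both mean not ready
	}
	return status == "true"
}

func main() {
	fmt.Println(unhealthy("Ready", "unknown"))        // true
	fmt.Println(unhealthy("MemoryPressure", "false")) // false
	fmt.Println(unhealthy("DiskPressure", "true"))    // true
}
```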
e.g. pod container restarts
idelta(kube_pod_container_status_restarts_total{}[1m]) > 0
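
PromQL's idelta returns the difference between the last two samples in the range, so with the 30s scrape interval used earlier a 1m window normally holds just enough samples. The Go sketch below reproduces that arithmetic on a plain slice. The function idelta here is an illustration, not Prometheus code, and like the PromQL function it makes no adjustment for counter resets.

```go
package main

import "fmt"

// idelta returns last - previous for the samples in a window.
// ok is false when fewer than two samples are available.
func idelta(window []float64) (delta float64, ok bool) {
	if len(window) < 2 {
		return 0, false
	}
	return window[len(window)-1] - window[len(window)-2], true
}

func main() {
	// Restart counter went from 3 to 4 between the last two scrapes.
	d, ok := idelta([]float64{3, 3, 4})
	fmt.Println(d, ok, d > 0) // prints "1 true true"
}
```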
e.g. pod containers in the waiting state
kube_pod_container_status_waiting_reason==1
    Possible reasons:
    ImagePullBackOff
    CrashLoopBackOff
    ErrImagePull
    CreateContainerConfigError
    CreateContainerError
    InvalidImageName
e.g. CPU requested by pod containers
kube_pod_container_resource_requests_cpu_cores
e.g. memory requested by pod containers
kube_pod_container_resource_requests_memory_bytes
e.g. pods in the Pending or Unknown phase
kube_pod_status_phase{phase=~"Pending|Unknown"}
    Possible phases:
    Pending
    Succeeded
    Failed
    Running
    Unknown
Resources are split into cpu and memory, objects into pod and namespace, and types into used and hard

e.g. ready StatefulSet pods
kube_statefulset_status_replicas_ready{instance="kube-state-metrics.kube-admin:8080",job="kube-state-metrics",namespace="kube-admin",statefulset="prometheus"}    