
Label regex match and negative regex match

  • Example (pods in a Pending or Unknown phase): kube_pod_status_phase{pod!~"filebeat.*",job="kube-state-metrics",namespace!~"druid",phase=~"Pending|Unknown"}
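For contrast with the regex matchers above, here is a small sketch of the exact matchers `=` and `!=`, plus a reminder that Prometheus regex matchers are fully anchored. The `kube-system` namespace value is only an assumed example.

```promql
# Exact match and exact non-match (namespace value is an assumption).
kube_pod_status_phase{namespace="kube-system", phase!="Running"}

# Regex matchers are anchored at both ends: "Pend" alone would not match
# phase="Pending"; the trailing ".*" is what makes this match.
kube_pod_status_phase{phase=~"Pend.*"}
```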

Aggregation: dropping or keeping labels, and breakdowns

  • Drop a label: sum without(code) (rate(apiserver_request_total[2m]))
  • Keep a label: sum by(code) (rate(apiserver_request_total[2m]))
  • Example (apiserver request QPS broken down by verb): sum(rate(apiserver_request_total[2m])) by(verb)
  • Example (apiserver request QPS broken down by verb and code): sum(rate(apiserver_request_total[2m])) by(verb,code)

label_replace: rewriting labels

  • Example (add a host label holding the host part of instance): label_replace(up, "host", "$1", "instance", "(.*):.*")
  • Original series: up{instance="localhost:8080",job="cadvisor"} 1
  • Resulting series: up{host="localhost",instance="localhost:8080",job="cadvisor"} 1
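The same call can capture a different part of the source label. As a sketch, the query below extracts the port instead of the host; the `port` label name is chosen here for illustration. If the regex does not match a series' `instance` value, that series is returned unchanged.

```promql
# Capture everything after the last ":" into a new "port" label.
label_replace(up, "port", "$1", "instance", ".*:(.*)")

# For up{instance="localhost:8080",job="cadvisor"} 1 this yields:
# up{instance="localhost:8080",job="cadvisor",port="8080"} 1
```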

topk and bottomk: top and bottom N

  • Example (top 5 pods by container CPU usage): topk(5, sum(rate(container_cpu_usage_seconds_total[1m])) by(pod))
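bottomk takes the same arguments and returns the smallest values instead. A minimal sketch reusing the metric from the topk example:

```promql
# The 5 pods with the lowest CPU usage rate.
bottomk(5, sum by (pod) (rate(container_cpu_usage_seconds_total[1m])))
```

Both functions pick their N series independently at each evaluation step, so a range graph can show more than N distinct series over time.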

Period-over-period comparison by subtraction

  • Example (QPS dropped by more than 10 compared with one hour ago): sum(rate(apiserver_request_total[2m] offset 1h)) - sum(rate(apiserver_request_total[2m])) > 10
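The same offset technique works as a ratio and over longer periods. The sketch below compares the current QPS with the same time yesterday; the 0.8 threshold is an arbitrary assumption.

```promql
# Current QPS is below 80% of the QPS at the same time one day ago.
sum(rate(apiserver_request_total[2m]))
  /
sum(rate(apiserver_request_total[2m] offset 1d))
  < 0.8
```

A ratio is often easier to threshold than an absolute difference when traffic volume varies a lot between clusters.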

absent: no-data alerts

  • A result of 1 means absent() fired, that is, no matching series exists
  • Example: absent(container_cpu_usage_seconds_total{pod=~"k8s-mon.*jddj2"})
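One detail worth knowing when writing alert rules: absent() only carries over labels from equality matchers. The sketch below uses the `up` metric with a job value taken from earlier in this article.

```promql
# Returns {job="kube-state-metrics"} 1 when no such series exists,
# and an empty result when at least one does.
absent(up{job="kube-state-metrics"})
```

With a regex matcher such as `pod=~"k8s-mon.*jddj2"`, the result carries no `pod` label, so the alert annotation cannot reference it.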

Filtering by value directly in the query

  • Example (containers in the waiting state): kube_pod_container_status_waiting == 1
  • Example (nodes with more than 8 CPU cores): kube_node_status_capacity_cpu_cores > 8
  • Example (pods in an abnormal phase): sum by (namespace,pod,cluster,phase) (kube_pod_status_phase{pod!~"filebeat.*",job="kube-state-metrics",namespace!~"druid",phase=~"Pending|Unknown"}) > 0

Computing on values directly in the query

  • Example (CPU utilization percentage derived from idle time): 100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))

Combining conditions with and

  • Example (m3db read rate above 100 and write rate above 1000): avg by(cluster) (rate(database_read_success{cluster="hawkeye-inf",instance="10.21.72.147:9004"}[1m])) > 100 and avg by(cluster) (rate(service_writeTaggedBatchRaw_success{cluster="hawkeye-inf",instance="10.21.77.55:9004"}[1m])) > 1000
  • The label sets on both sides must match
  • or works the same way
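As a sketch of the or variant, the query below reuses the two m3db metrics from the and example but drops the instance matchers for brevity (that simplification is mine, not the original query).

```promql
# Fires when either the read rate or the write rate crosses its threshold.
avg by (cluster) (rate(database_read_success{cluster="hawkeye-inf"}[1m])) > 100
or
avg by (cluster) (rate(service_writeTaggedBatchRaw_success{cluster="hawkeye-inf"}[1m])) > 1000
```

or returns every element from the left side, plus those elements from the right side whose label set has no match on the left. Since both sides here are aggregated to the `cluster` label only, a cluster that crosses both thresholds appears once, with the left-hand value.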

Quantiles with histogram_quantile

  • Example (90th percentile of apiserver request latency): histogram_quantile(0.90, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"CONNECT|WATCH"}[5m])) by (le))
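To get one quantile per dimension rather than a single global value, keep that dimension alongside `le` in the aggregation. A minimal sketch per verb, using the same metric and window as above:

```promql
# 90th percentile latency per verb; "le" must stay in the by() clause,
# because histogram_quantile needs the bucket boundaries.
histogram_quantile(
  0.90,
  sum by (le, verb) (
    rate(apiserver_request_duration_seconds_bucket{verb!~"CONNECT|WATCH"}[5m])
  )
)
```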

Relating two sets of series

  • Example (apiserver request success ratio): sum(apiserver_request_total{code=~"2.*|3.*"}) / sum(apiserver_request_total)
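The ratio above divides raw counters, so it reflects all requests since the counters last reset. A rate-based variant, sketched below, gives the success ratio over a recent window and breaks it down by verb; the 5m window is an assumption.

```promql
# Success ratio per verb over the last 5 minutes.
# Both sides keep only the "verb" label, so they match one-to-one.
sum by (verb) (rate(apiserver_request_total{code=~"2..|3.."}[5m]))
  /
sum by (verb) (rate(apiserver_request_total[5m]))
```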

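agg_over_time: aggregating every sample value of a series over a time range

  • Example (the past day's alerts, ranked by number of firing samples): sort_desc(sum(sum_over_time(ALERTS{alertstate=`firing`}[24h])) by (alertname))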
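Other functions in the same family follow the same shape: a range vector in, one value per series out. Two sketches using metrics that appear earlier in this article; the time windows are arbitrary assumptions.

```promql
# Fraction of scrapes in the last 24h for which each target was up
# (up is 1 for a successful scrape and 0 otherwise).
avg_over_time(up[24h])

# 1 if the container was in the waiting state at any sample in the last hour.
max_over_time(kube_pod_container_status_waiting[1h])
```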

ning1875