Video tutorial
- Tutorial link
Summary of useful features
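Label regex matching / negative regex matching (=~ / !~)
- Example: pods stuck in Pending or Unknown, excluding filebeat pods and the druid namespace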
kube_pod_status_phase{pod!~"filebeat.*",job="kube-state-metrics", namespace !~"druid",phase=~"Pending|Unknown"}
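PromQL regex matchers are fully anchored, so namespace!~"druid" excludes only a namespace named exactly druid. The Python sketch below mimics that matcher behavior with re.fullmatch on made-up label sets. Python's re is not Go's RE2, so treat it as an approximation that holds for simple patterns like these.

```python
import re

def label_matches(labels, name, op, pattern):
    """Mimic one PromQL label matcher. Regexes are anchored, like ^(?:pattern)$."""
    value = labels.get(name, "")  # a missing label behaves like an empty string
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    if op == "=~":
        return re.fullmatch(pattern, value) is not None
    if op == "!~":
        return re.fullmatch(pattern, value) is None
    raise ValueError(f"unsupported operator: {op}")

def select(series, matchers):
    """Keep the label sets that satisfy every (name, op, pattern) matcher."""
    return [s for s in series if all(label_matches(s, *m) for m in matchers)]

# Hypothetical label sets standing in for kube_pod_status_phase series
pods = [
    {"pod": "filebeat-abc", "namespace": "default", "phase": "Pending"},
    {"pod": "api-1", "namespace": "druid", "phase": "Pending"},
    {"pod": "api-2", "namespace": "druid-test", "phase": "Unknown"},
    {"pod": "api-3", "namespace": "default", "phase": "Running"},
]
matchers = [
    ("pod", "!~", "filebeat.*"),
    ("namespace", "!~", "druid"),
    ("phase", "=~", "Pending|Unknown"),
]
print([s["pod"] for s in select(pods, matchers)])  # ['api-2']
```

Because of the anchoring, api-2 in namespace druid-test survives the filter; a pattern such as druid.* would be needed to exclude every namespace starting with druid.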
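Aggregation: drop or keep labels to see the breakdown
- Drop example (without removes the listed labels):
sum without(code) (rate(apiserver_request_total[2m]))
- Keep example (by keeps only the listed labels):
sum by(code) (rate(apiserver_request_total[2m]))
- Example: apiserver request QPS, broken down by verb
sum(rate(apiserver_request_total[2m])) by(verb)
- Example: apiserver request QPS, broken down by verb and status code
sum(rate(apiserver_request_total[2m])) by(verb, code)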
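To make the by/without difference concrete, here is a small Python sketch of sum grouping over made-up per-second rates. It is an illustration, not Prometheus code: label sets are plain dicts, and the metric name (which aggregation drops anyway) is left out.

```python
from collections import defaultdict

def aggregate_sum(series, labels, mode):
    """sum by(labels) when mode == "by", sum without(labels) when mode == "without".
    series: (label_dict, value) pairs; returns {sorted label tuple: summed value}."""
    groups = defaultdict(float)
    for lbls, value in series:
        if mode == "by":
            kept = {k: v for k, v in lbls.items() if k in labels}
        else:
            kept = {k: v for k, v in lbls.items() if k not in labels}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)

# Hypothetical per-second rates of apiserver_request_total (metric name already dropped)
rates = [
    ({"verb": "GET", "code": "200"}, 5.0),
    ({"verb": "GET", "code": "404"}, 1.0),
    ({"verb": "LIST", "code": "200"}, 2.0),
]
print(aggregate_sum(rates, {"verb"}, "by"))       # {(('verb', 'GET'),): 6.0, (('verb', 'LIST'),): 2.0}
print(aggregate_sum(rates, {"code"}, "without"))  # same groups: only verb is left
print(aggregate_sum(rates, set(), "by"))          # {(): 8.0}, the plain sum(...)
```

With only verb and code present, by(verb) and without(code) give the same groups; they diverge as soon as the series carry extra labels such as instance.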
label_replace: rewrite labels
- Example: add a host label whose value is the address part of instance (the text before the port)
label_replace(up, "host", "$1", "instance", "(.*):.*")
- Original series:
up{instance="localhost:8080",job="cadvisor"} 1
- Resulting series:
up{host="localhost",instance="localhost:8080",job="cadvisor"} 1
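The rewrite follows a simple rule: if the anchored regex matches the source label, the destination label is set to the expanded replacement; otherwise the series passes through unchanged. A simplified Python sketch of that rule, assuming only $N group references (PromQL also accepts forms like ${1} and named groups, which this sketch does not handle):

```python
import re

def label_replace(series, dst, replacement, src, regex):
    """Simplified label_replace: only $N group references are supported."""
    out = []
    for labels in series:
        labels = dict(labels)  # copy so the input is not modified
        m = re.fullmatch(regex, labels.get(src, ""))  # anchored, like PromQL
        if m:
            # translate Go-style $1 into Python's \g<1> before expanding
            template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
            new_value = m.expand(template)
            if new_value:
                labels[dst] = new_value
            else:
                labels.pop(dst, None)  # an empty result removes the label
        out.append(labels)  # non-matching series pass through unchanged
    return out

up = [{"instance": "localhost:8080", "job": "cadvisor"}]
print(label_replace(up, "host", "$1", "instance", "(.*):.*"))
# [{'instance': 'localhost:8080', 'job': 'cadvisor', 'host': 'localhost'}]
```

Since (.*) is greedy, an instance value with several colons keeps everything up to the last one.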
topk / bottomk: view the top (or bottom) N series
- Example: top 5 pods by container CPU usage
topk(5, sum(rate(container_cpu_usage_seconds_total[1m])) by (pod))
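The selection itself is simply "the k largest values". A Python sketch with heapq on hypothetical per-pod CPU figures (in cores, since the rate of a CPU-seconds counter is cores used):

```python
import heapq

def topk(k, series):
    """Return the k (labels, value) pairs with the largest values."""
    return heapq.nlargest(k, series, key=lambda s: s[1])

def bottomk(k, series):
    """Return the k (labels, value) pairs with the smallest values."""
    return heapq.nsmallest(k, series, key=lambda s: s[1])

# Hypothetical CPU usage per pod, in cores
cpu = [("pod-a", 0.5), ("pod-b", 2.0), ("pod-c", 1.2), ("pod-d", 0.1)]
print(topk(2, cpu))     # [('pod-b', 2.0), ('pod-c', 1.2)]
print(bottomk(1, cpu))  # [('pod-d', 0.1)]
```

In a range query (a graph), topk is evaluated independently at every step, so more than k distinct pods can appear across the whole graph.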
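Period-over-period comparison: subtract
- Example: QPS has dropped by more than 10 requests per second compared with one hour earlier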
sum (rate(apiserver_request_total[2m] offset 1h) ) - sum (rate(apiserver_request_total[2m] ) ) > 10
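offset 1h evaluates the same 2-minute window one hour earlier, and the subtraction gives the drop in requests per second. A simplified Python sketch of the comparison: the per-second rate is taken from the first and last counter samples, which (unlike PromQL's rate()) ignores counter resets and window extrapolation, and the sample values are made up.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp_s, counter) samples.
    Simplification: ignores counter resets and rate()'s window extrapolation."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def qps_dropped(window_1h_ago, window_now, threshold=10.0):
    """Mirror of: rate(x[2m] offset 1h) - rate(x[2m]) > threshold."""
    drop = simple_rate(window_1h_ago) - simple_rate(window_now)
    return drop > threshold, drop

# Hypothetical counter samples (timestamp in seconds, cumulative request count)
hour_ago = [(0, 0), (120, 6000)]         # 50 req/s
now = [(3600, 100000), (3720, 104200)]   # 35 req/s
print(qps_dropped(hour_ago, now))        # (True, 15.0)
```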
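absent: alert when data is missing
- absent() returns 1 when the selector matches no series, so a result of 1 means the metric is missing; when any series matches, the result is empty
- Example: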
absent(container_cpu_usage_seconds_total{pod=~"k8s-mon.*jddj2"})
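A Python sketch of what absent() returns, taking the selector's matched series as input. The equality_labels parameter is an illustration device for how PromQL labels the output sample: it copies labels from = matchers only, so a regex-only selector like the one above yields an unlabeled sample.

```python
def absent(matched_series, equality_labels=None):
    """Sketch of absent(): empty if the selector matched anything, otherwise a single
    sample with value 1, labeled with the selector's = matchers (regex matchers add none)."""
    if matched_series:
        return []
    return [(dict(equality_labels or {}), 1)]

print(absent([]))                              # [({}, 1)] -> the nodata alert fires
print(absent([({"pod": "k8s-mon-1"}, 0.3)]))   # [] -> data present, no alert
print(absent([], {"job": "myjob"}))            # [({'job': 'myjob'}, 1)]
```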
Filter by value directly in the query
- Example: containers in the waiting state
kube_pod_container_status_waiting==1
- Example: nodes with more than 8 CPU cores
kube_node_status_capacity_cpu_cores>8
- Example: pods in an abnormal phase:
sum by (namespace, pod,cluster,phase) (kube_pod_status_phase{pod!~"filebeat.*",job="kube-state-metrics", namespace !~"druid",phase=~"Pending|Unknown"}) > 0
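A comparison against a scalar acts as a filter rather than returning true/false: series that fail the comparison disappear and the rest keep their original values. A Python sketch of that behavior on hypothetical node data:

```python
import operator

# Comparison operators usable between an instant vector and a scalar
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def filter_by_value(series, op, scalar):
    """vector <op> scalar without the bool modifier: series that fail the
    comparison are dropped, and the survivors keep their original values."""
    cmp = OPS[op]
    return [(labels, value) for labels, value in series if cmp(value, scalar)]

# Hypothetical kube_node_status_capacity_cpu_cores samples
nodes = [({"node": "n1"}, 4.0), ({"node": "n2"}, 16.0), ({"node": "n3"}, 8.0)]
print(filter_by_value(nodes, ">", 8))  # [({'node': 'n2'}, 16.0)]
```

Adding the bool modifier (for example > bool 8) changes this: every series is kept and the value becomes 0 or 1.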
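Arithmetic on values in the query
- Example: CPU utilization percentage computed from idle time (on node_exporter 0.16 and later the metric is named node_cpu_seconds_total)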
100 * (1 - avg by(instance)(irate(node_cpu{mode='idle'}[5m])))
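irate() of the idle counter is the fraction of each second a core spent idle, so one minus its per-instance average is the busy fraction, and multiplying by 100 gives a percentage. A Python sketch of that arithmetic, with made-up instances and rates:

```python
from collections import defaultdict

def cpu_util_by_instance(idle_rates):
    """idle_rates: (instance, cpu, rate) tuples, where rate is irate() of the
    idle-mode CPU counter, i.e. the fraction of each second that core sat idle.
    Returns 100 * (1 - avg by(instance)(idle rate))."""
    per_instance = defaultdict(list)
    for instance, _cpu, rate in idle_rates:
        per_instance[instance].append(rate)
    return {inst: 100 * (1 - sum(r) / len(r)) for inst, r in per_instance.items()}

# Hypothetical per-core idle rates for two made-up instances
samples = [
    ("10.0.0.1:9100", "0", 0.75),
    ("10.0.0.1:9100", "1", 0.25),
    ("10.0.0.2:9100", "0", 1.0),
]
print(cpu_util_by_instance(samples))  # {'10.0.0.1:9100': 50.0, '10.0.0.2:9100': 0.0}
```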
Combining conditions: and
- Example: m3db read rate greater than 100 and write rate greater than 1000
avg by(cluster) (rate(database_read_success{cluster="hawkeye-inf",instance="10.21.72.147:9004"}[1m]) ) > 100 and avg by(cluster) (rate(service_writeTaggedBatchRaw_success{cluster="hawkeye-inf",instance="10.21.77.55:9004"}[1m]) ) >1000
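- The label sets on both sides must match; here both sides are aggregated by(cluster), so they match on cluster even though the instance filters differ
- The or operator works the same way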
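The set operators compare whole label sets: and keeps left-hand series that have an exact label-set match on the right, and or adds right-hand series whose label set is missing on the left. A Python sketch on hypothetical, already-aggregated results:

```python
def label_key(labels):
    """Hashable identity of a label set."""
    return frozenset(labels.items())

def vector_and(lhs, rhs):
    """lhs and rhs: keep lhs series whose exact label set also exists in rhs.
    Values always come from the left-hand side."""
    rhs_keys = {label_key(l) for l, _ in rhs}
    return [(l, v) for l, v in lhs if label_key(l) in rhs_keys]

def vector_or(lhs, rhs):
    """lhs or rhs: every lhs series, plus rhs series whose label set is not in lhs."""
    lhs_keys = {label_key(l) for l, _ in lhs}
    return lhs + [(l, v) for l, v in rhs if label_key(l) not in lhs_keys]

# Hypothetical results of both sides after "avg by(cluster) (...) > threshold"
reads = [({"cluster": "hawkeye-inf"}, 150.0)]
writes = [({"cluster": "hawkeye-inf"}, 1200.0)]
print(vector_and(reads, writes))                            # [({'cluster': 'hawkeye-inf'}, 150.0)]
print(vector_and(reads, [({"cluster": "other"}, 1200.0)]))  # [] -> label sets differ
```

When the two sides carry different labels, PromQL's on(...) and ignoring(...) modifiers (for example and on(cluster)) restrict which labels are compared.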
Quantiles: histogram_quantile
- Example: 90th-percentile apiserver request latency (excluding CONNECT and WATCH requests)
histogram_quantile(0.90, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"CONNECT|WATCH"}[5m])) by (le))
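histogram_quantile works on cumulative bucket counts and interpolates linearly inside the bucket that contains the requested rank, which is why le must survive the sum(...) by (le). A simplified Python sketch of that calculation, with hypothetical bucket counts; several edge cases the real function handles (q outside (0, 1], negative bounds, too few buckets) are left out.

```python
import math

def histogram_quantile(q, buckets):
    """Sketch of PromQL's histogram_quantile for 0 < q <= 1.
    buckets: (le, cumulative_count) pairs including (math.inf, total);
    upper bounds are assumed non-negative."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # the rank lands in the +Inf bucket: report the highest finite bound
                return prev_le
            # linear interpolation: assume observations are spread evenly in the bucket
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return math.nan

# Hypothetical cumulative bucket counts, already summed by (le)
latency = [(0.1, 50), (0.5, 80), (1.0, 95), (math.inf, 100)]
print(histogram_quantile(0.90, latency))  # about 0.833: rank 90 is 2/3 of the way into (0.5, 1.0]
print(histogram_quantile(0.99, latency))  # 1.0, because the rank falls in the +Inf bucket
```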
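Relating two sets of series
- Example: apiserver request success rate (2xx and 3xx responses divided by all requests)
sum(apiserver_request_total{code=~"2.*|3.*"}) / sum(apiserver_request_total)
- Because these are raw counters, this is the ratio since the counters started; wrap each side in rate(...[5m]) to get the success rate over a recent window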
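Both sum() expressions end up with no labels, so the division matches them one-to-one. A Python sketch of the ratio over hypothetical per-code totals, using the same anchored regex semantics as code=~"2.*|3.*":

```python
import re

def success_ratio(series, pattern=r"2.*|3.*"):
    """sum(x{code=~pattern}) / sum(x) over (labels, value) pairs.
    Both sums have no labels left, so they match one-to-one and divide."""
    total = sum(value for _, value in series)
    ok = sum(value for labels, value in series
             if re.fullmatch(pattern, labels.get("code", "")))
    return ok / total if total else float("nan")

# Hypothetical request totals per status code
requests = [({"code": "200"}, 90.0), ({"code": "304"}, 5.0), ({"code": "500"}, 5.0)]
print(success_ratio(requests))  # 0.95
```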
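agg_over_time (sum_over_time, avg_over_time, max_over_time, etc.): aggregate all sample values of each time series within the range
- Example: alerts that fired over the last day, ranked by the number of firing samples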
sort_desc(sum(sum_over_time(ALERTS{alertstate=`firing`}[24h])) by (alertname))
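Each ALERTS series has the value 1 while the alert is active, so sum_over_time counts firing samples per series, which is roughly how many evaluation intervals it fired. A Python sketch of the whole pipeline (per-series sum, then sum by alertname, then sort descending) on hypothetical alert names:

```python
from collections import defaultdict

def rank_alerts(alert_series):
    """alert_series: (alertname, samples) pairs, one per firing ALERTS series, where
    samples are that series' values from the last 24h (each 1 while firing). Mirrors
    sort_desc(sum(sum_over_time(ALERTS{alertstate="firing"}[24h])) by (alertname))."""
    totals = defaultdict(float)
    for alertname, samples in alert_series:
        totals[alertname] += sum(samples)  # sum_over_time, then sum by (alertname)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical alert names and sample counts
series = [
    ("HighCPU", [1, 1, 1]),  # e.g. one instance firing for 3 evaluations
    ("HighCPU", [1]),        # another instance firing once
    ("DiskFull", [1, 1]),
]
print(rank_alerts(series))  # [('HighCPU', 4.0), ('DiskFull', 2.0)]
```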