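In a traditional single-process Prometheus deployment, alerting rules are defined as follows:
- Edit the configuration file prometheus.yaml and add the alerting rule definitions;
- Send POST /-/reload so that the configuration takes effect (the endpoint is only available when Prometheus is started with --web.enable-lifecycle).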
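As a rough sketch of that traditional workflow, the shell snippet below writes a rule file and defines a helper that asks Prometheus to reload. The rule file location, the `InstanceDown` alert, and the `localhost:9090` address are assumptions for illustration only; the file also has to be listed under `rule_files` in the main configuration.

```shell
# Hypothetical location for the rule file; adjust to wherever rule_files points.
RULES_DIR="${RULES_DIR:-$(mktemp -d)}"
RULE_FILE="$RULES_DIR/example-alerts.yml"

# Write a minimal alerting rule (the alert itself is illustrative only).
cat > "$RULE_FILE" <<'EOF'
groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 3m
    labels:
      severity: critical
EOF

# Ask a running Prometheus to re-read its configuration and rule files.
# Assumes Prometheus listens on localhost:9090 and was started with
# --web.enable-lifecycle; the function is only defined here, not called.
reload_prometheus() {
  curl -fsS -X POST "${PROMETHEUS_URL:-http://localhost:9090}/-/reload"
}
```

Calling `reload_prometheus` after editing the file completes the manual cycle that the operator automates.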
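With a prometheus-operator deployment, we only need to define a PrometheusRule resource object. When the operator sees that a PrometheusRule has been created, it automatically adds the alerting rule file for us and triggers a reload.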
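1. Default alerting rules
The Prometheus instance deployed by prometheus-operator already ships with some default rules, stored in the following directory of the prometheus-k8s-0 pod: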
# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
Defaulting container name to prometheus.
Use 'kubectl describe pod/prometheus-k8s-0 -n monitoring' to see all of the containers in this pod.
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
monitoring-prometheus-k8s-rules.yaml
This YAML file holds the content of the prometheus-rules.yaml file that was supplied when prometheus-operator was deployed:
# pwd
/etc/kubernetes/prometheus
# ls prometheus-rules.yaml
prometheus-rules.yaml
2. Create a PrometheusRule resource object
After we create a PrometheusRule resource object, a file named {{namespace}}-{{rule_name}}.yaml is generated in the prometheus-k8s-rulefiles-0 directory of the prometheus-k8s-0 pod.
# cat prometheus-etcdRules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: etcd-rules
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdClusterUnavailable
      annotations:
        summary: etcd cluster small
        description: If one more etcd peer goes down the cluster will be unavailable
      expr: |
        count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)
      for: 3m
      labels:
        severity: critical
The YAML manifest must carry the following labels:
- prometheus=k8s;
- role=alert-rules;
because the ruleSelector of the Prometheus instance filters on them:
ruleSelector:
  matchLabels:
    prometheus: k8s
    role: alert-rules
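The matching behaviour of matchLabels can be sketched in plain shell: an object is selected only if every label in the selector is present on it with the same value. The `labels_match` helper below is hypothetical and only mimics that rule on `key=value` strings; it is not how the operator itself is implemented.

```shell
# labels_match "selector" "object labels"
# Both arguments are space-separated key=value lists.
# Succeeds only if every selector pair appears among the object's labels.
labels_match() {
  selector="$1"
  object_labels=" $2 "
  for pair in $selector; do
    case "$object_labels" in
      *" $pair "*) ;;      # required label present with the same value
      *) return 1 ;;       # missing label or different value: not selected
    esac
  done
  return 0
}

# The rule from this article carries both labels, so it is selected.
labels_match "prometheus=k8s role=alert-rules" "prometheus=k8s role=alert-rules" \
  && echo "selected"

# A rule without role=alert-rules is ignored by this Prometheus instance.
labels_match "prometheus=k8s role=alert-rules" "prometheus=k8s" \
  || echo "ignored"
```

Extra labels on the object do not prevent a match; only the labels named in the selector are compared.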
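The YAML defines one alerting rule. It fires when the number of down etcd members exceeds half the cluster size minus one, which for a typical odd-sized cluster means that losing one more member would cost the cluster its quorum:
count(up{job="etcd"} == 0) > (count(up{job="etcd"}) / 2 - 1)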
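To make the threshold concrete, the comparison can be replayed with plain arithmetic. The `alert_fires` helper below is a hypothetical sketch that takes the number of down members and the cluster size; it also returns false when nothing is down, because in PromQL `count(up{job="etcd"} == 0)` yields no result in that case and the rule cannot fire.

```shell
# alert_fires DOWN TOTAL
# Exit status 0 if the rule's comparison holds: DOWN > TOTAL / 2 - 1.
# DOWN = 0 never fires, mirroring the empty result of count() in PromQL.
alert_fires() {
  awk -v down="$1" -v total="$2" \
    'BEGIN { exit !(down > 0 && down > total / 2 - 1) }'
}

alert_fires 1 3 && echo "3 members, 1 down: fires"
alert_fires 1 5 || echo "5 members, 1 down: does not fire"
alert_fires 2 5 && echo "5 members, 2 down: fires"
```

With 3 members the threshold is 0.5, so a single failure already triggers the alert; with 5 members the threshold is 1.5, so it takes two failures.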
3. Confirm in the Prometheus dashboard that the rule is active
Enter the Prometheus pod and check whether the rule file has been generated:
# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
/prometheus $ ls -alh /etc/prometheus/rules/prometheus-k8s-rulefiles-0/
total 0
lrwxrwxrwx 1 root 2000 33 Feb 2 07:40 monitoring-etcd-rules.yaml -> ..data/monitoring-etcd-rules.yaml
lrwxrwxrwx 1 root root 43 Feb 2 06:03 monitoring-prometheus-k8s-rules.yaml -> ..data/monitoring-prometheus-k8s-rules.yaml
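The file names in this listing follow the {{namespace}}-{{rule_name}}.yaml pattern. As a small sketch, the hypothetical `rule_file_name` helper below builds the expected name from a namespace and a PrometheusRule name, which can be handy when checking the directory from a script.

```shell
# rule_file_name NAMESPACE RULE_NAME
# Prints the file name expected inside prometheus-k8s-rulefiles-0.
rule_file_name() {
  printf '%s-%s.yaml\n' "$1" "$2"
}

rule_file_name monitoring etcd-rules            # prints monitoring-etcd-rules.yaml
rule_file_name monitoring prometheus-k8s-rules  # prints monitoring-prometheus-k8s-rules.yaml
```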
Open the Prometheus dashboard and check the Alerts page to confirm that the rule has taken effect.
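The same check can also be scripted against the Prometheus HTTP API, whose `/api/v1/rules` endpoint returns the loaded rule groups as JSON. The sketch below is only a plain text match rather than a real JSON parse; the `has_alert` helper is hypothetical and the `localhost:9090` address in the usage comment is an assumption (for example, reached through a port-forward).

```shell
# has_alert NAME
# Reads a /api/v1/rules JSON response on stdin and succeeds if an entry
# with the given name appears in it. Text match only, not a JSON parser.
has_alert() {
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Example usage against a reachable Prometheus (address is an assumption):
#   curl -fsS http://localhost:9090/api/v1/rules \
#     | has_alert EtcdClusterUnavailable && echo "rule loaded"
```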
References:
1. Prometheus-Operator custom alerting: https://www.qikqiak.com/post/...