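Prometheus Operator manages and deploys Alertmanager instances through the Alertmanager CRD. The default setup deploys 3 instances, which form a cluster and stay consistent with each other through the Gossip protocol.
Because a headless service is deployed, each Alertmanager instance has a unique, stable identity.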
1. Three Alertmanager instances
Alertmanager is deployed as a StatefulSet with 3 pods:
# kubectl get all -n monitoring |grep alertmanager-main
pod/alertmanager-main-0 2/2 Running 0 19d
pod/alertmanager-main-1 2/2 Running 0 19d
pod/alertmanager-main-2 2/2 Running 1 18d
service/alertmanager-main ClusterIP 10.233.5.73 <none> 9093/TCP 19d
statefulset.apps/alertmanager-main 3/3 19d
The 3 Alertmanager instances run as a cluster and use the Gossip protocol to stay consistent with each other:
# ps -ef|grep alertmanager
1000 51237 51205 0 Feb01 ? 01:34:55 /bin/alertmanager --config.file=/etc/alertmanager/config/alertmanager.yaml --storage.path=/alertmanager --data.retention=120h --cluster.listen-address=[10.233.97.7]:9094 --web.listen-address=:9093 --web.route-prefix=/ --cluster.peer=alertmanager-main-0.alertmanager-operated:9094 --cluster.peer=alertmanager-main-1.alertmanager-operated:9094 --cluster.peer=alertmanager-main-2.alertmanager-operated:9094
A headless service is defined so that each Alertmanager instance gets a unique identity:
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: web
  - name: tcp-mesh
    port: 9094
    protocol: TCP
    targetPort: 9094
  - name: udp-mesh
    port: 9094
    protocol: UDP
    targetPort: 9094
  selector:
    app: alertmanager
  sessionAffinity: None
  type: ClusterIP
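The peer names passed to `--cluster.peer` in the process listing follow from how a StatefulSet and a headless service combine: pods are named by ordinal, and each pod gets a DNS record under the service name. Here is a minimal Python sketch of that naming, assuming the StatefulSet name `alertmanager-main`, the service name `alertmanager-operated`, and mesh port 9094 as seen in this cluster. The function name `peer_addresses` is made up for illustration and is not part of any tool.

```python
def peer_addresses(statefulset: str, headless_service: str,
                   replicas: int, port: int = 9094) -> list[str]:
    """Build the per-pod addresses a StatefulSet gets behind a headless service.

    Pods are named <statefulset>-<ordinal>, and the headless service gives
    each pod a DNS record <pod>.<service> within the same namespace.
    """
    return [
        f"{statefulset}-{ordinal}.{headless_service}:{port}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # Values taken from the output shown in this post.
    for peer in peer_addresses("alertmanager-main", "alertmanager-operated", 3):
        print(f"--cluster.peer={peer}")
```

With 3 replicas this yields the same three peer addresses that appear in the `ps -ef` output, which is why every instance can find the others even if pod IPs change.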
2. Configuring Prometheus to reach Alertmanager
Take a look inside the Prometheus pod:
# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh
/etc/prometheus/config_out $ vi prometheus.env.yaml
......
alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: prometheus_replica
  alertmanagers:
  - path_prefix: /
    scheme: http
    kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_service_name
      regex: alertmanager-main
    - action: keep
      source_labels:
      - __meta_kubernetes_endpoint_port_name
      regex: web
.......
As shown, the Alertmanager instances are also discovered automatically through kubernetes_sd_configs. The filter rules are:
- service_name=alertmanager-main;
- endpoint_port_name=web;
Looking at alertmanager-service.yaml, it satisfies both conditions:
# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
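The effect of the two `keep` rules can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not Prometheus code: it assumes source label values are joined with `;` and that the regex must match the whole joined value, and it uses Python's `re` module in place of the RE2 engine Prometheus uses. The discovered targets listed below are hypothetical examples, reduced to the two labels the rules look at.

```python
import re

# The two keep rules from the alerting section of prometheus.env.yaml.
KEEP_RULES = [
    (["__meta_kubernetes_service_name"], "alertmanager-main"),
    (["__meta_kubernetes_endpoint_port_name"], "web"),
]


def survives_keep_rules(labels, rules=KEEP_RULES, separator=";"):
    """Return True only if the target passes every keep rule in order."""
    for source_labels, regex in rules:
        # Join the values of the source labels; a missing label counts as "".
        joined = separator.join(labels.get(name, "") for name in source_labels)
        # A keep rule drops the target unless the regex matches the full value.
        if re.fullmatch(regex, joined) is None:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical endpoints discovered in the monitoring namespace.
    discovered = [
        {"__meta_kubernetes_service_name": "alertmanager-main",
         "__meta_kubernetes_endpoint_port_name": "web"},
        {"__meta_kubernetes_service_name": "alertmanager-operated",
         "__meta_kubernetes_endpoint_port_name": "tcp-mesh"},
        {"__meta_kubernetes_service_name": "prometheus-k8s",
         "__meta_kubernetes_endpoint_port_name": "web"},
    ]
    for labels in discovered:
        verdict = "keep" if survives_keep_rules(labels) else "drop"
        print(labels["__meta_kubernetes_service_name"],
              labels["__meta_kubernetes_endpoint_port_name"], "->", verdict)
```

Only endpoints that belong to the `alertmanager-main` service and expose the port named `web` remain, so Prometheus sends alerts to the three Alertmanager pods on port 9093 and nowhere else.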
3. Definitions of the Alertmanager CRD, Service, and ServiceMonitor
The Alertmanager StatefulSet is actually generated from the Alertmanager CRD:
# kubectl get crd -n monitoring
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2021-02-01T03:13:36Z
......
# cat alertmanager-alertmanager.yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    alertmanager: main
  name: main
  namespace: monitoring
spec:
  image: "178.104.162.39:443/dev/huayun/amd64/alertmanager:v0.21.0"
  nodeSelector:
    kubernetes.io/os: linux
  replicas: 3
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager-main
  version: v0.21.0
As shown, replicas is set to 3 in the definition above.
The Service and ServiceMonitor are defined as follows:
# cat alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
# cat alertmanager-serviceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: web
  selector:
    matchLabels:
      alertmanager: main
Because a ServiceMonitor is defined, the /metrics endpoint exposed by the Alertmanager instances themselves can be scraped by Prometheus.
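How the ServiceMonitor picks up the `alertmanager-main` service can be sketched as a label comparison plus a port-name check. The Python below is a simplified illustration under stated assumptions: it only models `matchLabels` and the endpoint port name within one namespace, and it ignores the other selection steps the operator performs (for example, which ServiceMonitors a Prometheus resource selects). The helper names and the second service are hypothetical.

```python
def match_labels(selector, labels):
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def monitor_selects(monitor, service):
    """True if the service matches the monitor's labels and exposes its port."""
    return (
        match_labels(monitor["match_labels"], service["labels"])
        and monitor["port"] in service["port_names"]
    )


# Values copied from alertmanager-serviceMonitor.yaml and alertmanager-service.yaml.
SERVICE_MONITOR = {"match_labels": {"alertmanager": "main"}, "port": "web"}
ALERTMANAGER_MAIN = {
    "name": "alertmanager-main",
    "labels": {"alertmanager": "main"},
    "port_names": ["web"],
}
# Hypothetical unrelated service, included only for contrast.
OTHER_SERVICE = {
    "name": "some-other-service",
    "labels": {"k8s-app": "other"},
    "port_names": ["web"],
}

if __name__ == "__main__":
    for service in (ALERTMANAGER_MAIN, OTHER_SERVICE):
        print(service["name"], "selected:", monitor_selects(SERVICE_MONITOR, service))
```

The `alertmanager: main` label on the Service and the port named `web` are what tie the two definitions together; renaming either one would break the scrape.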