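In kube-prometheus, the Prometheus Operator manages and deploys Alertmanager instances through the Alertmanager CRD. The default manifests deploy 3 instances, which form a cluster and keep their state consistent via the Gossip protocol.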

Because a headless Service is deployed, each Alertmanager instance has a unique, stable identity.

1. The 3 Alertmanager instances

Alertmanager is deployed as a StatefulSet with 3 pods:

# kubectl get all -n monitoring |grep alertmanager-main
pod/alertmanager-main-0                    2/2     Running   0          19d
pod/alertmanager-main-1                    2/2     Running   0          19d
pod/alertmanager-main-2                    2/2     Running   1          18d
service/alertmanager-main       ClusterIP   10.233.5.73     <none>        9093/TCP                     19d
statefulset.apps/alertmanager-main   3/3     19d

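The 3 instances form one cluster. Each process is started with all three pods as --cluster.peer flags and gossips with them on port 9094, which keeps silences and the notification log in sync: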

# ps -ef|grep alertmanager
1000     51237 51205  0 Feb01 ?        01:34:55 /bin/alertmanager --config.file=/etc/alertmanager/config/alertmanager.yaml --storage.path=/alertmanager --data.retention=120h --cluster.listen-address=[10.233.97.7]:9094 --web.listen-address=:9093 --web.route-prefix=/ --cluster.peer=alertmanager-main-0.alertmanager-operated:9094 --cluster.peer=alertmanager-main-1.alertmanager-operated:9094 --cluster.peer=alertmanager-main-2.alertmanager-operated:9094
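The --cluster.peer flags above follow a simple pattern. There is one entry per StatefulSet ordinal, and each entry is addressed through the headless Service on gossip port 9094. The Python sketch below rebuilds that list from the StatefulSet name, the replica count and the Service name. The helper cluster_peer_flags is hypothetical: it illustrates the pattern and is not the operator's actual code.

```python
def cluster_peer_flags(statefulset: str, replicas: int, headless_service: str,
                       port: int = 9094) -> list[str]:
    """Hypothetical helper: one --cluster.peer flag per StatefulSet ordinal."""
    flags = []
    for ordinal in range(replicas):
        # StatefulSet pods are named <statefulset>-<ordinal>; the headless
        # Service makes <pod>.<service> resolvable inside the namespace.
        host = f"{statefulset}-{ordinal}.{headless_service}"
        flags.append(f"--cluster.peer={host}:{port}")
    return flags


if __name__ == "__main__":
    for flag in cluster_peer_flags("alertmanager-main", 3, "alertmanager-operated"):
        print(flag)
```

Run as a script, it prints the same three --cluster.peer flags that appear in the ps output.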

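A headless Service (clusterIP: None) named alertmanager-operated gives each Alertmanager pod a unique, stable DNS name, such as alertmanager-main-0.alertmanager-operated: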

apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: web
    port: 9093
    protocol: TCP
    targetPort: web
  - name: tcp-mesh
    port: 9094
    protocol: TCP
    targetPort: 9094
  - name: udp-mesh
    port: 9094
    protocol: UDP
    targetPort: 9094
  selector:
    app: alertmanager
  sessionAffinity: None
  type: ClusterIP
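The stable identity comes from Kubernetes DNS. A StatefulSet pod governed by a headless Service gets a record of the form <pod>.<service>.<namespace>.svc.<cluster-domain>. The sketch below assumes the default cluster domain cluster.local. pod_fqdn and resolve are hypothetical helpers, and resolve only returns addresses when it runs somewhere that can reach the cluster DNS.

```python
import socket

# Assumption: the cluster uses the default DNS domain.
CLUSTER_DOMAIN = "cluster.local"


def pod_fqdn(pod: str, headless_service: str, namespace: str,
             domain: str = CLUSTER_DOMAIN) -> str:
    """DNS name of a StatefulSet pod governed by a headless Service."""
    return f"{pod}.{headless_service}.{namespace}.svc.{domain}"


def resolve(name: str) -> list[str]:
    """IPv4 addresses for name, or an empty list if it cannot be resolved
    (for example when run outside the cluster)."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # sockaddr is (ip, port) for AF_INET; deduplicate across socket types.
    return sorted({info[4][0] for info in infos})
```

Inside a pod in the cluster, resolve(pod_fqdn("alertmanager-main-0", "alertmanager-operated", "monitoring")) should return that pod's IP. Outside the cluster, it returns an empty list.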

2. Configuring Prometheus to reach Alertmanager

Take a look inside the Prometheus pod:

# kubectl exec -it prometheus-k8s-0 -n monitoring -- /bin/sh

/etc/prometheus/config_out $ vi prometheus.env.yaml
......
alerting:
  alert_relabel_configs:
  - action: labeldrop
    regex: prometheus_replica
  alertmanagers:
  - path_prefix: /
    scheme: http
    kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
        - monitoring
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_service_name
      regex: alertmanager-main
    - action: keep
      source_labels:
      - __meta_kubernetes_endpoint_port_name
      regex: web
.......

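As you can see, the Alertmanager instances are also discovered automatically via kubernetes_sd_configs (role: endpoints, namespace monitoring). The two keep rules retain only the endpoints where:

  • __meta_kubernetes_service_name = alertmanager-main;
  • __meta_kubernetes_endpoint_port_name = web;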
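The two keep rules act as successive filters on the discovered endpoints. A target survives a rule only if the joined values of its source_labels fully match the regex. The Python sketch below mimics that behaviour for the keep action only. The DISCOVERED list is hypothetical sample data. Note that alertmanager-operated also exposes a port named web, so without the service-name rule every Alertmanager pod would be discovered twice.

```python
import re

Labels = dict[str, str]


def keep(targets: list[Labels], source_labels: list[str], regex: str,
         separator: str = ";") -> list[Labels]:
    """Keep only targets whose joined source label values fully match regex,
    mirroring the semantics of Prometheus' relabel action "keep"."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        # Source label values are joined with the separator (";" by default);
        # a missing label contributes an empty string.
        value = separator.join(labels.get(name, "") for name in source_labels)
        # Relabel regexes are fully anchored, hence fullmatch.
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


# Hypothetical endpoints that role: endpoints might discover in "monitoring".
DISCOVERED = [
    {"__meta_kubernetes_service_name": "alertmanager-main",
     "__meta_kubernetes_endpoint_port_name": "web",
     "__meta_kubernetes_pod_name": "alertmanager-main-0"},
    {"__meta_kubernetes_service_name": "alertmanager-operated",
     "__meta_kubernetes_endpoint_port_name": "web",
     "__meta_kubernetes_pod_name": "alertmanager-main-0"},
    {"__meta_kubernetes_service_name": "alertmanager-operated",
     "__meta_kubernetes_endpoint_port_name": "tcp-mesh",
     "__meta_kubernetes_pod_name": "alertmanager-main-0"},
    {"__meta_kubernetes_service_name": "prometheus-k8s",
     "__meta_kubernetes_endpoint_port_name": "web",
     "__meta_kubernetes_pod_name": "prometheus-k8s-0"},
]

if __name__ == "__main__":
    # The two keep rules from prometheus.env.yaml, applied in order.
    step1 = keep(DISCOVERED, ["__meta_kubernetes_service_name"], "alertmanager-main")
    step2 = keep(step1, ["__meta_kubernetes_endpoint_port_name"], "web")
    for target in step2:
        print(target["__meta_kubernetes_pod_name"])
```

Run as a script, it prints only alertmanager-main-0, because every other sample entry fails the service-name rule.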

The Service in alertmanager-service.yaml meets both conditions: it is named alertmanager-main, and its only port is named web:

# cat alertmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
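A Service selects the Pods whose labels contain every key/value pair in spec.selector. The sketch below uses a hypothetical helper selects and hypothetical pod labels, including a second Alertmanager resource named other. It shows why alertmanager-main picks up only the pods of the main resource. By contrast, alertmanager-operated, whose selector is just app: alertmanager, would match Alertmanager pods of any resource in the namespace.

```python
def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair must be present (AND)."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pod labels.
POD_MAIN = {"alertmanager": "main", "app": "alertmanager",
            "statefulset.kubernetes.io/pod-name": "alertmanager-main-0"}
POD_OTHER = {"alertmanager": "other", "app": "alertmanager"}

# spec.selector of the two Services shown in this article.
MAIN_SELECTOR = {"alertmanager": "main", "app": "alertmanager"}  # alertmanager-main
OPERATED_SELECTOR = {"app": "alertmanager"}                      # alertmanager-operated

if __name__ == "__main__":
    for pod_name, pod in [("main", POD_MAIN), ("other", POD_OTHER)]:
        print(pod_name, selects(MAIN_SELECTOR, pod), selects(OPERATED_SELECTOR, pod))
```

Run as a script, it prints "main True True" and "other False True".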

3. Definitions of the Alertmanager CRD, Service and ServiceMonitor

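The Alertmanager StatefulSet is not written by hand. The Prometheus Operator generates it from an Alertmanager custom resource (CRD alertmanagers.monitoring.coreos.com):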

# kubectl get crd
NAME                                                  CREATED AT
alertmanagers.monitoring.coreos.com                   2021-02-01T03:13:36Z
......

# cat alertmanager-alertmanager.yaml

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  labels:
    alertmanager: main
  name: main
  namespace: monitoring
spec:
  image: "178.104.162.39:443/dev/huayun/amd64/alertmanager:v0.21.0"
  nodeSelector:
    kubernetes.io/os: linux
  replicas: 3
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: alertmanager-main
  version: v0.21.0

As shown above, replicas is set to 3. This is why the operator-generated StatefulSet alertmanager-main runs 3 pods.
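Judging from the names in this cluster, the operator derives the StatefulSet name by prefixing the resource name with alertmanager-, and StatefulSet pods are numbered from 0 to replicas-1. The sketch below encodes only that naming. AlertmanagerSpec and generated_names are hypothetical, and the sketch does not reproduce anything else the operator generates (containers, volumes and so on).

```python
from dataclasses import dataclass


@dataclass
class AlertmanagerSpec:
    """Hypothetical, trimmed-down view of an Alertmanager custom resource."""
    name: str      # metadata.name, e.g. "main"
    replicas: int  # spec.replicas


def generated_names(am: AlertmanagerSpec) -> dict:
    """Names of the operator-managed objects, as observed in this cluster."""
    statefulset = f"alertmanager-{am.name}"
    return {
        "statefulset": statefulset,
        # StatefulSet pods are numbered 0 .. replicas-1.
        "pods": [f"{statefulset}-{i}" for i in range(am.replicas)],
        # Governing headless Service, as seen in the --cluster.peer flags.
        "headless_service": "alertmanager-operated",
    }


if __name__ == "__main__":
    print(generated_names(AlertmanagerSpec(name="main", replicas=3)))
```

Changing replicas in the resource changes only the length of the pod list. The names of existing pods stay the same, which is what keeps the --cluster.peer addresses stable.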

The Service and ServiceMonitor are defined as follows:

# cat alertmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP


# cat alertmanager-serviceMonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  endpoints:
  - interval: 30s
    port: web
  selector:
    matchLabels:
      alertmanager: main

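This ServiceMonitor selects Services labelled alertmanager: main and scrapes their web port every 30s. As a result, Prometheus scrapes the /metrics endpoint that each Alertmanager instance exposes.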
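A ServiceMonitor first selects Services by their metadata.labels. It then turns every address behind the named port into a scrape target at /metrics, the default path. The sketch below uses hypothetical dict stand-ins for the Services and Endpoints and hypothetical pod IPs, and it ignores namespace selection. It shows that only alertmanager-main, which carries the alertmanager: main label, contributes targets, so each pod is scraped once rather than twice.

```python
# Simplified stand-ins for Kubernetes objects (hypothetical data).
SERVICES = [
    {"name": "alertmanager-main", "labels": {"alertmanager": "main"}},
    {"name": "alertmanager-operated", "labels": {}},
]

# Hypothetical Endpoints: service name -> port name -> pod "ip:port" list.
ENDPOINTS = {
    "alertmanager-main": {
        "web": ["10.0.0.10:9093", "10.0.0.11:9093", "10.0.0.12:9093"],
    },
    "alertmanager-operated": {
        "web": ["10.0.0.10:9093", "10.0.0.11:9093", "10.0.0.12:9093"],
        "tcp-mesh": ["10.0.0.10:9094", "10.0.0.11:9094", "10.0.0.12:9094"],
    },
}


def scrape_targets(match_labels, port_name, services, endpoints, path="/metrics"):
    """Return scrape URLs for every endpoint address behind port_name on the
    Services whose labels contain all of match_labels."""
    urls = []
    for svc in services:
        if all(svc["labels"].get(k) == v for k, v in match_labels.items()):
            for addr in endpoints.get(svc["name"], {}).get(port_name, []):
                urls.append(f"http://{addr}{path}")
    return urls


if __name__ == "__main__":
    # selector.matchLabels and endpoints[0].port from alertmanager-serviceMonitor.yaml
    for url in scrape_targets({"alertmanager": "main"}, "web", SERVICES, ENDPOINTS):
        print(url)
```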


a朋