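Setting up Prometheus with Docker

Prometheus is essentially a time-series database. Combined with companion components such as Alertmanager and Pushgateway, it can serve as a complete monitoring platform, and this has become a fairly mainstream approach. This article walks through basic usage of these components and the scenarios they fit.

Official documentation

Docker configuration

Everything is configured with docker-compose.

prometheus

Basic configuration

Create a docker-compose.yml file in your working directory and put the following in it.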

version: "3.7"

services:
  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml

Next, create the ./prometheus directory and the ./docker/prometheus.yml file, and write the following into the file:

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - pro_alert_manager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

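This is the main Prometheus service. After running docker-compose up, you can open port 9090 in a browser for a first look at Prometheus.

This file is essentially Prometheus's default configuration. You can see that it declares a single job, prometheus, which scrapes Prometheus's own port 9090. Take a look at the data exposed at the /metrics endpoint to get a feel for its structure.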

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.3e-06
go_gc_duration_seconds{quantile="0.25"} 8.8e-06
go_gc_duration_seconds{quantile="0.5"} 9.3e-06
go_gc_duration_seconds{quantile="0.75"} 0.000120499
go_gc_duration_seconds{quantile="1"} 0.000344099
go_gc_duration_seconds_sum 0.001536996
go_gc_duration_seconds_count 20
...

Feel free to click around the web UI.
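
To make the structure of the /metrics sample above more concrete, here is a minimal sketch of a parser for it. It only handles a simplified subset of the text format (no escaping, no timestamps), and the names `parse_metrics`, `SAMPLE_RE`, and `LABEL_RE` are made up for this illustration; they are not part of Prometheus.

```python
import re

# One sample line looks like: name{label="value",...} number
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse a simplified subset of the text exposition format.

    Returns a list of (name, labels_dict, float_value) tuples.
    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group('labels') or ''))
        samples.append((match.group('name'), labels, float(match.group('value'))))
    return samples
```

Each sample is a metric name, an optional set of labels, and a numeric value; Prometheus attaches the scrape timestamp itself when it stores the sample.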

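Scenario: scraping an endpoint

This scenario works just like the default configuration above: Prometheus periodically pulls the endpoint, collects the metrics, and writes them into its time-series database.

Now let's write a test endpoint of our own and register it with Prometheus.

Using Python as an example: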

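import random

import flask

app = flask.Flask(__name__)


@app.route('/metrics', methods=['GET'])
def metrics():
    # Two fake series in the Prometheus text format; every line ends with a newline.
    body = (
        f'suzumiya{{quantile="0.75"}} {random.random()}\n'
        f'kyo{{quantile="0.5"}} {random.random()}\n'
    )
    # Serve as plain text rather than Flask's default text/html.
    return flask.Response(body, mimetype='text/plain')


if __name__ == '__main__':
    app.run('0.0.0.0', 12300)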

This /metrics endpoint mimics the data format of the built-in one. Next, add it to the Prometheus configuration file.

Append the following to the end of ./docker/prometheus.yml:

  - job_name: 'test'
    static_configs:
    - targets: ['10.23.51.15:12300']    # change to your own internal/external IP
      labels:
        instance: 'test'

Then restart the stack. The newly added metrics should now show up in the Prometheus web UI.
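
Besides the web UI, the new series can also be read back through the Prometheus HTTP API (the `/api/v1/query` instant-query endpoint). The sketch below assumes the stack from this article is reachable at `http://127.0.0.1:9090`; the helper names `build_query_url`, `extract_values`, and `instant_query` are invented for this example.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({'query': promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def extract_values(payload):
    """Turn an instant-query response into {instance_label: float_value}."""
    if payload.get('status') != 'success':
        raise ValueError(f'query failed: {payload}')
    results = {}
    for series in payload['data']['result']:
        _timestamp, value = series['value']  # the value arrives as a string
        results[series['metric'].get('instance', '')] = float(value)
    return results


def instant_query(base_url, promql, timeout=5):
    """Run the query over HTTP (needs a running Prometheus) and return the values."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return extract_values(json.load(resp))
```

With the test job configured as above, calling `instant_query('http://127.0.0.1:9090', 'suzumiya')` should return a dictionary keyed by the `instance` label, here `test`.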

alertmanager

Alerting is essential for any monitoring platform, and that is the job of Alertmanager.

Basic configuration

Add the following to docker-compose.yml:

  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml

Create the ./docker/alertmanager.yml file with the following content:

global:
  resolve_timeout: 5m
  smtp_smarthost:   # host:port
  smtp_from: 
  smtp_auth_username: 
  smtp_auth_password: 

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
- name: 'mememe'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
  - to: 'xxx@xxx.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Fill in your mail settings in the blank fields above.
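
If you would rather use the commented-out webhook_configs entry than email, Alertmanager will POST a JSON document with an `alerts` list to the configured URL. The following is only a rough sketch of a receiver using the standard library; `summarize`, `AlertHandler`, and `serve` are hypothetical names, and port 5001 is taken from the commented-out URL. Note that inside the Alertmanager container, 127.0.0.1 refers to the container itself, so the URL would need to point at an address the container can actually reach.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Return one line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get('alerts', []):
        labels = alert.get('labels', {})
        lines.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', '?')} severity={labels.get('severity', '?')}"
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent by Alertmanager and print a short summary.
        length = int(self.headers.get('Content-Length', 0))
        payload = json.loads(self.rfile.read(length) or b'{}')
        for line in summarize(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Listen on the port used in the commented-out webhook URL."""
    HTTPServer(('0.0.0.0', port), AlertHandler).serve_forever()
```

Calling `serve()` starts the receiver; replacing the `print` with a call to a chat or ticketing system is the usual next step.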

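Alerting itself is now configured, but nothing will happen until something triggers it. So let's add a rule that watches the test endpoint.

Make sure ./docker/prometheus.yml contains the following: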

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - pro_alert_manager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

Next comes the rule itself. Edit docker-compose.yml and add a mapping for the external rule file:

  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml   # newly added mapping

Create the ./docker/test_rule.yml file with the following content:

groups:
- name: test-alert
  rules:
  - alert: HttpTestDown
    expr: sum(up{job="test"}) == 0
    for: 10s
    labels:
      severity: critical

Restart the stack. As long as your test server is still running, no alert fires.
Now shut the test server down, and an email should arrive shortly.
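
How soon "shortly" is depends on the `for: 10s` clause together with the 15s evaluation_interval: the alert is first pending and only fires once the expression has stayed true for the hold duration. The sketch below is a deliberately simplified model of that gating, not Prometheus's actual implementation; `alert_states` is a made-up helper.

```python
def alert_states(expr_results, eval_interval, hold):
    """Simplified model of how a rule's `for` clause gates firing.

    expr_results: one boolean per rule evaluation (True = expr returned a result).
    eval_interval / hold: seconds, matching evaluation_interval and `for`.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None  # time at which the expression first became true
    for step, is_true in enumerate(expr_results):
        now = step * eval_interval
        if not is_true:
            active_since = None
            states.append('inactive')
            continue
        if active_since is None:
            active_since = now
        states.append('firing' if now - active_since >= hold else 'pending')
    return states


# Test server up for one evaluation, then down for three.
print(alert_states([False, True, True, True], eval_interval=15, hold=10))
# → ['inactive', 'pending', 'firing', 'firing']
```

Under this model the alert spends one evaluation cycle in the pending state before firing; Alertmanager then waits for group_wait (10s in the configuration above) before sending the first notification.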

pushgateway

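You can think of this component as a simple metrics drop-off server: your code sends requests to it, it keeps the pushed values, and Prometheus then scrapes them from it like any other target.

Basic configuration

Edit docker-compose.yml and add the following: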

  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway

Edit ./docker/prometheus.yml and add Pushgateway as a job:

  - job_name: 'pushgateway'
    static_configs:
    - targets: ['pro_push_gateway:9091']
      labels:
        instance: 'pushgateway'

Restart the stack and the gateway takes effect.
There are many ways to push to it; for this test we use the simplest one, curl.

echo "suzumiya 1000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 2000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test
echo "suzumiya 3000" | curl --data-binary @- http://127.0.0.1:9091/metrics/job/test

Push a few values, and the corresponding data shows up in Prometheus.
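
The same push can be done from application code. Below is a minimal sketch using only the Python standard library; it mirrors the curl command (a POST to `/metrics/job/test`), and the function names `build_push_request` and `push` are assumptions made for this example.

```python
import urllib.request


def build_push_request(gateway_url, job, body):
    """Build a POST request equivalent to the curl command above."""
    if not body.endswith('\n'):
        body += '\n'  # the text format expects each line to end with a newline
    return urllib.request.Request(
        url=f"{gateway_url.rstrip('/')}/metrics/job/{job}",
        data=body.encode('utf-8'),
        method='POST',
    )


def push(gateway_url, job, body, timeout=5):
    """Send the metrics to a running Pushgateway; returns the HTTP status code."""
    request = build_push_request(gateway_url, job, body)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

For example, `push('http://127.0.0.1:9091', 'test', 'suzumiya 4000')` would be the counterpart of one of the curl lines, assuming the gateway is running locally.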

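Scenario: event tracking

With Pushgateway, a service can push its own data toward Prometheus. The most obvious use is event tracking: push the events you care about, then monitor them with Alertmanager, or connect Grafana for a simple time-series dashboard.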
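
As a rough illustration of event tracking, a service could count events in memory and render them as a payload for Pushgateway. The class `EventCounter`, the metric name `app_events_total`, and the event names are all hypothetical, and the sketch does not escape special characters in event names.

```python
from collections import Counter


class EventCounter:
    """Count named events in-process and render them in the text exposition format."""

    def __init__(self, metric_name='app_events_total'):  # hypothetical metric name
        self.metric_name = metric_name
        self.counts = Counter()

    def record(self, event):
        self.counts[event] += 1

    def render(self):
        lines = [f'# TYPE {self.metric_name} counter']
        for event, count in sorted(self.counts.items()):
            lines.append(f'{self.metric_name}{{event="{event}"}} {count}')
        return '\n'.join(lines) + '\n'


counter = EventCounter()
for name in ['login', 'order_created', 'login']:
    counter.record(name)
print(counter.render(), end='')
# → # TYPE app_events_total counter
# → app_events_total{event="login"} 2
# → app_events_total{event="order_created"} 1
```

The rendered text can be sent to the gateway the same way as the curl examples, for instance by piping it into `curl --data-binary @-`.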

Complete configuration files

./docker-compose.yml

version: "3.7"

services:
  pro_server:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus:/prometheus
      - ./docker/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./docker/test_rule.yml:/etc/prometheus/test_rule.yml
  pro_push_gateway:
    image: prom/pushgateway
    ports:
      - "9091:9091"
    volumes:
      - ./pushgateway:/pushgateway
  pro_alert_manager:
    image: prom/alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/alertmanager
      - ./docker/alertmanager.yml:/etc/alertmanager/alertmanager.yml

./docker/prometheus.yml

  
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - pro_alert_manager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "test_rule.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'test'
    static_configs:
    - targets: ['10.23.51.15:12300']
      labels:
        instance: 'test'

  - job_name: 'pushgateway'
    static_configs:
    - targets: ['pro_push_gateway:9091']
      labels:
        instance: 'pushgateway'

./docker/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost:   # host:port
  smtp_from: 
  smtp_auth_username: 
  smtp_auth_password: 

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mememe'
receivers:
- name: 'mememe'
  #webhook_configs:
  #- url: 'http://127.0.0.1:5001/'
  email_configs:
  - to: 'xxx@xxx.com'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

./docker/test_rule.yml

groups:
- name: test-alert
  rules:
  - alert: HttpTestDown
    expr: sum(up{job="test"}) == 0
    for: 10s
    labels:
      severity: critical

over

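The above is only a first taste, but it shows that setting up a monitoring platform is not hard. Compared with hand-writing alerting logic separately over and over, unified endpoint scraping and event tracking is more portable across platforms and more elegant. There is a lot more in these components worth exploring. The next time you have a monitoring requirement, it is worth asking: can Prometheus do it?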

owari.

JhonSmith