Basic Prometheus usage and exporter development

Prometheus is installed under /usr/local/devops/prometheus.
  • Add a group: groupadd prometheus
  • Add a user: useradd -g prometheus -m -d /var/lib/prometheus -s /sbin/nologin prometheus
  • Create the service: vim /etc/systemd/system/prometheus.service
[Unit]
Description=prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/devops/prometheus/prometheus --config.file=/usr/local/devops/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure
[Install]
WantedBy=multi-user.target
  • Download and extract node_exporter, then create its service: vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/devops/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

By default, Node Exporter serves its metrics at http://IP:9100/metrics. Add the following job under scrape_configs in prometheus.yml:

  - job_name: 'node1-metrics'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: node1

Restart the Prometheus service so that the new scrape job takes effect (systemctl restart prometheus).

Prometheus collects data with a pull model, so an exporter has to expose the data to be collected in the format Prometheus expects. You can inspect that data with curl ip:port/metrics, for example:
curl 127.0.0.1:9100/metrics

# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55175897068e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.16535296e+08
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter

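Take the first two lines as an example: lines that begin with # are comments (annotations such as HELP and TYPE), process_start_time_seconds is the metric name, and the number that follows it is the corresponding value.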
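To make the line structure concrete, here is a minimal Python sketch that splits such output into samples and declared types. It only handles label-free lines like the ones shown above; the function name parse_metrics is made up for illustration and is not part of any Prometheus library.

```python
def parse_metrics(text):
    """Parse simple, label-free Prometheus text-format lines.

    Returns (samples, types): metric name -> float value, and
    metric name -> type declared in a "# TYPE" comment.
    """
    samples, types = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(None, 3)
            # "# TYPE <name> <type>"; HELP and other comments are skipped.
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue
        # A sample line is "<metric name> <value>".
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples, types


sample_text = """\
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.55175897068e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
"""

samples, types = parse_metrics(sample_text)
print(samples["process_start_time_seconds"])
print(types["process_virtual_memory_max_bytes"])
```

Lines that carry labels in curly braces need a real parser; this sketch deliberately ignores them.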

The Prometheus client libraries provide four basic metric types:

  • Counter
  • Gauge
  • Histogram
  • Summary

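When you request an exporter's /metrics endpoint, the response looks like the following. The HELP line describes what the metric measures, and the TYPE line declares its metric type.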

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 255.477922

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 312.0

Counter

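A Counter works like a tally. It is used for things such as CPU time, the total number of API requests, or the number of exceptions raised. The defining property of these metrics is that they only ever increase (the value goes back to zero only when the process restarts).
So to derive CPU usage from such a metric, apply the rate() function, which computes the per-second average rate of increase of each time series of the Counter over a given time window.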
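In PromQL this is written as, for example, rate(process_cpu_seconds_total[1m]). The Python sketch below illustrates the idea behind that calculation under simplifying assumptions: it handles counter resets but does not reproduce the extrapolation to the window boundaries that the real rate() performs. The function name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second average increase of a counter over the sampled window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset (process restart).
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole
        # current value counts as increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes of process_cpu_seconds_total taken 15 s apart;
# the drop from 256.0 to 1.5 simulates a process restart.
cpu_seconds = [(0, 250.0), (15, 253.0), (30, 256.0), (45, 1.5), (60, 4.5)]
print(simple_rate(cpu_seconds))
```

A result of 0.175 CPU-seconds per second would mean the process used about 17.5% of one core over that minute.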

Gauge

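The literal Chinese translation of Gauge is 计量器 ("meter"), but that reads too much like the translation of Counter, so I personally prefer 仪表盘 ("dashboard dial"). The defining property of a dial is that its value can go up as well as down. A Gauge is therefore a good fit for metrics such as current memory usage, current CPU usage, current temperature, or current speed.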

Histogram

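A Histogram is fairly straightforward: it describes how observed values are distributed by counting them into a set of configured ranges (buckets), and it also exposes the number of observations and the sum of all observed values.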
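The following Python sketch shows how observations could be counted into buckets and then exposed as cumulative counts, which is the shape of the _bucket, _count and _sum series a histogram produces on /metrics. The class SimpleHistogram is a hypothetical illustration, not the client library's implementation, and the bucket boundaries are simply reused from the PHP example in this article.

```python
import bisect


class SimpleHistogram:
    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per finite bucket plus a final one for +Inf.
        self.counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0

    def observe(self, value):
        # bisect_left finds the first bucket whose upper bound is >= value,
        # matching the "le" (less than or equal) label.
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1
        self.total += value

    def exposition(self):
        """Return cumulative (le, count) pairs, ending with +Inf."""
        rows, running = [], 0
        bounds = self.upper_bounds + [float("inf")]
        for bound, count in zip(bounds, self.counts):
            running += count
            rows.append((bound, running))
        return rows


h = SimpleHistogram([0.005, 0.05, 0.1, 0.5, 1.5, 10])
for seconds in (0.02, 0.03, 0.3, 12):
    h.observe(seconds)

for le, count in h.exposition():
    print(f'le="{le}" {count}')
print("sum", h.total)
```

Because the counts are cumulative, the +Inf bucket always equals the total number of observations.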

Summary

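A Summary is similar to a Histogram: it tracks the total number of observations and the sum of all observed values, and it can additionally report quantiles that the client computes over a sliding time window.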
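As a rough illustration of the difference from a Histogram, the Python sketch below keeps a running count and sum for all observations and computes a quantile only over the observations that are still inside the time window. It stores raw values and uses the nearest-rank method for simplicity; real client libraries typically use streaming estimators instead. SimpleSummary and its parameters are hypothetical names.

```python
import math
from collections import deque


class SimpleSummary:
    def __init__(self, max_age_seconds=600):
        self.max_age = max_age_seconds
        self.window = deque()  # (timestamp, value) pairs, oldest first
        self.count = 0         # never decreases, like a counter
        self.total = 0.0       # sum of all observed values

    def observe(self, value, now):
        self.window.append((now, value))
        self.count += 1
        self.total += value

    def quantile(self, q, now):
        # Drop observations that have aged out of the sliding window.
        while self.window and now - self.window[0][0] > self.max_age:
            self.window.popleft()
        values = sorted(v for _, v in self.window)
        if not values:
            return None
        # Nearest-rank quantile over the values still in the window.
        rank = max(1, math.ceil(q * len(values)))
        return values[rank - 1]


s = SimpleSummary(max_age_seconds=60)
for t, seconds in [(0, 0.1), (30, 0.2), (50, 0.4), (70, 0.3)]:
    s.observe(seconds, now=t)

print(s.count, s.total, s.quantile(0.5, now=70))
```

Note that the observation made at t=0 still contributes to count and total, but no longer to the quantile at t=70.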

Writing a Prometheus exporter as a PSR-15 middleware in PHP

Install the Prometheus PHP client:

composer require jimdo/prometheus_client_php

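<?php

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\Redis;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Server\MiddlewareInterface;
use Psr\Http\Server\RequestHandlerInterface;
// Assumption: TextResponse comes from zend-diactoros; adjust to your PSR-7 implementation.
use Zend\Diactoros\Response\TextResponse;

class Prometheus implements MiddlewareInterface
{
    /**
     * @param ServerRequestInterface $request
     * @param RequestHandlerInterface $handler
     * @return ResponseInterface
     * @throws \Prometheus\Exception\MetricsRegistrationException
     */
    public function process(ServerRequestInterface $request, RequestHandlerInterface $handler): ResponseInterface
    {
        $start = microtime(true);
        $uri = $request->getUri()->getPath();
        // Redis-backed storage keeps metric values across PHP requests.
        $registry = new CollectorRegistry(new Redis());

        // Export: serve the collected metrics in the Prometheus text format.
        if ($uri === '/metrics') {
            $renderer = new RenderTextFormat();
            $metrics = $renderer->render($registry->getMetricFamilySamples());
            $response = new TextResponse($metrics, 200);
            // PSR-7 responses are immutable, so return the instance created by withHeader().
            return $response->withHeader('Content-Type', RenderTextFormat::MIME_TYPE);
        }

        $response = $handler->handle($request);
        $duration = microtime(true) - $start;
        $statusCode = $response->getStatusCode();
        // 'context' is a request attribute set by the framework in use.
        $context = $request->getAttribute('context');
        $routes = $context->getRoutes();
        $method = $request->getMethod();

        // Label names must be in the same order as the label values below.
        $labels = ['method', 'status_code', 'route'];

        // getOrRegister* avoids a MetricsRegistrationException when the metric
        // is already registered in this registry.
        $counter = $registry->getOrRegisterCounter(
            'knight',
            'knight_request_total',
            'Total number of HTTP requests',
            $labels
        );
        $histogram = $registry->getOrRegisterHistogram(
            'knight',
            'knight_request_duration_seconds',
            'duration histogram of http responses',
            $labels,
            [0.005, 0.05, 0.1, 0.5, 1.5, 10]
        );

        foreach ($routes as $route) {
            $labelValues = [$method, $statusCode, $route];
            $counter->inc($labelValues);
            $histogram->observe($duration, $labelValues);
        }

        return $response;
    }
}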

curl https://blog.sangsay.com/api/metrics

Response (in this capture the status_code and method label names are swapped relative to their values):

# HELP knight_knight_request_duration_seconds duration histogram of http responses
# TYPE knight_knight_request_duration_seconds histogram
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="0.005"} 0
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="0.05"} 4
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="0.1"} 4
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="0.5"} 4
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="1.5"} 4
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="10"} 4
knight_knight_request_duration_seconds_bucket{status_code="GET",method="200",route="/posts",le="+Inf"} 4
knight_knight_request_duration_seconds_count{status_code="GET",method="200",route="/posts"} 4
knight_knight_request_duration_seconds_sum{status_code="GET",method="200",route="/posts"} 0.0942027568817144
# HELP knight_knight_request_total Total number of HTTP requests
# TYPE knight_knight_request_total counter
knight_knight_request_total{status_code="GET",method="200",route="/posts"} 4
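To show how such a response can be read, the Python sketch below takes the _sum, _count and _bucket values from the output above and derives the mean request duration and an estimated median. The quantile estimate uses linear interpolation inside the matching bucket, which is the general idea behind PromQL's histogram_quantile(); this is a simplified illustration, and the function name estimate_quantile is hypothetical.

```python
def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs in ascending order,
    ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for i, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if upper_bound == float("inf"):
                # No finite upper bound: fall back to the highest finite one.
                return buckets[i - 1][0] if i > 0 else None
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return None


# Values copied from the /metrics response above (route="/posts").
buckets = [(0.005, 0), (0.05, 4), (0.1, 4), (0.5, 4), (1.5, 4), (10, 4), (float("inf"), 4)]
duration_sum = 0.0942027568817144
duration_count = 4

mean_seconds = duration_sum / duration_count
median_seconds = estimate_quantile(0.5, buckets)
print(f"mean: {mean_seconds:.4f}s, estimated median: {median_seconds:.4f}s")
```

The estimate is only as precise as the bucket layout: all four requests fell between 0.005 s and 0.05 s, so any quantile can only be placed somewhere inside that range.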