Author: Liu An
A member of the Aikesheng testing team, mainly responsible for testing the DTLE open source project, skilled in Python automated test development, and recently fascinated by Linux performance analysis and optimization.
Source of this article: original submission
* Produced by the Aikesheng open source community. Original content may not be used without authorization; for reprinting, please contact the editor and cite the source.
Background:
Although the DTLE documentation describes the available monitoring items, setting them up can still be difficult for anyone unfamiliar with configuring Prometheus and Grafana. In this article I will use DTLE 3.21.07.0 to build a DTLE monitoring system.
1. Build the DTLE runtime environment
- Configure a two-node DTLE cluster for the demonstration: dtle-src-1 (10.186.63.20) and dtle-dest-1 (10.186.63.76).
When modifying the DTLE configuration file, pay attention to the following two points:
- Enable DTLE monitoring by making sure publish_metrics is set to true
- Enable Nomad monitoring by making sure the telemetry block is present with prometheus_metrics = true
The configuration of dtle-src-1 is shown below as an example; for details of each option, refer to the node configuration section of the DTLE documentation:
```hcl
# In DTLE 3.21.07.0, nomad is upgraded to 1.1.2, and the following block
# must be added so that nomad exposes monitoring data.
# Earlier DTLE versions do not need this block.
telemetry {
  prometheus_metrics = true
  collection_interval = "15s"
}

plugin "dtle" {
  config {
    data_dir = "/opt/dtle/var/lib/nomad"
    nats_bind = "10.186.63.20:8193"
    nats_advertise = "10.186.63.20:8193"
    # Repeat the consul address above.
    consul = "10.186.63.76:8500"
    # By default, API compatibility layer is disabled.
    api_addr = "10.186.63.20:8190" # for compatibility API
    nomad_addr = "10.186.63.20:4646" # compatibility API needs to access a nomad server
    publish_metrics = true
    stats_collection_interval = 15
  }
}
```
- Add two jobs to simulate data transfer between two MySQL instances
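Before moving on to Prometheus, it can help to confirm that both endpoints actually serve metrics. The following is a minimal Python sketch (standard library only) that downloads Nomad's /v1/metrics?format=prometheus output and DTLE's /metrics output and counts the distinct metric names in each. The addresses are taken from the dtle-src-1 configuration above; the helper names (metric_names, fetch, check_endpoints) are hypothetical and not part of DTLE or Nomad.

```python
import urllib.request

# Endpoints taken from the dtle-src-1 configuration above; adjust per node.
ENDPOINTS = {
    "nomad": "http://10.186.63.20:4646/v1/metrics?format=prometheus",
    "dtle": "http://10.186.63.20:8190/metrics",
}


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A metric name ends at the label set "{" or at the first space.
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names


def fetch(url, timeout=5):
    """Download an endpoint body as text; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_endpoints(endpoints=ENDPOINTS):
    """Return {job: number of distinct metric names} for each endpoint."""
    return {job: len(metric_names(fetch(url))) for job, url in endpoints.items()}
```

Call check_endpoints() from a machine that can reach the nodes. An HTTP error or a count of zero for a job suggests that the corresponding telemetry or publish_metrics setting has not taken effect.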
2. Deploy Prometheus
- Prepare a Prometheus configuration file that scrapes both Nomad and DTLE metrics
- For the DTLE targets, it is recommended to set the labels.instance value to the hostname of the DTLE server
```
shell> cat /path/to/prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

scrape_configs:
  - job_name: 'nomad'
    scrape_interval: 15s
    metrics_path: '/v1/metrics'
    params:
      format: ['prometheus']
    static_configs:
      - targets: ['10.186.63.20:4646']
        labels:
          instance: nomad-src-1
      - targets: ['10.186.63.76:4646']
        labels:
          instance: nomad-dest-1

  - job_name: 'dtle'
    scrape_interval: 15s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['10.186.63.20:8190']
        labels:
          instance: dtle-src-1
      - targets: ['10.186.63.76:8190']
        labels:
          instance: dtle-dest-1
```
- Deploy the Prometheus service with Docker
```
shell> docker run -itd -p 9090:9090 --name=prometheus --hostname=prometheus --restart=always -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
```
- Visit the Prometheus targets page at http://${prometheus_server_ip}:9090/targets to verify that the configuration has taken effect
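The same check can be automated against the Prometheus HTTP API (GET /api/v1/targets), which reports every active target together with its labels and the health of its last scrape. Below is a minimal sketch; PROMETHEUS_URL is a placeholder for your own server address, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with http://${prometheus_server_ip}:9090
PROMETHEUS_URL = "http://127.0.0.1:9090"


def target_health(payload):
    """Map (job, instance) -> health ("up", "down" or "unknown") from /api/v1/targets."""
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        result[(labels.get("job", ""), labels.get("instance", ""))] = target["health"]
    return result


def unhealthy(payload):
    """Return the sorted (job, instance) pairs whose last scrape was not "up"."""
    return sorted(key for key, health in target_health(payload).items() if health != "up")


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Call the Prometheus HTTP API and return the decoded JSON document."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)
```

With the configuration above, unhealthy(fetch_targets()) should return an empty list once nomad-src-1, nomad-dest-1, dtle-src-1 and dtle-dest-1 are all being scraped successfully.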
3. Deploy Grafana
- Deploy the Grafana service with Docker
```
shell> docker run -d --name=grafana -p 3000:3000 grafana/grafana
```
- Visit the Grafana page at http://${grafana_server_ip}:3000 and log in with the default account admin/admin
- Add a data source
- Select Prometheus as the data source type
- Enter the Prometheus access address (http://${prometheus_server_ip}:9090) in the URL field and click the "Save & test" button
- Add a panel
- Configure the panel, taking a CPU usage monitor as an example
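The data-source steps above can also be scripted through Grafana's HTTP API (POST /api/datasources). This is a minimal sketch, assuming the default admin/admin account is still valid and that the data source should be named "Prometheus" (a hypothetical choice). The request is built in a separate function so it can be inspected before it is sent.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (but do not send) a POST /api/datasources request for a Prometheus source."""
    body = json.dumps({
        "name": "Prometheus",       # hypothetical data source name
        "type": "prometheus",
        "url": prometheus_url,      # e.g. http://${prometheus_server_ip}:9090
        "access": "proxy",          # Grafana's backend queries Prometheus
        "isDefault": True,
    }).encode("utf-8")
    # HTTP basic auth with the Grafana login account.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


def create_datasource(grafana_url, prometheus_url, **auth):
    """Send the request and return Grafana's JSON response."""
    req = build_datasource_request(grafana_url, prometheus_url, **auth)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Grafana rejects a second data source with the same name, so running create_datasource twice fails on the second call; in that case use the data source that already exists.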
4. Commonly used monitoring items
All Nomad metrics: https://www.nomadproject.io/docs/operations/metrics
All DTLE metrics: https://actiontech.github.io/dtle-docs-cn/3/3.4_metrics.html
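Once the data is in Prometheus, any of these metrics can also be read programmatically through the Prometheus HTTP API (GET /api/v1/query). The sketch below is a minimal example. The CPU expression assumes that Nomad exports a per-core gauge named nomad_client_host_cpu_total; treat this name as an assumption and confirm it against the metric list above or the /v1/metrics output before relying on it. PROMETHEUS_URL is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: replace with http://${prometheus_server_ip}:9090
PROMETHEUS_URL = "http://127.0.0.1:9090"

# Assumed example expression: average host CPU utilisation per Nomad node.
CPU_QUERY = "avg by (instance) (nomad_client_host_cpu_total)"


def query_url(expr, base_url=PROMETHEUS_URL):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def vector_values(payload):
    """Turn an instant-vector response into {instance: float value}."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's value is a [timestamp, "string value"] pair.
    return {s["metric"].get("instance", ""): float(s["value"][1]) for s in data["result"]}


def instant_query(expr, base_url=PROMETHEUS_URL, timeout=5):
    """Run an instant query and return the per-instance values."""
    with urllib.request.urlopen(query_url(expr, base_url), timeout=timeout) as resp:
        return vector_values(json.load(resp))
```

For example, instant_query(CPU_QUERY) would return one value per Nomad instance label, such as nomad-src-1 and nomad-dest-1, if the assumed metric exists.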