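prometheus-operator provides a Probe CRD that can be used for black-box monitoring; the actual probing is carried out by blackbox-exporter.
blackbox-exporter is the black-box monitoring solution maintained by the Prometheus community. It lets users probe a target over the network using HTTP, HTTPS, TCP, ICMP, and similar protocols.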
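I. Overall architecture
In practice, the flow is:
- First, the user creates a Probe object that specifies the probe module, the probe targets, and other parameters;
- Next, prometheus-operator sees the new Probe object through its watch, generates the corresponding Prometheus scrape configuration, and reloads it into Prometheus;
- Finally, Prometheus scrapes blackbox-exporter at url=/probe?target={probe target}&module={probe module}; blackbox-exporter then probes the target and returns the result in metrics format.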
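The last step can be sketched in a few lines of Go. This is only an illustration of how the scrape URL is put together, not code from Prometheus itself; the function name probeURL is made up for this example, and the exporter address is the one used later in this article.

```go
package main

import (
	"fmt"
	"net/url"
)

// probeURL is a hypothetical helper that builds the URL requested from
// blackbox-exporter for one target and one module.
// exporterAddr is a host:port pair, e.g. "blackbox-exporter.monitoring:19115".
func probeURL(exporterAddr, target, module string) string {
	q := url.Values{}
	q.Set("target", target)
	q.Set("module", module)
	u := url.URL{
		Scheme:   "http",
		Host:     exporterAddr,
		Path:     "/probe",
		RawQuery: q.Encode(), // Encode sorts the keys and escapes the values
	}
	return u.String()
}

func main() {
	fmt.Println(probeURL("blackbox-exporter.monitoring:19115", "prometheus.io", "http_2xx"))
	// prints http://blackbox-exporter.monitoring:19115/probe?module=http_2xx&target=prometheus.io
}
```

Because the target travels as a query parameter, a target that contains a scheme (such as an https:// URL) is percent-encoded in the request.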
II. Deploying prometheus-operator
prometheus-operator is deployed here with kube-prometheus.
# git clone -b release-0.8 git@github.com:prometheus-operator/kube-prometheus.git
# cd kube-prometheus
First, deploy the CRDs:
# kubectl apply -f manifests/setup
# kubectl get crd |grep coreos
alertmanagerconfigs.monitoring.coreos.com 2022-05-19T06:44:00Z
alertmanagers.monitoring.coreos.com 2022-05-19T06:44:01Z
podmonitors.monitoring.coreos.com 2022-05-19T06:44:01Z
probes.monitoring.coreos.com 2022-05-19T06:44:01Z
prometheuses.monitoring.coreos.com 2022-05-19T06:45:04Z
prometheusrules.monitoring.coreos.com 2022-05-19T06:44:01Z
servicemonitors.monitoring.coreos.com 2022-05-19T06:44:01Z
thanosrulers.monitoring.coreos.com 2022-05-19T06:44:02Z
As the output shows, the probes.monitoring.coreos.com CRD has been deployed.
Then deploy the monitoring stack:
# kubectl apply -f manifests/
# kubectl get pods -n monitoring
NAME                                  READY   STATUS    RESTARTS      AGE
alertmanager-main-0                   2/2     Running   0             46m
alertmanager-main-1                   2/2     Running   0             46m
alertmanager-main-2                   2/2     Running   0             46m
blackbox-exporter-5cb5d7479d-mznws    3/3     Running   0             49m
grafana-d595885ff-cf49m               1/1     Running   0             49m
kube-state-metrics-685d769786-tkv7l   3/3     Running   0             22m
node-exporter-4d6mq                   2/2     Running   0             49m
node-exporter-8cr4v                   2/2     Running   0             49m
node-exporter-krr2h                   2/2     Running   0             49m
prometheus-adapter-6fd94587c9-6tsgb   0/1     Running   0             3s
prometheus-adapter-6fd94587c9-8zm2l   1/1     Running   4 (13m ago)   13m
prometheus-k8s-0                      2/2     Running   0             46m
prometheus-k8s-1                      2/2     Running   0             46m
prometheus-operator-7684989c7-qt2sp   2/2     Running   0             49m
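Once the deployment finishes, expose the prometheus-k8s Service as a NodePort so that the Prometheus UI can be reached.
III. Blackbox-exporter configuration
Blackbox-exporter needs a configuration file at runtime.
The configuration file lists the probes that blackbox-exporter supports, such as icmp and tcp, where:
- each probe configuration is called a module and is written in YAML;
Each module contains:
- the prober type: prober
- the timeout: timeout
- ...
A typical blackbox-exporter configuration file: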
apiVersion: v1
data:
  config.yml: |-
    "modules":
      "http_2xx": # module name
        "http":
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "http_post_2xx":
        "http":
          "method": "POST" # POST request
          "preferred_ip_protocol": "ip4"
        "prober": "http"
      "tcp_connect": # TCP connection
        "prober": "tcp"
        "timeout": "10s"
        "tcp":
          "preferred_ip_protocol": "ip4"
      "dns":
        "prober": "dns"
        "dns":
          "transport_protocol": "udp"
          "preferred_ip_protocol": "ip4"
          "query_name": "kubernetes.default.svc.cluster.local"
      "icmp":
        "prober": "icmp"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: blackbox-exporter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.18.0
  name: blackbox-exporter-configuration
  namespace: monitoring
IV. Creating Probe objects
1. Probe ping
Create a ping job:
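apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: ping
  namespace: monitoring
spec:
  jobName: ping # job name
  prober: # address of blackbox-exporter
    url: blackbox-exporter.monitoring:19115
  module: icmp # probe module defined in the configuration file
  targets: # targets (either a static config or an ingress config)
    # ingress <Object>
    staticConfig: # takes precedence if ingress is also configured
      static:
      # note: the icmp prober resolves the target as a hostname, so a bare
      # host such as www.baidu.com is probably what is intended here
      - https://www.baidu.com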
After a short wait, the job shows up on the Prometheus page.
Correspondingly, this is the configuration generated in Prometheus:
- job_name: probe/monitoring/ping
  honor_timestamps: true
  params:
    module:
    - icmp
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: ping
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter.monitoring:19115
    action: replace
  static_configs:
  - targets:
    - https://www.baidu.com
    labels:
      namespace: monitoring
2. Probe HTTP
Create an HTTP job:
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: domain-probe
  namespace: monitoring
spec:
  jobName: domain-probe # job name
  prober: # address of blackbox-exporter
    url: blackbox-exporter:19115
  module: http_2xx # probe module defined in the configuration file
  targets: # targets (either a static config or an ingress config)
    # ingress <Object>
    staticConfig: # takes precedence if ingress is also configured
      static:
      - prometheus.io
After a short wait, the job shows up on the Prometheus page.
Correspondingly, this is the configuration generated in Prometheus:
- job_name: probe/monitoring/domain-probe
  honor_timestamps: true
  params:
    module:
    - http_2xx
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /probe
  scheme: http
  follow_redirects: true
  relabel_configs:
  - source_labels: [job]
    separator: ;
    regex: (.*)
    target_label: __tmp_prometheus_job_name
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: job
    replacement: domain-probe
    action: replace
  - source_labels: [__address__]
    separator: ;
    regex: (.*)
    target_label: __param_target
    replacement: $1
    action: replace
  - source_labels: [__param_target]
    separator: ;
    regex: (.*)
    target_label: instance
    replacement: $1
    action: replace
  - separator: ;
    regex: (.*)
    target_label: __address__
    replacement: blackbox-exporter:19115
    action: replace
  static_configs:
  - targets:
    - prometheus.io
    labels:
      namespace: monitoring
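The relabel_configs in the generated jobs are what turn a probe target into a scrape of blackbox-exporter. The Go sketch below imitates only the three address-related replace rules on a plain label map; it is a simplified illustration rather than how Prometheus implements relabeling, and the function name relabel is invented for this example.

```go
package main

import "fmt"

// relabel applies, in order, a simplified version of the three "replace"
// rules that move the probe target out of __address__ and put the exporter
// address in its place. The input map is left untouched.
func relabel(labels map[string]string, exporterAddr string) map[string]string {
	out := make(map[string]string, len(labels)+2)
	for k, v := range labels {
		out[k] = v
	}
	// __address__ (the probe target) -> __param_target, sent as ?target=...
	out["__param_target"] = out["__address__"]
	// __param_target -> instance, so series are labelled with the probed target
	out["instance"] = out["__param_target"]
	// __address__ now points at the exporter, which is what actually gets scraped
	out["__address__"] = exporterAddr
	return out
}

func main() {
	in := map[string]string{"__address__": "prometheus.io", "namespace": "monitoring"}
	out := relabel(in, "blackbox-exporter:19115")
	fmt.Println("scrape address:", out["__address__"])
	fmt.Println("target param:  ", out["__param_target"])
	fmt.Println("instance label:", out["instance"])
}
```

The order matters: the target has to be copied out of __address__ before __address__ is overwritten with the exporter address.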
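3. Inspecting the scraped metrics
You can send a request to blackbox-exporter with curl, passing the probe module and the probe target; blackbox-exporter then runs the probe and returns the result in metrics format (the URL is quoted so that the shell does not interpret the &):
curl 'http://192.168.0.1:31392/probe?target=prometheus.io&module=http_2xx'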
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.275433879
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 2.373368898
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length -1
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.400100412
probe_http_duration_seconds{phase="processing"} 0.509387522
probe_http_duration_seconds{phase="resolve"} 0.365111732
probe_http_duration_seconds{phase="tls"} 1.200170298
probe_http_duration_seconds{phase="transfer"} 0.000451343
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 15757
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 2
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.590428662e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.686095999e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.686095999e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="99ac7e7bf8d38ce32c95b2b3c965a9d2b479b0bf2e3b40c576173131a249f877"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.3"} 1
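To check such a response programmatically, a small parser is enough for this flat output. The Go sketch below is a deliberately naive reader for the text format shown above, written for this article: it assumes one sample per line with no trailing timestamp, and the function name parseSamples is hypothetical. For real use, the official Prometheus parsing libraries are the safer choice.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSamples returns every sample in the text, keyed by the full series
// name (labels included). Comment lines and unparsable lines are skipped.
func parseSamples(text string) map[string]float64 {
	samples := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The value is the last space-separated field; label values may
		// themselves contain spaces (e.g. version="TLS 1.3").
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		samples[line[:i]] = v
	}
	return samples
}

func main() {
	body := `# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_http_status_code 200
probe_tls_version_info{version="TLS 1.3"} 1
`
	s := parseSamples(body)
	fmt.Println("probe_success =", s["probe_success"])
	fmt.Println("status code   =", s["probe_http_status_code"])
}
```

probe_success is usually the first metric to look at, since a scrape of blackbox-exporter can succeed even when the probe itself failed.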
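V. Source code analysis of Probe
Prometheus-Operator handles Probe objects in much the same way as its other CRD objects:
- first, an informer watches for changes to Probe objects;
- then, a new Prometheus configuration is generated from the updated objects and reloaded into Prometheus;
1. Watching Probe objects
Changes to Probe objects are watched through an informer.
First, the informer is created: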
// prometheus-operator/pkg/prometheus/operator.go
// New creates a new controller.
func New(ctx context.Context, conf operator.Config, logger log.Logger, r prometheus.Registerer) (*Operator, error) {
	...
	c := &Operator{
		...
	}
	...
	c.probeInfs, err = informers.NewInformersForResource(
		informers.NewMonitoringInformerFactories(
			c.config.Namespaces.AllowList,
			c.config.Namespaces.DenyList,
			mclient,
			resyncPeriod,
			nil,
		),
		monitoringv1.SchemeGroupVersion.WithResource(monitoringv1.ProbeName),
	)
	if err != nil {
		return nil, errors.Wrap(err, "error creating probe informers")
	}
	...
	return c, nil
}
Then, event handlers are added to the informer:
// prometheus-operator/pkg/prometheus/operator.go
// addHandlers adds the eventhandlers to the informers.
func (c *Operator) addHandlers() {
	...
	c.probeInfs.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    c.handleBmonAdd,
		UpdateFunc: c.handleBmonUpdate,
		DeleteFunc: c.handleBmonDelete,
	})
	...
}
Here is the Add event handler:
- it enqueues the namespace that the object belongs to;
// TODO: Don't enqueue just for the namespace
func (c *Operator) handleBmonAdd(obj interface{}) {
	if o, ok := c.getObject(obj); ok {
		level.Debug(c.logger).Log("msg", "Probe added")
		c.metrics.TriggerByCounter(monitoringv1.ProbesKind, "add").Inc()
		c.enqueueForMonitorNamespace(o.GetNamespace())
	}
}
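The pattern of enqueueing only a namespace can be illustrated with a toy queue. The Go sketch below is not the operator's work queue; nsQueue and handleProbeAdd are hypothetical names, and the only point is that several Probe events in the same namespace collapse into one pending item.

```go
package main

import (
	"fmt"
	"sync"
)

// nsQueue is a hypothetical, much-simplified work queue that stores
// namespace keys and ignores a key that is already pending.
type nsQueue struct {
	mu      sync.Mutex
	pending map[string]bool
	order   []string
}

func newNSQueue() *nsQueue {
	return &nsQueue{pending: make(map[string]bool)}
}

// Add enqueues a namespace unless it is already waiting to be processed.
func (q *nsQueue) Add(ns string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.pending[ns] {
		return
	}
	q.pending[ns] = true
	q.order = append(q.order, ns)
}

// Drain returns the pending namespaces in insertion order and empties the queue.
func (q *nsQueue) Drain() []string {
	q.mu.Lock()
	defer q.mu.Unlock()
	out := q.order
	q.order = nil
	q.pending = make(map[string]bool)
	return out
}

// handleProbeAdd has the same shape as the Add handler above: the object's
// name is ignored and only its namespace is enqueued.
func handleProbeAdd(q *nsQueue, namespace, name string) {
	q.Add(namespace)
}

func main() {
	q := newNSQueue()
	handleProbeAdd(q, "monitoring", "ping")
	handleProbeAdd(q, "monitoring", "domain-probe")
	handleProbeAdd(q, "default", "api-check")
	fmt.Println(q.Drain())
}
```

Keying the work on the namespace rather than on each object means a burst of Probe changes leads to fewer reconciliations, at the cost of regenerating more configuration than strictly necessary.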
2. Generating the Prometheus configuration
In Prometheus-operator, workers take changed objects off the queue and reconcile them.
// prometheus-operator/pkg/prometheus/operator.go
func (c *Operator) sync(ctx context.Context, key string) error {
	...
	// Probe objects are handled here
	if err := c.createOrUpdateConfigurationSecret(ctx, p, ruleConfigMapNames, assetStore); err != nil {
		return errors.Wrap(err, "creating config failed")
	}
	...
}
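For Probe objects, the operator generates Prometheus configuration from their contents and writes it to a Secret;
in other words, the Prometheus configuration is stored in a Secret object, and the reloader sidecar then reloads the contents of that Secret into Prometheus: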
func (c *Operator) createOrUpdateConfigurationSecret(ctx context.Context, p *monitoringv1.Prometheus, ruleConfigMapNames []string, store *assets.Store) error {
	...
	// select the Probe objects
	bmons, err := c.selectProbes(ctx, p, store)
	if err != nil {
		return errors.Wrap(err, "selecting Probes failed")
	}
	...
	// generate the new configuration
	conf, err := c.configGenerator.generateConfig(
		p,
		smons,
		pmons,
		bmons,
		store.BasicAuthAssets,
		store.BearerTokenAssets,
		additionalScrapeConfigs,
		additionalAlertRelabelConfigs,
		additionalAlertManagerConfigs,
		ruleConfigMapNames,
	)
	if err != nil {
		return errors.Wrap(err, "generating config failed")
	}
	// write the configuration into the Secret object
	s := makeConfigSecret(p, c.config)
	...
}
The Prometheus configuration is generated from a Probe object as follows:
// pkg/prometheus/promcfg.go
func (cg *configGenerator) generateProbeConfig(
	version semver.Version,
	m *v1.Probe,
	apiserverConfig *v1.APIServerConfig,
	basicAuthSecrets map[string]assets.BasicAuthCredentials,
	bearerTokens map[string]assets.BearerToken,
	ignoreHonorLabels bool,
	overrideHonorTimestamps bool,
	ignoreNamespaceSelectors bool,
	enforcedNamespaceLabel string) yaml.MapSlice {
	jobName := fmt.Sprintf("probe/%s/%s", m.Namespace, m.Name)
	cfg := yaml.MapSlice{
		{
			Key:   "job_name",
			Value: jobName,
		},
	}
	...
	// metrics_path
	path := "/probe"
	if m.Spec.ProberSpec.Path != "" {
		path = m.Spec.ProberSpec.Path
	}
	cfg = append(cfg, yaml.MapItem{Key: "metrics_path", Value: path})
	...
	// params
	cfg = append(cfg, yaml.MapItem{Key: "params", Value: yaml.MapSlice{
		{Key: "module", Value: []string{m.Spec.Module}},
	}})
	...
	// static_configs
	if m.Spec.Targets.StaticConfig != nil {
		staticConfig := yaml.MapSlice{
			{Key: "targets", Value: m.Spec.Targets.StaticConfig.Targets},
		}
		if m.Spec.Targets.StaticConfig.Labels != nil {
			if _, ok := m.Spec.Targets.StaticConfig.Labels["namespace"]; !ok {
				m.Spec.Targets.StaticConfig.Labels["namespace"] = m.Namespace
			}
		} else {
			m.Spec.Targets.StaticConfig.Labels = map[string]string{"namespace": m.Namespace}
		}
		staticConfig = append(staticConfig, yaml.MapSlice{
			{Key: "labels", Value: m.Spec.Targets.StaticConfig.Labels},
		}...)
		cfg = append(cfg, yaml.MapItem{
			Key:   "static_configs",
			Value: []yaml.MapSlice{staticConfig},
		})
		...
	}
	...
	return cfg
}
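The mapping performed by this function can be condensed into a standalone sketch. The Go program below uses hypothetical, trimmed-down types (probe and scrapeConfig) instead of the operator's real ones and covers only the fields shown in the excerpt: the job name, the default metrics path, the module parameter, and the namespace label on static targets.

```go
package main

import "fmt"

// probe is a hypothetical stand-in for the parts of the Probe CRD used here.
type probe struct {
	Namespace  string
	Name       string
	ProberPath string // optional; empty means the default path
	Module     string
	Targets    []string
	Labels     map[string]string
}

// scrapeConfig is a hypothetical stand-in for the generated scrape job.
type scrapeConfig struct {
	JobName     string
	MetricsPath string
	Module      string
	Targets     []string
	Labels      map[string]string
}

// buildScrapeConfig follows the same rules as the excerpt above, but copies
// the labels instead of modifying the probe in place.
func buildScrapeConfig(p probe) scrapeConfig {
	path := "/probe"
	if p.ProberPath != "" {
		path = p.ProberPath
	}
	labels := make(map[string]string, len(p.Labels)+1)
	for k, v := range p.Labels {
		labels[k] = v
	}
	// A user-supplied "namespace" label wins; otherwise use the Probe's namespace.
	if _, ok := labels["namespace"]; !ok {
		labels["namespace"] = p.Namespace
	}
	return scrapeConfig{
		JobName:     fmt.Sprintf("probe/%s/%s", p.Namespace, p.Name),
		MetricsPath: path,
		Module:      p.Module,
		Targets:     p.Targets,
		Labels:      labels,
	}
}

func main() {
	cfg := buildScrapeConfig(probe{
		Namespace: "monitoring",
		Name:      "domain-probe",
		Module:    "http_2xx",
		Targets:   []string{"prometheus.io"},
	})
	fmt.Printf("%+v\n", cfg)
}
```

This matches what the generated jobs earlier in the article show: a job_name of the form probe/<namespace>/<name>, metrics_path /probe, and a namespace label attached to the static targets.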