Problem Description
- For example, suppose we build a pipeline metric pipeline_step_duration that carries a label named step
- The steps contained in each pipeline run may differ

```
# For example, run 1 of pipeline a has the steps clone and build:
pipeline_step_duration{step="clone"}
pipeline_step_duration{step="build"}
# Run 2 has the steps build and push:
pipeline_step_duration{step="build"}
pipeline_step_duration{step="push"}
```
- So here comes the question: after run 2, should the leftover pipeline_step_duration{step="clone"} series be deleted?
- In this scenario it should, because the clone step is no longer part of the pipeline
The problem can be summarized as: label values collected earlier no longer exist, and the stale series must be cleaned up promptly -- the question is how?
Before discussing this, let's run an experiment comparing how two common self-managed instrumentation approaches handle the deletion of inactive series
Experimental setup: the prometheus Go client SDK (client_golang)
- Export one gauge, rand_metrics, with a rand_key label whose value is different every time, and observe what the /metrics endpoint returns
```go
var (
	T1 = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "rand_metrics",
		Help: "rand_metrics",
	}, []string{"rand_key"})
)
```
Implementation 01: manage the metric directly in business code, without implementing the Collector interface
The code below simulates an extreme case: every 0.1 seconds a random key and value are generated and set on the metric
```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	T1 = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "rand_metrics",
		Help: "rand_metrics",
	}, []string{"rand_key"})
)

func init() {
	prometheus.DefaultRegisterer.MustRegister(T1)
}

func RandStr(length int) string {
	str := "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
	bytes := []byte(str)
	result := []byte{}
	rand.Seed(time.Now().UnixNano() + int64(rand.Intn(100)))
	for i := 0; i < length; i++ {
		result = append(result, bytes[rand.Intn(len(bytes))])
	}
	return string(result)
}

func push() {
	for {
		randKey := RandStr(10)
		rand.Seed(time.Now().UnixNano() + int64(rand.Intn(100)))
		T1.With(prometheus.Labels{"rand_key": randKey}).Set(rand.Float64())
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	go push()
	addr := ":8081"
	http.Handle("/metrics", promhttp.Handler())
	srv := http.Server{Addr: addr}
	err := srv.ListenAndServe()
	fmt.Println(err)
}
```
After starting the service, requests to :8081/metrics show that expired rand_key series remain and are never cleaned up

```
# HELP rand_metrics rand_metrics
# TYPE rand_metrics gauge
rand_metrics{rand_key="00DsYGkd6x"} 0.02229735291486387
rand_metrics{rand_key="017UBn8S2T"} 0.7192676436571013
rand_metrics{rand_key="01Ar4ca3i1"} 0.24131184816722678
rand_metrics{rand_key="02Ay5kqsDH"} 0.11462075954697458
rand_metrics{rand_key="02JZNZvMng"} 0.9874169937518104
rand_metrics{rand_key="02arsU5qNT"} 0.8552103362564516
rand_metrics{rand_key="02nMy3thfh"} 0.039571420204118024
rand_metrics{rand_key="032cyHjRhP"} 0.14576779289125183
rand_metrics{rand_key="03DPDckbfs"} 0.6106184905871918
rand_metrics{rand_key="03lbtLwFUO"} 0.936911945555629
rand_metrics{rand_key="03wqYiguP2"} 0.20167059771916385
rand_metrics{rand_key="04uG2s3X0C"} 0.3324314184499403
```
Implementation 02: implement the Collector interface
- Implement the Collector interface from the prometheus SDK: i.e. attach Collect and Describe methods to a struct
- In Collect, set the label values and the sample value
- In Describe, send the desc
```go
package main

import (
	"fmt"
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	T1 = prometheus.NewDesc(
		"rand_metrics",
		"rand_metrics",
		[]string{"rand_key"},
		nil)
)

type MyCollector struct {
	Name string
}

func (mc *MyCollector) Collect(ch chan<- prometheus.Metric) {
	log.Printf("MyCollector.collect.called")
	ch <- prometheus.MustNewConstMetric(T1, prometheus.GaugeValue, rand.Float64(), RandStr(10))
}

func (mc *MyCollector) Describe(ch chan<- *prometheus.Desc) {
	log.Printf("MyCollector.Describe.called")
	ch <- T1
}

func RandStr(length int) string {
	str := "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
	bytes := []byte(str)
	result := []byte{}
	rand.Seed(time.Now().UnixNano() + int64(rand.Intn(100)))
	for i := 0; i < length; i++ {
		result = append(result, bytes[rand.Intn(len(bytes))])
	}
	return string(result)
}

func main() {
	mc := &MyCollector{Name: "abc"}
	prometheus.MustRegister(mc)
	addr := ":8082"
	http.Handle("/metrics", promhttp.Handler())
	srv := http.Server{Addr: addr}
	err := srv.ListenAndServe()
	fmt.Println(err)
}
```
Metrics test: requests to :8082/metrics show that rand_metrics always has exactly one series

```
# HELP rand_metrics rand_metrics
# TYPE rand_metrics gauge
rand_metrics{rand_key="e1JU185kE4"} 0.12268247569586412
```

And the log shows that MyCollector.collect.called is printed on every request to the /metrics endpoint

```
2022/06/21 11:46:40 MyCollector.Describe.called
2022/06/21 11:46:44 MyCollector.collect.called
2022/06/21 11:46:47 MyCollector.collect.called
2022/06/21 11:46:47 MyCollector.collect.called
2022/06/21 11:46:47 MyCollector.collect.called
2022/06/21 11:46:47 MyCollector.collect.called
```
Observed behavior summary
- Implementing the Collector interface meets the need to clean up expired metrics: the instrumentation runs on each request to the /metrics endpoint
- Not implementing the Collector interface cannot meet that need: stale series accumulate as the business keeps running
Why: a look at the source code
01 Both approaches expose metrics over HTTP, so first look at the /metrics handler
- The entry point is http.Handle("/metrics", promhttp.Handler())
- Tracing into it leads to D:\go_path\pkg\mod\github.com\prometheus\client_golang@v1.12.2\prometheus\promhttp\http.go
The main logic is:
- Call the registry's Gather method to get the MetricFamily slice
- Then encode each family and write it to the HTTP response
The pseudo code is as follows

```go
// pseudo code
func HandlerFor(reg prometheus.Gatherer, opts HandlerOpts) http.Handler {
	mfs, err := reg.Gather()
	for _, mf := range mfs {
		if handleError(enc.Encode(mf)) {
			return
		}
	}
}
```
reg.Gather traverses the collectors registered in reg and calls their Collect methods
First each collector's Collect is called to produce the metrics:

```go
collectWorker := func() {
	for {
		select {
		case collector := <-checkedCollectors:
			collector.Collect(checkedMetricChan)
		case collector := <-uncheckedCollectors:
			collector.Collect(uncheckedMetricChan)
		default:
			return
		}
		wg.Done()
	}
}
```
Then the channels are drained and each metric is processed:

```go
cmc := checkedMetricChan
umc := uncheckedMetricChan
for {
	select {
	case metric, ok := <-cmc:
		if !ok {
			cmc = nil
			break
		}
		errs.Append(processMetric(
			metric, metricFamiliesByName,
			metricHashes,
			registeredDescIDs,
		))
	case metric, ok := <-umc:
		if !ok {
			umc = nil
			break
		}
		errs.Append(processMetric(
			metric, metricFamiliesByName,
			metricHashes,
			nil,
		))
	}
}
```
processMetric handles both kinds the same way, so the difference between implementations 01 and 02 lies in their Collect methods
02 Tracing Collect for the implementation that does not implement the Collector interface
- Because what we registered in reg is the *GaugeVec pointer returned by prometheus.NewGaugeVec
- the Collect that runs is *GaugeVec's
And GaugeVec embeds *MetricVec:

```go
type GaugeVec struct {
	*MetricVec
}
```

And MetricVec in turn embeds *metricMap, so ultimately it is metricMap's Collect that runs

```go
type MetricVec struct {
	*metricMap
	curry []curriedLabelValue

	// hashAdd and hashAddByte can be replaced for testing collision handling.
	hashAdd     func(h uint64, s string) uint64
	hashAddByte func(h uint64, b byte) uint64
}
```
Look at the metricMap structure and methods
- metricMap holds a map of metrics
- its Collect traverses every metricWithLabelValues in the inner layers of the map and sends them on ch for processing

```go
// metricVecs.
type metricMap struct {
	mtx     sync.RWMutex // Protects metrics.
	metrics map[uint64][]metricWithLabelValues
	desc    *Desc

	newMetric func(labelValues ...string) Metric
}

// Describe implements Collector. It will send exactly one Desc to the provided
// channel.
func (m *metricMap) Describe(ch chan<- *Desc) {
	ch <- m.desc
}

// Collect implements Collector.
func (m *metricMap) Collect(ch chan<- Metric) {
	m.mtx.RLock()
	defer m.mtx.RUnlock()
	for _, metrics := range m.metrics {
		for _, metric := range metrics {
			ch <- metric.metric
		}
	}
}
```
- This makes it very clear: unless entries in the metrics map are explicitly deleted, the data lives forever
- Some exporters belong to this explicit-delete school, such as event_exporter
03 Tracing Collect for the implementation that implements the Collector interface
- Because our collector implements Collect itself
Gather calls our Collect directly to produce the result

```go
func (mc *MyCollector) Collect(ch chan<- prometheus.Metric) {
	log.Printf("MyCollector.collect.called")
	ch <- prometheus.MustNewConstMetric(T1, prometheus.GaugeValue, rand.Float64(), RandStr(10))
}
```

- Nothing is ever written to a metricMap, so there is only ever one series
Summary
- The two instrumentation styles differ in their Collect methods
In fact, mainstream exporters behave the same way: inactive series get dropped:
- For example, process-exporter monitors processes; when a process exits, its series disappears (it shows up as a break in the Grafana graph); otherwise a series, once collected, would exist forever
- For example, node-exporter monitors mount points and the like; when a mount point disappears, the related series disappears
- That is because mainstream exporters implement the Collect method themselves
- In addition, kube-state-metrics in k8s uses a metrics-store as the informer's store to watch delete events from etcd: when a pod is deleted, its series disappears too
- Alternatively, you can call the delete methods explicitly to remove expired series from the map, but then you must hold the previous and current label sets and diff them
- In short, two schools: explicit map deletion vs implementing the Collector interface