Prometheus range_query source code interpretation and high cardinality judgment are based on the time-consuming principle of query_log statistics at each stage

For high cardinality issues in time series databases, please refer to my previous article

High cardinality and three methods to determine high cardinality in prometheus

Today we explain the range_query principle of the second judgment method

And explain the principle of query_log statistics

Summarize the range_query query process

Parsing parameters
Set timeout and set opentracing
Initialize query and parse promql according to queryEngine
The exec function first sets the ExecTotalTime
The exec function enters the queue queuing settings and calculates ExecQueueTime
The exec function sets EvalTotalTime and executes the execEvalStmt function
The execEvalStmt function prepares the querier+select series settings on the storage and calculates the QueryPreparationTime
The execEvalStmt function sets the InnerEvalTime, and executes the evaluator.Eval(s.Expr) in the local memory after getting the series from the storage
The execEvalStmt function sets and calculates ResultSortTime

prometheus query_log configuration

Configuration

# global段开启log即可
global:
  query_log_file: /opt/logs/prometheus_query_log

range_query_log resolution

{
   # 请求基础信息
    "httpRequest":{
        "clientIP":"192.168.43.114",
        "method":"POST",
        "path":"/api/v1/query_range"
    },
    # 参数段
    "params":{
        "end":"2021-05-03T02:32:45.000Z",
        "query":"rate(node_disk_reads_completed_total{instance=~"192\\.168\\.43\\.114:9100"}[2m])",
        "start":"2021-05-03T02:17:45.000Z",
        "step":15
    },
    # 统计段
    "stats":{
        "timings":{
            "evalTotalTime":0.000331799,
            "resultSortTime":0.000001235,
            "queryPreparationTime":0.000075478,
            "innerEvalTime":0.00024141,
            "execQueueTime":0.000012595,
            "execTotalTime":0.000354698
        }
    },
    # 请求时间
    "ts":"2021-05-03T02:32:49.876Z"
}

Let's look at the query principle of range_query from the source code layer

D:\nyy_work\go_path\pkg\mod\github.com\prometheus\prometheus@v1.8.2-0.20210220213500-8c8de46003d1\web\api\v1\api.go

func (api *API) queryRange(r *http.Request) (result apiFuncResult) {}

step1: Parameter analysis

    // 解析start
    start, err := parseTime(r.FormValue("start"))
    if err != nil {
        return invalidParamError(err, "start")
    }
    // 解析end
    end, err := parseTime(r.FormValue("end"))
    if err != nil {
        return invalidParamError(err, "end")
    }
    // 判断end是否早于start
    if end.Before(start) {
        return invalidParamError(errors.New("end timestamp must not be before start time"), "end")
    }
    // 解析 step
    step, err := parseDuration(r.FormValue("step"))
    if err != nil {
        return invalidParamError(err, "step")
    }
    
    if step <= 0 {
        return invalidParamError(errors.New("zero or negative query resolution step widths are not accepted. Try a positive integer"), "step")
    }

Prevent too many points

    // 这里怎么理解？
    // For safety, limit the number of returned points per timeseries.
    // This is sufficient for 60s resolution for a week or 1h resolution for a year.
    if end.Sub(start)/step > 11000 {
        err := errors.New("exceeded maximum resolution of 11,000 points per timeseries. Try decreasing the query resolution (?step=XX)")
        return apiFuncResult{nil, &apiError{errorBadData, err}, nil, nil}
    }

Set timeout

    ctx := r.Context()
    if to := r.FormValue("timeout"); to != "" {
        var cancel context.CancelFunc
        timeout, err := parseDuration(to)
        if err != nil {
            return invalidParamError(err, "timeout")
        }

        ctx, cancel = context.WithTimeout(ctx, timeout)
        defer cancel()
    }

step2: Initialize query and parse promql according to queryEngine

func (ng *Engine) NewRangeQuery(q storage.Queryable, qs string, start, end time.Time, interval time.Duration) (Query, error) {
    expr, err := parser.ParseExpr(qs)
    if err != nil {
        return nil, err
    }
    if expr.Type() != parser.ValueTypeVector && expr.Type() != parser.ValueTypeScalar {
        return nil, errors.Errorf("invalid expression type %q for range query, must be Scalar or instant Vector", parser.DocumentedType(expr.Type()))
    }
    qry, err := ng.newQuery(q, expr, start, end, interval)
    if err != nil {
        return nil, err
    }
    qry.q = qs

    return qry, nil
}

Where does queryEngine come from?
expr, err := parser.ParseExpr(qs) parsing promql
instance_query and range_query call the same newQuery

step3: execute query

// Exec implements the Query interface.
func (q *query) Exec(ctx context.Context) *Result {
    if span := opentracing.SpanFromContext(ctx); span != nil {
        span.SetTag(queryTag, q.stmt.String())
    }

    // Exec query.
    res, warnings, err := q.ng.exec(ctx, q)

    return &Result{Err: err, Value: res, Warnings: warnings}
}

Use distributed tracing to track the query in stages and time-consuming
Core function exec

func (ng *Engine) exec(ctx context.Context, q *query) (v parser.Value, ws storage.Warnings, err error) {
    // prometheus_engine_queries 计数器，表示当前query个数
    ng.metrics.currentQueries.Inc()
    defer ng.metrics.currentQueries.Dec()

    ctx, cancel := context.WithTimeout(ctx, ng.timeout)
    q.cancel = cancel
    // 收尾函数，记录日志或者jager
    defer func() {
        ng.queryLoggerLock.RLock()
        if l := ng.queryLogger; l != nil {
            params := make(map[string]interface{}, 4)
            params["query"] = q.q
            if eq, ok := q.Statement().(*parser.EvalStmt); ok {
                params["start"] = formatDate(eq.Start)
                params["end"] = formatDate(eq.End)
                // The step provided by the user is in seconds.
                params["step"] = int64(eq.Interval / (time.Second / time.Nanosecond))
            }
            f := []interface{}{"params", params}
            if err != nil {
                f = append(f, "error", err)
            }
            f = append(f, "stats", stats.NewQueryStats(q.Stats()))
            if span := opentracing.SpanFromContext(ctx); span != nil {
                if spanCtx, ok := span.Context().(jaeger.SpanContext); ok {
                    f = append(f, "spanID", spanCtx.SpanID())
                }
            }
            if origin := ctx.Value(QueryOrigin{}); origin != nil {
                for k, v := range origin.(map[string]interface{}) {
                    f = append(f, k, v)
                }
            }
            if err := l.Log(f...); err != nil {
                ng.metrics.queryLogFailures.Inc()
                level.Error(ng.logger).Log("msg", "can't log query", "err", err)
            }
        }
        ng.queryLoggerLock.RUnlock()
    }()
    // execTotalTime 代表exec函数执行全部耗时不算log
    // defer先进后出，这个GetSpanTimer最后执行
    execSpanTimer, ctx := q.stats.GetSpanTimer(ctx, stats.ExecTotalTime)
    defer execSpanTimer.Finish()
    
    // ExecQueueTime代表队列中等待时间
    // 命令行参数query.max-concurrency
    // 如果日志中这个耗时高，考虑队列被慢查询占满了。对应在data目录下的queries.active文件


    queueSpanTimer, _ := q.stats.GetSpanTimer(ctx, stats.ExecQueueTime, ng.metrics.queryQueueTime)
    // Log query in active log. The active log guarantees that we don't run over
    // MaxConcurrent queries.
    if ng.activeQueryTracker != nil {
        queryIndex, err := ng.activeQueryTracker.Insert(ctx, q.q)
        if err != nil {
            queueSpanTimer.Finish()
            return nil, nil, contextErr(err, "query queue")
        }
        defer ng.activeQueryTracker.Delete(queryIndex)
    }
    queueSpanTimer.Finish()

    // Cancel when execution is done or an error was raised.
    defer q.cancel()

    const env = "query execution"
    // EvalTotalTime代表execEvalStmt函数执行时间
    evalSpanTimer, ctx := q.stats.GetSpanTimer(ctx, stats.EvalTotalTime)
    defer evalSpanTimer.Finish()

    // The base context might already be canceled on the first iteration (e.g. during shutdown).
    if err := contextDone(ctx, env); err != nil {
        return nil, nil, err
    }

    switch s := q.Statement().(type) {
    case *parser.EvalStmt:
        return ng.execEvalStmt(ctx, q, s)
    case parser.TestStmt:
        return nil, nil, s(ctx)
    }

    panic(errors.Errorf("promql.Engine.exec: unhandled statement of type %T", q.Statement()))
}

Core function execEvalStmt

func (ng *Engine) execEvalStmt(ctx context.Context, query *query, s *parser.EvalStmt) (parser.Value, storage.Warnings, error) {
    // QueryPreparationTime代表准备存储上的querier+select series时间
    prepareSpanTimer, ctxPrepare := query.stats.GetSpanTimer(ctx, stats.QueryPreparationTime, ng.metrics.queryPrepareTime)
    mint, maxt := ng.findMinMaxTime(s)
    querier, err := query.queryable.Querier(ctxPrepare, mint, maxt)
    if err != nil {
        prepareSpanTimer.Finish()
        return nil, nil, err
    }
    defer querier.Close()
    // populateSeries调用 select返回series
    ng.populateSeries(querier, s)
    prepareSpanTimer.Finish()
    // InnerEvalTime代表从存储拿到series后在本地内存中执行 evaluator.Eval(s.Expr)的时间
    // evaluator.Eval需要判断instance_query 还是range_query
    evalSpanTimer, ctxInnerEval := query.stats.GetSpanTimer(ctx, stats.InnerEvalTime, ng.metrics.queryInnerEval)
    ...
    // ResultSortTime代表排序耗时
    sortSpanTimer, _ := query.stats.GetSpanTimer(ctx, stats.ResultSortTime, ng.metrics.queryResultSort)
    sort.Sort(mat)
    sortSpanTimer.Finish()
}

Summarize the query process

Parsing parameters
Set timeout and set opentracing
Initialize query and parse promql according to queryEngine
The exec function first sets the ExecTotalTime
The exec function enters the queue queuing settings and calculates ExecQueueTime
The exec function sets EvalTotalTime and executes the execEvalStmt function
The execEvalStmt function prepares the querier+select series settings on the storage and calculates the QueryPreparationTime
The execEvalStmt function sets the InnerEvalTime, and executes the evaluator.Eval(s.Expr) in the local memory after getting the series from the storage
The execEvalStmt function sets and calculates ResultSortTime

So these time-consuming relationships are

EvalTotalTime=QueryPreparationTime+InnerEvalTime+ResultSortTime
- Calculated value 0.000075478+0.00024141+0.000001235=0.000318123
- Real value 0.000331799> 0.000318123
- It’s not the same, there is some code execution in the middle
ExecTotalTime=ExecQueueTime+EvalTotalTime
- Calculated value 0.000331799+0.000012595=0.000344394
- Real value 0.000354698> 0.000344394

{
   # 请求基础信息
    "httpRequest":{
        "clientIP":"192.168.43.114",
        "method":"POST",
        "path":"/api/v1/query_range"
    },
    # 参数段
    "params":{
        "end":"2021-05-03T02:32:45.000Z",
        "query":"rate(node_disk_reads_completed_total{instance=~"192\\.168\\.43\\.114:9100"}[2m])",
        "start":"2021-05-03T02:17:45.000Z",
        "step":15
    },
    # 统计段
    "stats":{
        "timings":{
            "evalTotalTime":0.000331799,
            "resultSortTime":0.000001235,
            "queryPreparationTime":0.000075478,
            "innerEvalTime":0.00024141,
            "execQueueTime":0.000012595,
            "execTotalTime":0.000354698
        }
    },
    # 请求时间
    "ts":"2021-05-03T02:32:49.876Z"
}

Several time-consuming questions

The most re-queries are in QueryPreparationTime, which is the select series stage
If the execQueueTime is very high, do not increase query.max-concurrency easily, you should find out the slowness and solve it
- Blindly increasing the queue depth will cause more serious oom problems
- Normally, each query has a short waiting time in the queue
innerEvalTime resultSortTime is generally not time-consuming

For the pre-aggregation scheme, please refer to the article I wrote before

Article address: open source project pre_query: speed
Project address: https://github.com/ning1875/pre_query

Prometheus range_query source code interpretation and high cardinality judgment are based on the time-consuming principle of query_log statistics at each stage

For high cardinality issues in time series databases, please refer to my previous article

Today we explain the range_query principle of the second judgment method

And explain the principle of query_log statistics

Summarize the range_query query process

prometheus query_log configuration

Configuration

range_query_log resolution

Let's look at the query principle of range_query from the source code layer

step1: Parameter analysis

step2: Initialize query and parse promql according to queryEngine

step3: execute query

Summarize the query process

So these time-consuming relationships are

Several time-consuming questions

For the pre-aggregation scheme, please refer to the article I wrote before

ning1875

引用和评论

k8s中的11运维开发方向是哪些如果快速提升k8s开发能力

腾讯 tRPC-Go 教学——（5）filter、context 和日志组件

大模型时代，后端程序员如何避免被AI卷死？

Go slice切片使用教程，一次通关！

腾讯 tRPC-Go 教学——（1）搭建服务

一文弄懂用Go实现MCP服务

gozero限流、熔断、降级如何实现？面试的时候怎么回答？

Prometheus range_query source code interpretation and high cardinality judgment are based on the time-consuming principle of query_log statistics at each stage

For high cardinality issues in time series databases, please refer to my previous article

Today we explain the range_query principle of the second judgment method

And explain the principle of query_log statistics

Summarize the range_query query process

prometheus query_log configuration

Configuration

range_query_log resolution

Let's look at the query principle of range_query from the source code layer

step1: Parameter analysis

step2: Initialize query and parse promql according to queryEngine

step3: execute query

Summarize the query process

So these time-consuming relationships are

Several time-consuming questions

For the pre-aggregation scheme, please refer to the article I wrote before

ning1875

引用和评论

k8s中的11运维开发方向是哪些 如果快速提升k8s开发能力

腾讯 tRPC-Go 教学——（5）filter、context 和日志组件

大模型时代，后端程序员如何避免被AI卷死？

Go slice切片使用教程，一次通关！

腾讯 tRPC-Go 教学——（1）搭建服务

一文弄懂用Go实现MCP服务

gozero限流、熔断、降级如何实现？面试的时候怎么回答？

k8s中的11运维开发方向是哪些如果快速提升k8s开发能力