For high cardinality issues in time series databases, please refer to my previous article

Today we explain the range_query principle of the second judgment method

And explain the principle of query_log statistics

Summarize the range_query query process

  • Parsing parameters
  • Set timeout and set opentracing
  • Initialize query and parse promql according to queryEngine
  • The exec function first sets the ExecTotalTime
  • The exec function enters the queue queuing settings and calculates ExecQueueTime
  • The exec function sets EvalTotalTime and executes the execEvalStmt function
  • The execEvalStmt function prepares the querier+select series settings on the storage and calculates the QueryPreparationTime
  • The execEvalStmt function sets the InnerEvalTime, and executes the evaluator.Eval(s.Expr) in the local memory after getting the series from the storage
  • The execEvalStmt function sets and calculates ResultSortTime

prometheus query_log configuration

Configuration

# global段开启log即可
global:
  query_log_file: /opt/logs/prometheus_query_log

range_query_log resolution

{
   # 请求基础信息
    "httpRequest":{
        "clientIP":"192.168.43.114",
        "method":"POST",
        "path":"/api/v1/query_range"
    },
    # 参数段
    "params":{
        "end":"2021-05-03T02:32:45.000Z",
        "query":"rate(node_disk_reads_completed_total{instance=~"192\\.168\\.43\\.114:9100"}[2m])",
        "start":"2021-05-03T02:17:45.000Z",
        "step":15
    },
    # 统计段
    "stats":{
        "timings":{
            "evalTotalTime":0.000331799,
            "resultSortTime":0.000001235,
            "queryPreparationTime":0.000075478,
            "innerEvalTime":0.00024141,
            "execQueueTime":0.000012595,
            "execTotalTime":0.000354698
        }
    },
    # 请求时间
    "ts":"2021-05-03T02:32:49.876Z"
}

Let's look at the query principle of range_query from the source code layer

  • D:\nyy_work\go_path\pkg\mod\github.com\prometheus\prometheus@v1.8.2-0.20210220213500-8c8de46003d1\web\api\v1\api.go
func (api *API) queryRange(r *http.Request) (result apiFuncResult) {}

step1: Parameter analysis

    // 解析start
    start, err := parseTime(r.FormValue("start"))
    if err != nil {
        return invalidParamError(err, "start")
    }
    // 解析end
    end, err := parseTime(r.FormValue("end"))
    if err != nil {
        return invalidParamError(err, "end")
    }
    // 判断end是否早于start
    if end.Before(start) {
        return invalidParamError(errors.New("end timestamp must not be before start time"), "end")
    }
    // 解析 step
    step, err := parseDuration(r.FormValue("step"))
    if err != nil {
        return invalidParamError(err, "step")
    }
    
    if step <= 0 {
        return invalidParamError(errors.New("zero or negative query resolution step widths are not accepted. Try a positive integer"), "step")
    }
Prevent too many points
    // 这里怎么理解?
    // For safety, limit the number of returned points per timeseries.
    // This is sufficient for 60s resolution for a week or 1h resolution for a year.
    if end.Sub(start)/step > 11000 {
        err := errors.New("exceeded maximum resolution of 11,000 points per timeseries. Try decreasing the query resolution (?step=XX)")
        return apiFuncResult{nil, &apiError{errorBadData, err}, nil, nil}
    }
Set timeout
    ctx := r.Context()
    if to := r.FormValue("timeout"); to != "" {
        var cancel context.CancelFunc
        timeout, err := parseDuration(to)
        if err != nil {
            return invalidParamError(err, "timeout")
        }

        ctx, cancel = context.WithTimeout(ctx, timeout)
        defer cancel()
    }

step2: Initialize query and parse promql according to queryEngine

func (ng *Engine) NewRangeQuery(q storage.Queryable, qs string, start, end time.Time, interval time.Duration) (Query, error) {
    expr, err := parser.ParseExpr(qs)
    if err != nil {
        return nil, err
    }
    if expr.Type() != parser.ValueTypeVector && expr.Type() != parser.ValueTypeScalar {
        return nil, errors.Errorf("invalid expression type %q for range query, must be Scalar or instant Vector", parser.DocumentedType(expr.Type()))
    }
    qry, err := ng.newQuery(q, expr, start, end, interval)
    if err != nil {
        return nil, err
    }
    qry.q = qs

    return qry, nil
}
Where does queryEngine come from?

expr, err := parser.ParseExpr(qs) parsing promql

instance_query and range_query call the same newQuery

step3: execute query

// Exec implements the Query interface.
func (q *query) Exec(ctx context.Context) *Result {
    if span := opentracing.SpanFromContext(ctx); span != nil {
        span.SetTag(queryTag, q.stmt.String())
    }

    // Exec query.
    res, warnings, err := q.ng.exec(ctx, q)

    return &Result{Err: err, Value: res, Warnings: warnings}
}
Use distributed tracing to track the query in stages and time-consuming

Core function exec

func (ng *Engine) exec(ctx context.Context, q *query) (v parser.Value, ws storage.Warnings, err error) {
    // prometheus_engine_queries 计数器,表示当前query个数
    ng.metrics.currentQueries.Inc()
    defer ng.metrics.currentQueries.Dec()

    ctx, cancel := context.WithTimeout(ctx, ng.timeout)
    q.cancel = cancel
    // 收尾函数,记录日志或者jager
    defer func() {
        ng.queryLoggerLock.RLock()
        if l := ng.queryLogger; l != nil {
            params := make(map[string]interface{}, 4)
            params["query"] = q.q
            if eq, ok := q.Statement().(*parser.EvalStmt); ok {
                params["start"] = formatDate(eq.Start)
                params["end"] = formatDate(eq.End)
                // The step provided by the user is in seconds.
                params["step"] = int64(eq.Interval / (time.Second / time.Nanosecond))
            }
            f := []interface{}{"params", params}
            if err != nil {
                f = append(f, "error", err)
            }
            f = append(f, "stats", stats.NewQueryStats(q.Stats()))
            if span := opentracing.SpanFromContext(ctx); span != nil {
                if spanCtx, ok := span.Context().(jaeger.SpanContext); ok {
                    f = append(f, "spanID", spanCtx.SpanID())
                }
            }
            if origin := ctx.Value(QueryOrigin{}); origin != nil {
                for k, v := range origin.(map[string]interface{}) {
                    f = append(f, k, v)
                }
            }
            if err := l.Log(f...); err != nil {
                ng.metrics.queryLogFailures.Inc()
                level.Error(ng.logger).Log("msg", "can't log query", "err", err)
            }
        }
        ng.queryLoggerLock.RUnlock()
    }()
    // execTotalTime 代表exec函数执行全部耗时不算log
    // defer先进后出,这个GetSpanTimer最后执行
    execSpanTimer, ctx := q.stats.GetSpanTimer(ctx, stats.ExecTotalTime)
    defer execSpanTimer.Finish()
    
    // ExecQueueTime代表队列中等待时间
    // 命令行参数query.max-concurrency
    // 如果日志中这个耗时高,考虑队列被慢查询占满了。对应在data目录下的queries.active文件


    queueSpanTimer, _ := q.stats.GetSpanTimer(ctx, stats.ExecQueueTime, ng.metrics.queryQueueTime)
    // Log query in active log. The active log guarantees that we don't run over
    // MaxConcurrent queries.
    if ng.activeQueryTracker != nil {
        queryIndex, err := ng.activeQueryTracker.Insert(ctx, q.q)
        if err != nil {
            queueSpanTimer.Finish()
            return nil, nil, contextErr(err, "query queue")
        }
        defer ng.activeQueryTracker.Delete(queryIndex)
    }
    queueSpanTimer.Finish()

    // Cancel when execution is done or an error was raised.
    defer q.cancel()

    const env = "query execution"
    // EvalTotalTime代表execEvalStmt函数执行时间
    evalSpanTimer, ctx := q.stats.GetSpanTimer(ctx, stats.EvalTotalTime)
    defer evalSpanTimer.Finish()

    // The base context might already be canceled on the first iteration (e.g. during shutdown).
    if err := contextDone(ctx, env); err != nil {
        return nil, nil, err
    }

    switch s := q.Statement().(type) {
    case *parser.EvalStmt:
        return ng.execEvalStmt(ctx, q, s)
    case parser.TestStmt:
        return nil, nil, s(ctx)
    }

    panic(errors.Errorf("promql.Engine.exec: unhandled statement of type %T", q.Statement()))
}
Core function execEvalStmt
func (ng *Engine) execEvalStmt(ctx context.Context, query *query, s *parser.EvalStmt) (parser.Value, storage.Warnings, error) {
    // QueryPreparationTime代表准备存储上的querier+select series时间
    prepareSpanTimer, ctxPrepare := query.stats.GetSpanTimer(ctx, stats.QueryPreparationTime, ng.metrics.queryPrepareTime)
    mint, maxt := ng.findMinMaxTime(s)
    querier, err := query.queryable.Querier(ctxPrepare, mint, maxt)
    if err != nil {
        prepareSpanTimer.Finish()
        return nil, nil, err
    }
    defer querier.Close()
    // populateSeries调用 select返回series
    ng.populateSeries(querier, s)
    prepareSpanTimer.Finish()
    // InnerEvalTime代表从存储拿到series后在本地内存中执行 evaluator.Eval(s.Expr)的时间
    // evaluator.Eval需要判断instance_query 还是range_query
    evalSpanTimer, ctxInnerEval := query.stats.GetSpanTimer(ctx, stats.InnerEvalTime, ng.metrics.queryInnerEval)
    ...
    // ResultSortTime代表排序耗时
    sortSpanTimer, _ := query.stats.GetSpanTimer(ctx, stats.ResultSortTime, ng.metrics.queryResultSort)
    sort.Sort(mat)
    sortSpanTimer.Finish()
}

Summarize the query process

  • Parsing parameters
  • Set timeout and set opentracing
  • Initialize query and parse promql according to queryEngine
  • The exec function first sets the ExecTotalTime
  • The exec function enters the queue queuing settings and calculates ExecQueueTime
  • The exec function sets EvalTotalTime and executes the execEvalStmt function
  • The execEvalStmt function prepares the querier+select series settings on the storage and calculates the QueryPreparationTime
  • The execEvalStmt function sets the InnerEvalTime, and executes the evaluator.Eval(s.Expr) in the local memory after getting the series from the storage
  • The execEvalStmt function sets and calculates ResultSortTime

So these time-consuming relationships are

  • EvalTotalTime=QueryPreparationTime+InnerEvalTime+ResultSortTime

    • Calculated value 0.000075478+0.00024141+0.000001235=0.000318123
    • Real value 0.000331799> 0.000318123
    • It’s not the same, there is some code execution in the middle
  • ExecTotalTime=ExecQueueTime+EvalTotalTime

    • Calculated value 0.000331799+0.000012595=0.000344394
    • Real value 0.000354698> 0.000344394
{
   # 请求基础信息
    "httpRequest":{
        "clientIP":"192.168.43.114",
        "method":"POST",
        "path":"/api/v1/query_range"
    },
    # 参数段
    "params":{
        "end":"2021-05-03T02:32:45.000Z",
        "query":"rate(node_disk_reads_completed_total{instance=~"192\\.168\\.43\\.114:9100"}[2m])",
        "start":"2021-05-03T02:17:45.000Z",
        "step":15
    },
    # 统计段
    "stats":{
        "timings":{
            "evalTotalTime":0.000331799,
            "resultSortTime":0.000001235,
            "queryPreparationTime":0.000075478,
            "innerEvalTime":0.00024141,
            "execQueueTime":0.000012595,
            "execTotalTime":0.000354698
        }
    },
    # 请求时间
    "ts":"2021-05-03T02:32:49.876Z"
}

Several time-consuming questions

  • The most re-queries are in QueryPreparationTime, which is the select series stage
  • If the execQueueTime is very high, do not increase query.max-concurrency easily, you should find out the slowness and solve it

    • Blindly increasing the queue depth will cause more serious oom problems
    • Normally, each query has a short waiting time in the queue
  • innerEvalTime resultSortTime is generally not time-consuming

For the pre-aggregation scheme, please refer to the article I wrote before


ning1875
167 声望67 粉丝

k8s/prometheus/cicd运维开发专家,想进阶的dy搜 小乙运维杂货铺