Nebula Graph Source Code Interpretation Series | Vol.02 A Detailed Look at the Validator


Overall structure

The Nebula Graph query engine consists of four main modules: the Parser, the Validator, the Optimizer, and the Executor.

The Parser performs lexical and syntactic analysis of the statement and generates an abstract syntax tree (AST). The Validator converts the AST into an execution plan, the Optimizer optimizes the execution plan, and the Executor computes the actual data.
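To make the hand-offs between the four modules concrete, here is a toy sketch of the data flow. Every type and function in it (`Ast`, `Plan`, `Result`, `parse`, `validate`, `optimize`, `execute`, `runQuery`) is a hypothetical stand-in. None of them are Nebula Graph APIs, and each stage is reduced to a trivial placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for the data each stage hands to the next.
// None of these types or functions exist in Nebula Graph.
struct Ast { std::string text; };                  // produced by the Parser
struct Plan { std::vector<std::string> nodes; };   // produced by the Validator
struct Result { std::vector<std::string> rows; };  // produced by the Executor

Ast parse(const std::string& query) { return Ast{query}; }

// Pretends every statement maps to a fixed three-node plan.
Plan validate(const Ast& ast) {
    assert(!ast.text.empty());
    return Plan{{"Start", "GetNeighbors", "Project"}};
}

// Identity pass: a real optimizer would rewrite the plan here.
Plan optimize(Plan plan) { return plan; }

// "Runs" each plan node in order and records it as a row.
Result execute(const Plan& plan) {
    Result result;
    for (const auto& node : plan.nodes) result.rows.push_back("ran " + node);
    return result;
}

// Parser -> Validator -> Optimizer -> Executor, strictly in that order.
Result runQuery(const std::string& query) {
    return execute(optimize(validate(parse(query))));
}
```

The only point of the sketch is the ordering. Each stage consumes the previous stage's output, and the Validator is the boundary where the syntax tree becomes a plan.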

This article focuses on how the Validator is implemented.

Directory Structure

The Validator code is implemented in the src/validator and src/planner directories.

The src/validator directory mainly contains the Validator implementations for the various clauses, such as OrderByValidator, LimitValidator, and GoValidator.

validator/
├── ACLValidator.h
├── AdminJobValidator.h
├── AdminValidator.h
├── AssignmentValidator.h
├── BalanceValidator.h
├── DownloadValidator.h
├── ExplainValidator.h
├── FetchEdgesValidator.h
├── FetchVerticesValidator.h
├── FindPathValidator.h
├── GetSubgraphValidator.h
├── GoValidator.h
├── GroupByValidator.h
├── IngestValidator.h
├── LimitValidator.h
├── LookupValidator.h
├── MaintainValidator.h
├── MatchValidator.h
├── MutateValidator.h
├── OrderByValidator.h
├── PipeValidator.h
├── ReportError.h
├── SequentialValidator.h
├── SetValidator.h
├── TraversalValidator.h
├── UseValidator.h
├── Validator.h
└── YieldValidator.h

The src/planner/plan directory defines the data structures of all PlanNodes, which make up the final execution plan. For example, when a query contains an aggregate function, an Aggregate node is generated in the execution plan. The Aggregate class holds all the information required to compute the aggregate function, including the grouping columns and the aggregate function expressions. The Aggregate class is defined in Query.h. Nebula defines more than one hundred PlanNodes, whose kinds are enumerated in PlanNode::Kind in PlanNode.h, so I won't go through them here.
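The sketch below shows the general shape of such a node hierarchy: a base class tagged with a kind and holding its input dependencies, plus an Aggregate subclass that stores grouping columns and aggregate expressions. It is a simplified illustration and not Nebula's real class layout. The `Kind` values are a small illustrative subset, and the member names `addDep`, `deps`, `groupKeys`, and `groupItems` are assumptions. Expressions are reduced to plain strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Simplified plan-node base class. The Kind values are a small illustrative
// subset; real nodes carry much more state (output variables, columns, ...).
class PlanNode {
public:
    enum class Kind { kStart, kGetNeighbors, kProject, kAggregate, kSort };

    explicit PlanNode(Kind kind) : kind_(kind) {}
    virtual ~PlanNode() = default;

    Kind kind() const { return kind_; }

    // Records an input node whose output this node consumes.
    void addDep(const PlanNode* dep) {
        assert(dep != nullptr);
        deps_.push_back(dep);
    }
    const std::vector<const PlanNode*>& deps() const { return deps_; }

private:
    Kind kind_;
    std::vector<const PlanNode*> deps_;
};

// Holds what an aggregation needs at run time: the grouping columns and the
// aggregate expressions (plain strings here instead of expression trees).
class Aggregate final : public PlanNode {
public:
    Aggregate(std::vector<std::string> groupKeys, std::vector<std::string> groupItems)
        : PlanNode(Kind::kAggregate),
          groupKeys_(std::move(groupKeys)),
          groupItems_(std::move(groupItems)) {}

    const std::vector<std::string>& groupKeys() const { return groupKeys_; }
    const std::vector<std::string>& groupItems() const { return groupItems_; }

private:
    std::vector<std::string> groupKeys_;
    std::vector<std::string> groupItems_;
};
```

In this design, the kind tag lets later stages such as the optimizer switch on node types cheaply. The subclass carries only the fields its operator needs.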

planner/plan/
├── Admin.cpp          
├── Admin.h             // administration-related nodes
├── Algo.cpp
├── Algo.h              // graph algorithm related nodes
├── ExecutionPlan.cpp
├── ExecutionPlan.h     // explain and profile nodes
├── Logic.cpp
├── Logic.h             // nodes introduced by the implementation layer
├── Maintain.cpp
├── Maintain.h          // schema related nodes
├── Mutate.cpp
├── Mutate.h            // DML related nodes
├── PlanNode.cpp
├── PlanNode.h          // plan node base classes
├── Query.cpp
├── Query.h             // DQL related nodes
└── Scan.h              // index related nodes

The src/planner directory also contains the planner implementations for nGQL and MATCH statements, which generate the execution plans for those statements.

Source code analysis

The entry function of the Validator is Validator::validate(Sentence*, QueryContext*), which converts the abstract syntax tree generated by the Parser into an execution plan. The root node of the final execution plan is stored in the QueryContext. The code is as follows:

Status Validator::validate(Sentence* sentence, QueryContext* qctx) {
    DCHECK(sentence != nullptr);
    DCHECK(qctx != nullptr);

    // Check if space chosen from session. if chosen, add it to context.
    auto session = qctx->rctx()->session();
    if (session->space().id > kInvalidSpaceID) {
        auto spaceInfo = session->space();
        qctx->vctx()->switchToSpace(std::move(spaceInfo));
    }

    auto validator = makeValidator(sentence, qctx);
    NG_RETURN_IF_ERROR(validator->validate());

    auto root = validator->root();
    if (!root) {
        return Status::SemanticError("Get null plan from sequential validator");
    }
    qctx->plan()->setRoot(root);
    return Status::OK();
}

If a graph space has been selected in the current session, this function first saves that space's information in the ValidateContext. It then calls Validator::makeValidator() and Validator::validate(), and finally stores the root of the resulting plan in the QueryContext.

Validator::makeValidator() creates the validator for a clause. It first creates a SequentialValidator, which is the entry point of validation: every statement starts with a SequentialValidator.
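A factory like makeValidator() is typically a dispatch on the sentence kind that returns the matching validator through a common base class. The following is a minimal sketch under that assumption. `Sentence`, its `Kind` values, the `name()` method, and the validator classes here are simplified hypothetical stand-ins, not the real nebula::graph types.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; not the real nebula::graph classes.
struct Sentence {
    enum class Kind { kSequential, kPipe, kGo, kOrderBy };
    Kind kind;
};

class Validator {
public:
    virtual ~Validator() = default;
    virtual std::string name() const = 0;
};

struct SequentialValidator final : Validator { std::string name() const override { return "SequentialValidator"; } };
struct PipeValidator final : Validator { std::string name() const override { return "PipeValidator"; } };
struct GoValidator final : Validator { std::string name() const override { return "GoValidator"; } };
struct OrderByValidator final : Validator { std::string name() const override { return "OrderByValidator"; } };

// Dispatches on the sentence kind and returns the matching validator through
// the common base class, so callers can drive it polymorphically.
std::unique_ptr<Validator> makeValidator(const Sentence* sentence) {
    assert(sentence != nullptr);  // mirrors the DCHECKs in the original code
    switch (sentence->kind) {
        case Sentence::Kind::kSequential: return std::make_unique<SequentialValidator>();
        case Sentence::Kind::kPipe:       return std::make_unique<PipeValidator>();
        case Sentence::Kind::kGo:         return std::make_unique<GoValidator>();
        case Sentence::Kind::kOrderBy:    return std::make_unique<OrderByValidator>();
    }
    return nullptr;  // not reached for the kinds listed above
}
```

Because callers only ever see `std::unique_ptr<Validator>`, adding a new clause means adding one class and one `case`. The calling code does not change.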

SequentialValidator::validateImpl() then calls Validator::makeValidator() to generate the validator for each corresponding clause. The code is as follows:

Status SequentialValidator::validateImpl() {
    Status status;
    if (sentence_->kind() != Sentence::Kind::kSequential) {
        return Status::SemanticError(
                "Sequential validator validates a SequentialSentences, but %ld is given.",
                static_cast<int64_t>(sentence_->kind()));
    }
    auto seqSentence = static_cast<SequentialSentences*>(sentence_);
    auto sentences = seqSentence->sentences();

    seqAstCtx_->startNode = StartNode::make(seqAstCtx_->qctx);
    for (auto* sentence : sentences) {
        auto validator = makeValidator(sentence, qctx_);
        NG_RETURN_IF_ERROR(validator->validate());
        seqAstCtx_->validators.emplace_back(std::move(validator));
    }

    return Status::OK();
}

Similarly, PipeValidator, AssignmentValidator, and SetValidator also generate validators for their corresponding sub-clauses.

Validator::validate() is responsible for generating the execution plan. The code is as follows:

Status Validator::validate() {
    auto vidType = space_.spaceDesc.vid_type_ref().value().type_ref().value();
    vidType_ = SchemaUtil::propTypeToValueType(vidType);

    NG_RETURN_IF_ERROR(validateImpl());

    // Check for duplicate reference column names in pipe or var statement
    NG_RETURN_IF_ERROR(checkDuplicateColName());

    // Execute after validateImpl because need field from it
    if (FLAGS_enable_authorize) {
        NG_RETURN_IF_ERROR(checkPermission());
    }

    NG_RETURN_IF_ERROR(toPlan());

    return Status::OK();
}

This function first reads the VID type of the current space and then calls Validator::validateImpl() to validate the clause. validateImpl() is a pure virtual function of the Validator class, so polymorphism dispatches the call to the validateImpl() implementation of each concrete clause validator. Next, the function checks for duplicate column names referenced in pipe or variable statements and, if authorization is enabled, checks the user's permissions. This permission check runs after validateImpl() because it needs fields that validateImpl() fills in. Finally, it calls Validator::toPlan() to generate the execution plan. toPlan() generates the sub-plan of each clause, and the sub-plans are connected to form a complete execution plan. For example, in MATCH statements the sub-plans are connected by MatchPlanner::connectSegments(), and in nGQL statements by Validator::appendPlan().
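The structure described here is the template-method pattern: the base class fixes the order of the steps, and subclasses fill in only the clause-specific part. Below is a minimal sketch of that pattern. `Status` is reduced to a string (empty means OK), the `trace()` helper exists only to expose call order, and the toy `LimitValidator` is a hypothetical example rather than Nebula's real class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplification: an empty string means OK, anything else is an
// error message. The real code uses nebula::Status and NG_RETURN_IF_ERROR.
using Status = std::string;

class Validator {
public:
    virtual ~Validator() = default;

    // Fixed skeleton: clause-specific validation first, then the permission
    // check (it relies on fields validateImpl fills in), then plan generation.
    Status validate(bool enableAuthorize) {
        if (Status s = validateImpl(); !s.empty()) return s;
        if (enableAuthorize) {
            if (Status s = checkPermission(); !s.empty()) return s;
        }
        return toPlan();
    }

    const std::vector<std::string>& trace() const { return trace_; }

protected:
    virtual Status validateImpl() = 0;  // every clause validator overrides this
    virtual Status checkPermission() { trace_.push_back("checkPermission"); return {}; }
    virtual Status toPlan() { trace_.push_back("toPlan"); return {}; }

    std::vector<std::string> trace_;  // records the call order for illustration
};

// Toy clause validator: rejects a negative LIMIT.
class LimitValidator : public Validator {
public:
    explicit LimitValidator(long limit) : limit_(limit) {}

protected:
    Status validateImpl() override {
        trace_.push_back("validateImpl");
        if (limit_ < 0) return "SemanticError: LIMIT must be non-negative";
        return {};
    }

private:
    long limit_;
};
```

If validateImpl() fails, neither the permission check nor toPlan() runs. This matches the early-return behavior of the NG_RETURN_IF_ERROR chain in Validator::validate().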

Example

Let's walk through the process above in detail, using an nGQL statement as an example.

Statement:

GO 3 STEPS FROM "vid" OVER edge 
WHERE $$.tag.prop > 30 
YIELD edge._dst AS dst 
| ORDER BY $-.dst

In the validation phase, this nGQL statement goes through three main steps:

Creating clause validators

Validator::makeValidator() is first called to generate a SequentialValidator. SequentialValidator::validateImpl() then generates a PipeValidator, which in turn creates validators for its left and right clauses: a GoValidator and an OrderByValidator.
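The validators for this statement form a small tree that mirrors the shape of the AST. The sketch below assumes a simplified, hypothetical AST in which composite sentences own their sub-clauses. It walks that AST depth-first and lists the validators that would be created, indented by nesting level. `Sentence`, `node`, `validatorName`, `makeValidators`, and `exampleStatement` are illustrative names, not Nebula code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST: composite sentences own their sub-clauses.
struct Sentence {
    enum class Kind { kSequential, kPipe, kGo, kOrderBy };
    Kind kind = Kind::kSequential;
    std::vector<std::unique_ptr<Sentence>> children;
};

std::unique_ptr<Sentence> node(Sentence::Kind kind) {
    auto s = std::make_unique<Sentence>();
    s->kind = kind;
    return s;
}

std::string validatorName(Sentence::Kind kind) {
    switch (kind) {
        case Sentence::Kind::kSequential: return "SequentialValidator";
        case Sentence::Kind::kPipe:       return "PipeValidator";
        case Sentence::Kind::kGo:         return "GoValidator";
        case Sentence::Kind::kOrderBy:    return "OrderByValidator";
    }
    return "UnknownValidator";
}

// Composite validators create validators for their children, so a depth-first
// walk lists every validator that gets built, indented by nesting level.
void makeValidators(const Sentence& s, int depth, std::vector<std::string>& out) {
    out.push_back(std::string(depth * 2, ' ') + validatorName(s.kind));
    for (const auto& child : s.children) makeValidators(*child, depth + 1, out);
}

// AST shape of: GO 3 STEPS FROM "vid" OVER edge ... | ORDER BY $-.dst
std::unique_ptr<Sentence> exampleStatement() {
    auto pipe = node(Sentence::Kind::kPipe);
    pipe->children.push_back(node(Sentence::Kind::kGo));       // left clause
    pipe->children.push_back(node(Sentence::Kind::kOrderBy));  // right clause
    auto seq = node(Sentence::Kind::kSequential);
    seq->children.push_back(std::move(pipe));
    return seq;
}
```

For the example statement, the walk yields SequentialValidator at the top, PipeValidator below it, and GoValidator and OrderByValidator as the two leaves.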

Clause validation

In the clause validation phase, the Go and OrderBy clauses are validated separately.

Taking the Go statement as an example, validation first checks for semantic errors, such as improper use of aggregate functions or mismatched expression types, and then validates each inner clause in turn. The intermediate results of validation are stored in the GoContext, which GoPlanner later uses as the basis for generating the execution plan. For example, validateWhere() saves the filter condition expression so that a Filter plan node can be generated later.

    NG_RETURN_IF_ERROR(validateStep(goSentence->stepClause(), goCtx_->steps));  // validate the STEP clause
    NG_RETURN_IF_ERROR(validateStarts(goSentence->fromClause(), goCtx_->from)); // validate the FROM clause
    NG_RETURN_IF_ERROR(validateOver(goSentence->overClause(), goCtx_->over));   // validate the OVER clause
    NG_RETURN_IF_ERROR(validateWhere(goSentence->whereClause()));               // validate the WHERE clause
    NG_RETURN_IF_ERROR(validateYield(goSentence->yieldClause()));               // validate the YIELD clause

Plan generation

The sub-execution plan of the Go statement is generated by the GoPlanner::transform(AstContext*) function. The code is as follows:

StatusOr<SubPlan> GoPlanner::transform(AstContext* astCtx) {
    goCtx_ = static_cast<GoContext *>(astCtx);
    auto qctx = goCtx_->qctx;
    goCtx_->joinInput = goCtx_->from.fromType != FromType::kInstantExpr;
    goCtx_->joinDst = !goCtx_->exprProps.dstTagProps().empty();

    SubPlan startPlan = QueryUtil::buildStart(qctx, goCtx_->from, goCtx_->vidsVar);

    auto& steps = goCtx_->steps;
    if (steps.isMToN()) {
        return mToNStepsPlan(startPlan);
    }

    if (steps.steps() == 0) {
        auto* pt = PassThroughNode::make(qctx, nullptr);
        pt->setColNames(std::move(goCtx_->colNames));
        SubPlan subPlan;
        subPlan.root = subPlan.tail = pt;
        return subPlan;
    }

    if (steps.steps() == 1) {
        return oneStepPlan(startPlan);
    }
    return nStepsPlan(startPlan);
}

This function first calls QueryUtil::buildStart() to construct the start node. It then generates the plan in one of four ways, depending on the number of steps: M to N steps, 0 steps, 1 step, or N steps. The statement in this example (3 STEPS) uses the N-step strategy, GoPlanner::nStepsPlan().

The code of GoPlanner::nStepsPlan() is as follows:

SubPlan GoPlanner::nStepsPlan(SubPlan& startVidPlan) {
    auto qctx = goCtx_->qctx;

    auto* start = StartNode::make(qctx);
    auto* gn = GetNeighbors::make(qctx, start, goCtx_->space.id);
    gn->setSrc(goCtx_->from.src);
    gn->setEdgeProps(buildEdgeProps(true));
    gn->setInputVar(goCtx_->vidsVar);

    auto* getDst = QueryUtil::extractDstFromGN(qctx, gn, goCtx_->vidsVar);

    PlanNode* loopBody = getDst;
    PlanNode* loopDep = nullptr;
    if (goCtx_->joinInput) {
        auto* joinLeft = extractVidFromRuntimeInput(startVidPlan.root);
        auto* joinRight = extractSrcDstFromGN(getDst, gn->outputVar());
        loopBody = trackStartVid(joinLeft, joinRight);
        loopDep = joinLeft;
    }

    auto* condition = loopCondition(goCtx_->steps.steps() - 1, gn->outputVar());
    auto* loop = Loop::make(qctx, loopDep, loopBody, condition);

    auto* root = lastStep(loop, loopBody == getDst ? nullptr : loopBody);
    SubPlan subPlan;
    subPlan.root = root;
    subPlan.tail = startVidPlan.tail == nullptr ? loop : startVidPlan.tail;

    return subPlan;
}

The sub-execution plan generated by the Go statement is as follows:

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project

The Go statement performs graph expansion, so GetNeighbors is the most important node in its execution plan. At runtime, the GetNeighbors operator accesses the storage service and, given the start vertices and the specified edge type, fetches the IDs of the destination vertices one step away. Multi-step expansion is implemented with the Loop node. The nodes between Start and Loop form the Loop sub-plan, which is executed repeatedly as long as the loop condition holds, and the last expansion step is performed outside the Loop. The Project node extracts the destination vertex IDs of the current expansion, and the Dedup node deduplicates them so that they can serve as the start vertices of the next expansion. The GetVertices node fetches the properties of the destination vertices' tags, Filter applies the conditional filter, and LeftJoin merges the results of GetNeighbors and GetVertices.
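The loop structure can be simulated on a tiny in-memory graph. The sketch below assumes a single edge type stored as an adjacency map, which stands in for the storage service. It ignores properties, GetVertices, Filter, and LeftJoin. `Graph`, `expand`, and `goNSteps` are hypothetical names. The loop body (expand, project destinations, deduplicate) runs `steps - 1` times, matching `loopCondition(goCtx_->steps.steps() - 1, ...)` above. The final expansion then runs once outside the loop.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory graph: source vertex -> destinations over one edge type.
using Graph = std::map<std::string, std::vector<std::string>>;

// One expansion step, standing in for GetNeighbors followed by Project(_dst).
std::vector<std::string> expand(const Graph& g, const std::vector<std::string>& frontier) {
    std::vector<std::string> dsts;
    for (const auto& v : frontier) {
        auto it = g.find(v);
        if (it == g.end()) continue;  // vertex without outgoing edges
        dsts.insert(dsts.end(), it->second.begin(), it->second.end());
    }
    return dsts;
}

// Mirrors the plan shape: the loop body (GetNeighbors -> Project -> Dedup)
// runs steps - 1 times, then the last step runs once outside the loop.
std::vector<std::string> goNSteps(const Graph& g, std::vector<std::string> start, int steps) {
    assert(steps >= 1);
    std::vector<std::string> frontier = std::move(start);
    for (int i = 0; i < steps - 1; ++i) {                        // Loop condition
        std::vector<std::string> dsts = expand(g, frontier);
        std::set<std::string> unique(dsts.begin(), dsts.end());  // Dedup
        frontier.assign(unique.begin(), unique.end());
    }
    return expand(g, frontier);  // last step, outside the loop, no Dedup
}
```

With edges a→b, a→c, b→d, c→d, and d→e, a 2-step GO from "a" returns "d" twice (once per edge reaching it). A 3-step GO returns a single "e", because the intermediate frontier was deduplicated to {d} inside the loop.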

The OrderBy clause sorts the data, so its sub-execution plan generates a Sort node.

After the plans for the left and right clauses are generated, PipeValidator::toPlan() calls Validator::appendPlan() to connect the two sub-plans and obtain the final execution plan. The complete execution plan is as follows:

Start -> GetNeighbors -> Project -> Dedup -> Loop -> GetNeighbors -> Project -> GetVertices -> Project -> LeftJoin -> Filter -> Project -> Sort -> DataCollect
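Connecting two sub-plans can be pictured as pointer surgery on their endpoints. Each sub-plan has a root (the node that runs last) and a tail (the node that runs first). Piping makes the right plan's tail depend on the left plan's root. The sketch below is a simplified illustration under the assumption that every node has at most one dependency. `Node`, `appendSubPlan`, and `executionOrder` are hypothetical names and do not reproduce the real signature of Validator::appendPlan().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plan node with at most one upstream dependency.
struct Node {
    std::string name;
    Node* dep = nullptr;  // node whose output this node consumes
};

// A sub-plan is identified by its root (runs last) and its tail (runs first).
struct SubPlan {
    Node* root = nullptr;
    Node* tail = nullptr;
};

// Hypothetical helper: makes `right` consume the output of `left` by pointing
// the right plan's tail at the left plan's root.
SubPlan appendSubPlan(SubPlan left, SubPlan right) {
    assert(right.tail != nullptr && right.tail->dep == nullptr);
    right.tail->dep = left.root;
    return SubPlan{right.root, left.tail};
}

// Follows dependencies from the root back to the tail and returns the node
// names in execution order (tail first).
std::vector<std::string> executionOrder(const SubPlan& plan) {
    std::vector<std::string> order;
    for (Node* n = plan.root; n != nullptr; n = n->dep) order.insert(order.begin(), n->name);
    return order;
}
```

Joining a Go sub-plan (Start -> GetNeighbors -> Project) with an OrderBy sub-plan (Sort) this way yields a plan whose tail is Start and whose root is Sort.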

This concludes the introduction to the Validator.

Related forum questions

Question: Where can I find the parser/GraphParser.hpp file?

Answer: GraphParser.hpp is generated during compilation, so the file only exists after you have built the project.

That's all for this article.

Want to discuss graph database technology? Fill in your Nebula business card, and the Nebula assistant will add you to the Nebula discussion group.

