Nebula Graph Source Code Interpretation Series | Vol.04 Implementation of the RBO-based Optimizer

In the previous article, we described how an execution plan is generated. This time we look at how the execution plan is optimized by the Optimizer.

Overview

The Optimizer, as its name implies, is the component that optimizes the execution plan. Database optimizers usually fall into two categories: rule-based optimizers (RBO) and cost-based optimizers (CBO). An RBO optimizes purely according to preset optimization rules, so its matching conditions and optimization results are relatively fixed. A CBO estimates the execution cost of alternative execution plans from collected data statistics and tries to choose the plan with the lowest cost.
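The difference between the two categories can be pictured with a small sketch. Everything in it (ToyPlan, applyLimitPushDown, chooseCheapest) is a hypothetical name invented for illustration and is not part of Nebula Graph; it only contrasts a fixed rewrite with a choice driven by cost estimates.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical toy plan: a label, a flag a rule can test, and an estimated cost.
struct ToyPlan {
    std::string label;
    bool limitAboveScan;
    double estimatedCost;
};

// Rule-based step: if the fixed condition matches, apply the fixed rewrite.
// No statistics are consulted, so the same input always gives the same output.
ToyPlan applyLimitPushDown(ToyPlan plan) {
    if (plan.limitAboveScan) {
        plan.label += "+pushed";
        plan.limitAboveScan = false;
    }
    return plan;
}

// Cost-based step: among candidate plans, pick the one with the lowest estimate.
// Precondition: candidates is not empty.
const ToyPlan& chooseCheapest(const std::vector<ToyPlan>& candidates) {
    return *std::min_element(candidates.begin(), candidates.end(),
                             [](const ToyPlan& a, const ToyPlan& b) {
                                 return a.estimatedCost < b.estimatedCost;
                             });
}
```

The rule-based function needs only the plan itself, while the cost-based function depends on how good the cost estimates are.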

Nebula Graph currently implements mainly RBO, so this article focuses on the RBO implementation in Nebula Graph.

Source code positioning

The source code implementation of the optimizer is under the src/optimizer directory, and the file structure is as follows:

.
├── CMakeLists.txt
├── OptContext.cpp
├── OptContext.h
├── OptGroup.cpp
├── OptGroup.h
├── Optimizer.cpp
├── Optimizer.h
├── OptimizerUtils.cpp
├── OptimizerUtils.h
├── OptRule.cpp
├── OptRule.h
├── rule
│   ├── CombineFilterRule.cpp
│   ├── CombineFilterRule.h
│   ├── EdgeIndexFullScanRule.cpp
│   ├── EdgeIndexFullScanRule.h
│   ....
│
└── test
    ├── CMakeLists.txt
    ├── IndexBoundValueTest.cpp
    └── IndexScanRuleTest.cpp

The test directory contains the tests, the rule directory contains the preset rule set, and the remaining source files are the implementation of the optimizer itself.

The entry point where the optimizer is invoked on a query is in src/service/QueryInstance.cpp, as shown below:

Status QueryInstance::validateAndOptimize() {
    auto *rctx = qctx()->rctx();
    VLOG(1) << "Parsing query: " << rctx->query();
    auto result = GQLParser(qctx()).parse(rctx->query());
    NG_RETURN_IF_ERROR(result);
    sentence_ = std::move(result).value();

    NG_RETURN_IF_ERROR(Validator::validate(sentence_.get(), qctx()));
    NG_RETURN_IF_ERROR(findBestPlan());

    return Status::OK();
}

The findBestPlan function calls the optimizer, which optimizes the plan and returns a new, optimized execution plan.

Brief description of the optimization process

Topologically, the execution plan currently designed for Nebula Graph is a directed acyclic graph, organized by having each node point to the nodes it depends on. In theory, each node can take the result of any other node as input. However, using the result of a node that has not yet been executed is meaningless, so plan generation restricts each node to use only nodes that have already been executed as input. The execution plan also contains special nodes such as loops and conditional branches. Figure 1 shows the execution plan generated for the query GO 2 STEPS FROM 'Tim Duncan' OVER like.
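The dependency-pointing structure described above can be pictured with a small, self-contained sketch. ToyPlanNode and executionOrder are hypothetical names for illustration, not Nebula Graph classes, and loops and branches are left out. The sketch only shows that walking from the root and emitting a node after its dependencies gives an order in which every node runs after its inputs.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy node: each node points to the nodes it depends on.
struct ToyPlanNode {
    std::string name;
    std::vector<const ToyPlanNode*> deps;
};

// Post-order walk: visit dependencies first, so each node appears after its inputs.
void visit(const ToyPlanNode* node,
           std::unordered_set<const ToyPlanNode*>& seen,
           std::vector<std::string>& order) {
    if (!seen.insert(node).second) return;  // already scheduled (shared dependency)
    for (const auto* dep : node->deps) visit(dep, seen, order);
    order.push_back(node->name);
}

// Returns node names in an order that respects every dependency edge.
std::vector<std::string> executionOrder(const ToyPlanNode* root) {
    std::unordered_set<const ToyPlanNode*> seen;
    std::vector<std::string> order;
    visit(root, seen, order);
    return order;
}
```

For a chain in which Limit depends on Project, Project on GetNeighbors, and GetNeighbors on Start, the walk lists Start first and Limit last.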

Figure 1

The optimizer's main job at present is to match the execution plan against preset patterns. When a match succeeds, the corresponding transform function is called to rewrite the matched part of the execution plan according to the preset rule. For example, for an execution plan of the form GetNeighbors → Limit, the rule pushes the Limit down into the GetNeighbors operator, which realizes operator pushdown.

Implementation

First of all, the optimizer does not operate directly on the execution plan. It first converts the execution plan into OptGroup and OptGroupNode structures. An OptGroup represents a single optimization group (usually one or more equivalent operators). An OptGroupNode represents an individual operator and holds pointers to its dependencies and branches, which means that the OptGroupNodes retain the topology of the execution plan. The main reasons for this conversion are to abstract the topology of the execution plan, to hide execution details that are irrelevant to optimization (such as loops and conditional branches), and to make it convenient to store rule-matching context in the new structure.

The conversion is essentially a simple preorder traversal of the execution plan, during which each operator is converted into a corresponding OptGroup and OptGroupNode. For convenience, the structure composed of OptGroup and OptGroupNode is called the optimization plan in the rest of this article, to distinguish it from the execution plan.
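The conversion can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Nebula Graph code: ToyPlanNode, ToyGroup, ToyGroupNode, ToyOptContext and convertToGroup are hypothetical names, every group starts with exactly one group node, and loops and branches are ignored.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct ToyPlanNode {
    std::string kind;
    std::vector<const ToyPlanNode*> deps;
};

struct ToyGroup;

// One concrete operator inside a group; it depends on groups, not on nodes.
struct ToyGroupNode {
    const ToyPlanNode* planNode;
    std::vector<ToyGroup*> dependencies;
};

// A set of equivalent alternatives; right after conversion it holds exactly one.
struct ToyGroup {
    std::vector<ToyGroupNode> groupNodes;
};

// Owns every group so that raw pointers between groups stay valid.
struct ToyOptContext {
    std::vector<std::unique_ptr<ToyGroup>> groups;
    std::unordered_map<const ToyPlanNode*, ToyGroup*> converted;
};

// Preorder conversion: create the group for the current operator, then descend.
ToyGroup* convertToGroup(ToyOptContext& ctx, const ToyPlanNode* node) {
    auto it = ctx.converted.find(node);
    if (it != ctx.converted.end()) return it->second;  // shared operator: reuse its group

    ctx.groups.push_back(std::make_unique<ToyGroup>());
    ToyGroup* group = ctx.groups.back().get();
    ctx.converted[node] = group;
    group->groupNodes.push_back(ToyGroupNode{node, {}});

    for (const auto* dep : node->deps) {
        ToyGroup* depGroup = convertToGroup(ctx, dep);
        group->groupNodes.front().dependencies.push_back(depGroup);
    }
    return group;
}
```

Having group nodes depend on groups rather than on other group nodes is what would let a rule later add an equivalent alternative to a group without rewiring its consumers.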

After the conversion, rule matching and the corresponding transformation of the optimization plan begin. The optimizer iterates over all predefined rules, and each rule performs a Bottom-Up traversal over the optimization plan. Specifically, it starts from the leaf-most OptGroup and proceeds up to the root OptGroup, and at each OptGroup it performs a Top-Down traversal of the OptGroupNodes to match the rule's pattern.

As shown in Figure 2, the pattern to be matched here is Limit -> Project -> GetNeighbors. Following the Bottom-Up order, matching first starts Top-Down at the Start node; Start is not equal to Limit, so the match fails. The Top-Down match that starts from GetNeighbors fails as well, and no match succeeds until Limit is reached. Once a match succeeds, the matched part of the optimization plan is converted by the rule's transform function, which in Figure 2 combines Limit and GetNeighbors.
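The Bottom-Up driver with a Top-Down pattern check can be sketched as follows. This is a simplified illustration, not the optimizer's real matching code: it works on plain toy nodes rather than OptGroup and OptGroupNode, and ToyNode, ToyPattern, matchesTopDown and findMatchBottomUp are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct ToyNode {
    std::string kind;
    std::vector<const ToyNode*> deps;
};

struct ToyPattern {
    std::string kind;
    std::vector<ToyPattern> deps;
};

// Top-Down check: the node must have the pattern's kind, and each dependency
// pattern must match the dependency at the same position.
bool matchesTopDown(const ToyNode* node, const ToyPattern& pattern) {
    if (node->kind != pattern.kind) return false;
    if (pattern.deps.empty()) return true;  // pattern leaf: anything may lie below
    if (node->deps.size() != pattern.deps.size()) return false;
    for (std::size_t i = 0; i < pattern.deps.size(); ++i) {
        if (!matchesTopDown(node->deps[i], pattern.deps[i])) return false;
    }
    return true;
}

// Bottom-Up driver: try the dependencies first, then the node itself.
// Records every kind that was tried; returns the first matching node or nullptr.
const ToyNode* findMatchBottomUp(const ToyNode* node, const ToyPattern& pattern,
                                 std::vector<std::string>& tried) {
    for (const auto* dep : node->deps) {
        if (const ToyNode* hit = findMatchBottomUp(dep, pattern, tried)) return hit;
    }
    tried.push_back(node->kind);
    return matchesTopDown(node, pattern) ? node : nullptr;
}
```

On the chain Limit -> Project -> GetNeighbors -> Start with the pattern Limit -> Project -> GetNeighbors, the sketch tries Start, GetNeighbors and Project before it succeeds at Limit, which mirrors the order described above.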

Figure 2

Finally, the optimizer converts the optimized optimization plan back into an execution plan. This is the reverse of the first step, and it is likewise a recursive traversal.

How to add a new rule

The sections above covered the implementation of the optimizer as a whole. To add an optimization rule, however, you do not need to know much about those implementation details; you only need to understand how to define a new rule. Here we take Limit pushdown as an example of a typical optimization rule. The Limit pushdown rule is implemented in the src/optimizer/rule/LimitPushDownRule.cpp file:


std::unique_ptr<OptRule> LimitPushDownRule::kInstance =
    std::unique_ptr<LimitPushDownRule>(new LimitPushDownRule());

LimitPushDownRule::LimitPushDownRule() {
    RuleSet::QueryRules().addRule(this);
}

const Pattern &LimitPushDownRule::pattern() const {
    static Pattern pattern =
        Pattern::create(graph::PlanNode::Kind::kLimit,
                        {Pattern::create(graph::PlanNode::Kind::kProject,
                                         {Pattern::create(graph::PlanNode::Kind::kGetNeighbors)})});
    return pattern;
}

StatusOr<OptRule::TransformResult> LimitPushDownRule::transform(
    OptContext *octx,
    const MatchedResult &matched) const {
    auto limitGroupNode = matched.node;
    auto projGroupNode = matched.dependencies.front().node;
    auto gnGroupNode = matched.dependencies.front().dependencies.front().node;

    const auto limit = static_cast<const Limit *>(limitGroupNode->node());
    const auto proj = static_cast<const Project *>(projGroupNode->node());
    const auto gn = static_cast<const GetNeighbors *>(gnGroupNode->node());

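    // Rows GetNeighbors must produce: the rows skipped by offset plus the rows returned.
    int64_t limitRows = limit->offset() + limit->count();
    // GetNeighbors already carries an equal or tighter limit, so there is nothing to push down.
    if (gn->limit() >= 0 && limitRows >= gn->limit()) {
        return TransformResult::noTransform();
    }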

    auto newLimit = static_cast<Limit *>(limit->clone());
    auto newLimitGroupNode = OptGroupNode::create(octx, newLimit, limitGroupNode->group());

    auto newProj = static_cast<Project *>(proj->clone());
    auto newProjGroup = OptGroup::create(octx);
    auto newProjGroupNode = newProjGroup->makeGroupNode(newProj);

    auto newGn = static_cast<GetNeighbors *>(gn->clone());
    newGn->setLimit(limitRows);
    auto newGnGroup = OptGroup::create(octx);
    auto newGnGroupNode = newGnGroup->makeGroupNode(newGn);

    newLimitGroupNode->dependsOn(newProjGroup);
    newProjGroupNode->dependsOn(newGnGroup);
    for (auto dep : gnGroupNode->dependencies()) {
        newGnGroupNode->dependsOn(dep);
    }

    TransformResult result;
    result.eraseAll = true;
    result.newGroupNodes.emplace_back(newLimitGroupNode);
    return result;
}

std::string LimitPushDownRule::toString() const {
    return "LimitPushDownRule";
}

To define a rule, first inherit from the OptRule class. Then implement the pattern interface, which returns the pattern to be matched. A pattern is composed of operators and the dependencies between them, such as Limit -> Project -> GetNeighbors. Next, implement the transform interface. It receives the matched part of the optimization plan; we unpack that part according to the predefined pattern and apply the corresponding transformation, such as merging the Limit operator into the GetNeighbors operator, and finally return the optimized optimization plan.

As long as these two interfaces are implemented correctly, the new optimization rule will work.
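The registration idiom used in the listing (a static instance whose constructor adds the rule to a rule set) can be tried in isolation with the toy version below. It is a sketch under stated assumptions rather than the real interface: ToyRule, ToyRuleSet and ToyLimitPushDownRule are hypothetical stand-ins, and transform is reduced to computing the limit that GetNeighbors should carry.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in; the real OptRule interface takes richer types.
class ToyRule {
public:
    virtual ~ToyRule() = default;
    virtual std::string pattern() const = 0;
    virtual int64_t transform(int64_t gnLimit, int64_t offset, int64_t count) const = 0;
    virtual std::string toString() const = 0;
};

class ToyRuleSet {
public:
    static ToyRuleSet& queryRules() {
        static ToyRuleSet rules;  // function-local static: constructed on first use
        return rules;
    }
    void addRule(const ToyRule* rule) { rules_.push_back(rule); }
    const std::vector<const ToyRule*>& rules() const { return rules_; }

private:
    std::vector<const ToyRule*> rules_;
};

class ToyLimitPushDownRule final : public ToyRule {
public:
    // Registering in the constructor means that creating the instance is enough.
    ToyLimitPushDownRule() { ToyRuleSet::queryRules().addRule(this); }

    std::string pattern() const override { return "Limit -> Project -> GetNeighbors"; }

    // Returns the limit GetNeighbors should carry after the rule has run.
    // A negative gnLimit means that no limit is set yet.
    int64_t transform(int64_t gnLimit, int64_t offset, int64_t count) const override {
        const int64_t limitRows = offset + count;
        if (gnLimit >= 0 && limitRows >= gnLimit) return gnLimit;  // no transform
        return limitRows;
    }

    std::string toString() const override { return "ToyLimitPushDownRule"; }
};

// The single static instance registers the rule during static initialization.
static ToyLimitPushDownRule kToyLimitPushDownInstance;
```

Because the rule set lives in a function-local static, it is constructed the first time a rule registers itself, so the order in which static rule instances are initialized does not matter in this sketch.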

That concludes this introduction to the Nebula Graph Optimizer.

