In the previous article, we described how an execution plan is generated. This time we look at how the Optimizer optimizes that execution plan.
Overview
The optimizer, as the name implies, is the component that optimizes the execution plan. Database optimizers generally fall into two categories: rule-based optimizers (RBO, Rule-Based Optimizer) and cost-based optimizers (CBO, Cost-Based Optimizer). The former optimizes purely according to preset rules, so its matching conditions and optimization results are relatively fixed; the latter calculates the execution cost of different execution plans from collected data statistics and tries to choose the plan with the lowest cost.
Currently the Nebula Graph optimizer is mainly an RBO, so this article focuses on the RBO implementation in Nebula Graph.
Source code positioning
The source code of the optimizer lives under the src/optimizer directory, with the following file structure:
.
├── CMakeLists.txt
├── OptContext.cpp
├── OptContext.h
├── OptGroup.cpp
├── OptGroup.h
├── Optimizer.cpp
├── Optimizer.h
├── OptimizerUtils.cpp
├── OptimizerUtils.h
├── OptRule.cpp
├── OptRule.h
├── rule
│ ├── CombineFilterRule.cpp
│ ├── CombineFilterRule.h
│ ├── EdgeIndexFullScanRule.cpp
│ ├── EdgeIndexFullScanRule.h
│   ├── ...
└── test
    ├── CMakeLists.txt
    ├── IndexBoundValueTest.cpp
    └── IndexScanRuleTest.cpp
The test directory contains the tests, the rule directory contains the preset rule set, and the remaining source files are the implementation of the optimizer itself.
The entry point where the optimizer is invoked for a query is in the src/service/QueryInstance.cpp file, as shown below:
Status QueryInstance::validateAndOptimize() {
    auto *rctx = qctx()->rctx();
    VLOG(1) << "Parsing query: " << rctx->query();

    // Parse the query text into an AST (sentence).
    auto result = GQLParser(qctx()).parse(rctx->query());
    NG_RETURN_IF_ERROR(result);
    sentence_ = std::move(result).value();

    // Validate the sentence and generate the initial execution plan.
    NG_RETURN_IF_ERROR(Validator::validate(sentence_.get(), qctx()));
    // Invoke the optimizer to find the best plan.
    NG_RETURN_IF_ERROR(findBestPlan());
    return Status::OK();
}
The findBestPlan function calls the optimizer, which optimizes the plan and returns a new, optimized execution plan.
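Roughly, findBestPlan hands the current query context to the optimizer and replaces the plan's root with the optimized result. The following is a minimal sketch of that flow, not the actual source; member names such as optimizer_, plan() and setRoot() are assumptions used only for illustration:

// Minimal sketch (illustrative only): how findBestPlan might invoke the optimizer.
// Names such as optimizer_, plan() and setRoot() are assumptions.
Status QueryInstance::findBestPlan() {
    auto *qctx = this->qctx();
    auto rootStatus = optimizer_->findBestPlan(qctx);   // run every registered rule
    NG_RETURN_IF_ERROR(rootStatus);
    // Replace the root of the execution plan with the optimized root.
    qctx->plan()->setRoot(std::move(rootStatus).value());
    return Status::OK();
}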
Brief description of the optimization process
Topologically, the execution plan that Nebula Graph currently generates is a directed acyclic graph, organized by having each node point to the nodes it depends on. In theory any node could take the result of any other node as input, but using the result of a node that has not yet been executed is meaningless, so when the execution plan is generated it is restricted to taking only already-executed nodes as input. The execution plan also contains special nodes such as loops and conditional branches. Figure 1 shows the execution plan generated for GO 2 STEPS FROM 'Tim Duncan' OVER like.
Figure 1
The main job of the current optimizer is to match the execution plan against preset patterns. If a match succeeds, the corresponding transformation function is called to rewrite the matched part of the plan according to the preset rule. For example, for an execution plan of the form GetNeighbors → Limit, the limit value of the Limit operator can be pushed down into the GetNeighbors operator, which realizes the operator-pushdown optimization.
Implementation
First of all, the optimizer does not operate on the execution plan directly; it first converts the execution plan into OptGroup and OptGroupNode objects. An OptGroup represents a single optimization group (usually one or more equivalent operators), while an OptGroupNode represents an individual operator and holds pointers to its dependencies and branches, which means the OptGroupNode structure preserves the topology of the execution plan. The main reasons for this structural transformation are to abstract the topology of the execution plan, to hide execution details that are irrelevant here (such as loops and conditional branches), and to store some rule-matching context in the new structure.
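To make the relationship concrete, here is a highly simplified sketch of the two structures. The real classes in src/optimizer/OptGroup.h carry more state (the OptContext, explored rules, and so on), so the field names below are illustrative assumptions rather than the actual definitions:

// Highly simplified sketch of OptGroup / OptGroupNode (illustrative only;
// the real definitions are in src/optimizer/OptGroup.h and differ in detail).
#include <vector>

namespace graph { class PlanNode; }        // forward declaration for the sketch

class OptGroup;

class OptGroupNode {
public:
    const graph::PlanNode *node_;           // the wrapped plan node (operator)
    std::vector<OptGroup *> dependencies_;  // groups this operator depends on
    std::vector<OptGroup *> bodies_;        // branch bodies, e.g. of Loop/Select
    OptGroup *group_;                       // the group this node belongs to
};

class OptGroup {
public:
    // One or more equivalent operators collected into a single optimization group.
    std::vector<OptGroupNode *> groupNodes_;
};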
The conversion itself is essentially a simple pre-order traversal: each operator is converted into the corresponding OptGroup and OptGroupNode as the traversal proceeds. For convenience, the plan made up of OptGroup and OptGroupNode objects is called the optimization plan below, to distinguish it from the execution plan.
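A minimal sketch of this conversion might look like the recursive function below. It reuses the OptGroup::create, OptGroupNode::create and dependsOn calls that appear later in LimitPushDownRule, but the accessor node->dependencies() and the overall function are assumptions that only convey the shape of the traversal:

// Sketch only: convert an execution-plan node (and everything it depends on)
// into OptGroup/OptGroupNode form by a pre-order traversal.
OptGroupNode *convertPlanNode(OptContext *ctx, const graph::PlanNode *node, OptGroup *group) {
    // Wrap the current operator first (pre-order), ...
    auto *groupNode = OptGroupNode::create(ctx, node, group);
    // ... then recurse into each dependency, giving every dependency its own group.
    for (const auto *dep : node->dependencies()) {
        auto *depGroup = OptGroup::create(ctx);
        convertPlanNode(ctx, dep, depGroup);
        groupNode->dependsOn(depGroup);
    }
    return groupNode;
}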
After the conversion, rule matching and the corresponding optimization-plan transformation begin. All predefined rules are traversed, and each rule performs a bottom-up traversal of the optimization plan to find matches. Concretely, matching starts from the leaf-most OptGroup and proceeds towards the root OptGroup; within each OptGroup, the OptGroupNodes are matched top-down. In Figure 2 the pattern to be matched is Limit → Project → GetNeighbors. Following the bottom-up order, the Start node is checked first: its top-down match fails immediately because Start is not a Limit. The top-down match starting from GetNeighbors fails as well, and so on, until the Limit node is reached and the match succeeds. After a successful match, the matched part of the optimization plan is rewritten by the rule's transform, which here pushes the limit of the Limit operator down into GetNeighbors.
Figure 2
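The overall driving loop can be pictured roughly as below. This is a hedged, simplified sketch: names such as rules(), groupsBottomUp, groupNodes() and match() are assumptions used only to show the bottom-up / top-down nesting, not the real Optimizer API:

// Sketch of the rule-driving loop (illustrative only).
for (const OptRule *rule : ruleSet.rules()) {
    // Walk the optimization groups bottom-up: leaf groups first, root group last.
    for (OptGroup *group : groupsBottomUp) {
        // Inside a group, try the group nodes top-down.
        for (OptGroupNode *node : group->groupNodes()) {
            auto matched = rule->match(ctx, node);   // compare against rule->pattern()
            if (!matched.ok()) {
                continue;                            // this node does not match the pattern
            }
            // On success, let the rule rewrite the matched sub-plan and splice the
            // returned newGroupNodes back into the optimization plan.
            auto result = rule->transform(ctx, matched.value());
            // ... error handling and plan splicing omitted ...
        }
    }
}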
Finally, the optimizer converts the optimized optimization plan back into an execution plan. This is the reverse of the first step, and it is likewise done by a recursive traversal.
How to add a new rule
In the sections above we walked through the implementation of the whole optimizer component, but to add an optimization rule you do not need to know the optimizer's internals in detail; you only need to know how to define a new rule. Here we take Limit pushdown as an example of a typical optimization rule. The Limit pushdown rule is implemented in the src/optimizer/rule/LimitPushDownRule.cpp file:
std::unique_ptr<OptRule> LimitPushDownRule::kInstance =
    std::unique_ptr<LimitPushDownRule>(new LimitPushDownRule());

LimitPushDownRule::LimitPushDownRule() {
    // Register this rule into the query rule set when the singleton is constructed.
    RuleSet::QueryRules().addRule(this);
}

const Pattern &LimitPushDownRule::pattern() const {
    // Match a Limit whose dependency is a Project whose dependency is a GetNeighbors.
    static Pattern pattern =
        Pattern::create(graph::PlanNode::Kind::kLimit,
                        {Pattern::create(graph::PlanNode::Kind::kProject,
                                         {Pattern::create(graph::PlanNode::Kind::kGetNeighbors)})});
    return pattern;
}

StatusOr<OptRule::TransformResult> LimitPushDownRule::transform(
    OptContext *octx,
    const MatchedResult &matched) const {
    // The matched result mirrors the pattern: Limit -> Project -> GetNeighbors.
    auto limitGroupNode = matched.node;
    auto projGroupNode = matched.dependencies.front().node;
    auto gnGroupNode = matched.dependencies.front().dependencies.front().node;

    const auto limit = static_cast<const Limit *>(limitGroupNode->node());
    const auto proj = static_cast<const Project *>(projGroupNode->node());
    const auto gn = static_cast<const GetNeighbors *>(gnGroupNode->node());

    int64_t limitRows = limit->offset() + limit->count();
    if (gn->limit() >= 0 && limitRows >= gn->limit()) {
        // GetNeighbors already has a limit that is at least as strict; nothing to do.
        return TransformResult::noTransform();
    }

    // Clone the three operators and push the limit down into GetNeighbors.
    auto newLimit = static_cast<Limit *>(limit->clone());
    auto newLimitGroupNode = OptGroupNode::create(octx, newLimit, limitGroupNode->group());

    auto newProj = static_cast<Project *>(proj->clone());
    auto newProjGroup = OptGroup::create(octx);
    auto newProjGroupNode = newProjGroup->makeGroupNode(newProj);

    auto newGn = static_cast<GetNeighbors *>(gn->clone());
    newGn->setLimit(limitRows);
    auto newGnGroup = OptGroup::create(octx);
    auto newGnGroupNode = newGnGroup->makeGroupNode(newGn);

    // Re-wire the dependencies of the new group nodes.
    newLimitGroupNode->dependsOn(newProjGroup);
    newProjGroupNode->dependsOn(newGnGroup);
    for (auto dep : gnGroupNode->dependencies()) {
        newGnGroupNode->dependsOn(dep);
    }

    TransformResult result;
    result.eraseAll = true;  // drop the old group nodes that were matched
    result.newGroupNodes.emplace_back(newLimitGroupNode);
    return result;
}

std::string LimitPushDownRule::toString() const {
    return "LimitPushDownRule";
}
To define a rule, first inherit from the OptRule class. Then implement the pattern interface, which returns the pattern to be matched; the pattern is a composition of operators and their dependencies, such as Limit -> Project -> GetNeighbors. Next, implement the transform interface. transform receives the matched optimization plan; we pick apart the matched plan according to the predefined pattern, apply the corresponding transformation to it, for example folding the Limit operator into the GetNeighbors computation, and finally return the transformed optimization plan.
Once these two interfaces are implemented correctly, the new optimization rule will work.
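Putting it together, the skeleton of a new rule looks roughly like the following. The class name MyNewRule and the Filter → GetNeighbors pattern are made up for illustration and would be replaced by your own rule:

// Illustrative skeleton of a new optimization rule (names are hypothetical).
class MyNewRule final : public OptRule {
public:
    const Pattern &pattern() const override {
        // The operator shape this rule applies to, e.g. Filter -> GetNeighbors.
        static Pattern pattern =
            Pattern::create(graph::PlanNode::Kind::kFilter,
                            {Pattern::create(graph::PlanNode::Kind::kGetNeighbors)});
        return pattern;
    }

    StatusOr<TransformResult> transform(OptContext *octx,
                                        const MatchedResult &matched) const override {
        // Inspect `matched`, build new OptGroup/OptGroupNode objects, and return
        // them in a TransformResult (or return noTransform() to leave the plan alone).
        return TransformResult::noTransform();
    }

    std::string toString() const override {
        return "MyNewRule";
    }

private:
    MyNewRule() {
        // Register the rule so the optimizer will try it on every query plan.
        RuleSet::QueryRules().addRule(this);
    }

    static std::unique_ptr<OptRule> kInstance;
};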
That concludes this introduction to the Nebula Graph optimizer.
Want to exchange ideas about graph database technology? Join the Nebula exchange group: fill in your card under Nebula and the Nebula assistant will pull you into the group.
[Activity] Nebula Hackathon 2021 is underway. Explore the unknown together and compete for a ¥150,000 prize pool: https://nebula-graph.com.cn/hackathon/