This article was first published on the Nebula Graph Community WeChat official account.
In the previous article, we covered the Optimizer part of the Query Engine. In this article, we explain its two remaining parts: the Scheduler and the Executor.
Overview
In the execution phase, the execution engine uses the Scheduler to turn the physical execution plan generated by the Planner into a series of Executors and to drive their execution.
Each PlanNode in the physical execution plan corresponds to one Executor.
Source code location
The source code of the scheduler is in the src/scheduler directory:
src/scheduler
├── AsyncMsgNotifyBasedScheduler.cpp
├── AsyncMsgNotifyBasedScheduler.h
├── CMakeLists.txt
├── Scheduler.cpp
└── Scheduler.h
The abstract class Scheduler defines the common interface of schedulers; concrete schedulers are implemented by inheriting from it.
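The scheduler currently implemented is AsyncMsgNotifyBasedScheduler. It is based on asynchronous message notification and traverses the plan with breadth-first search, which avoids stack overflow: an iterative traversal driven by an explicit queue does not grow the call stack as the plan gets deeper, whereas a recursive one does.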
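To make the inheritance and the traversal concrete, here is a minimal C++17 sketch. PlanNode, Scheduler::schedule and BfsScheduler are hypothetical simplifications, not Nebula's actual classes: the derived scheduler walks the plan from the root with an explicit queue instead of recursion, so even a very deep plan cannot overflow the call stack.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified plan node: an id plus the nodes it depends on.
struct PlanNode {
  int id;
  std::vector<PlanNode*> deps;
};

// Abstract interface: concrete schedulers inherit from it.
class Scheduler {
 public:
  virtual ~Scheduler() = default;
  // Returns the ids of the nodes in the order the scheduler visits them.
  virtual std::vector<int> schedule(PlanNode* root) = 0;
};

// Iterative breadth-first traversal starting at the root of the plan.
// The explicit queue replaces recursion, so plan depth does not consume stack.
class BfsScheduler : public Scheduler {
 public:
  std::vector<int> schedule(PlanNode* root) override {
    std::vector<int> order;
    std::queue<PlanNode*> queue;
    std::unordered_set<PlanNode*> visited;
    queue.push(root);
    visited.insert(root);
    while (!queue.empty()) {
      PlanNode* node = queue.front();
      queue.pop();
      order.push_back(node->id);
      for (PlanNode* dep : node->deps) {
        // A node shared by several parents is enqueued only once.
        if (visited.insert(dep).second) {
          queue.push(dep);
        }
      }
    }
    return order;
  }
};
```

For a plan Project → Scan → Start, `BfsScheduler().schedule(&project)` visits the nodes as 2, 1, 0, starting from the root just like the traversal described above.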
The source code of the executor is in the src/executor directory:
src/executor
├── admin
├── algo
├── CMakeLists.txt
├── ExecutionError.h
├── Executor.cpp
├── Executor.h
├── logic
├── maintain
├── mutate
├── query
├── StorageAccessExecutor.cpp
├── StorageAccessExecutor.h
└── test
Implementation process
First, starting from the root node of the execution plan, the scheduler traverses the entire plan with breadth-first search and, based on the execution dependencies between nodes, builds a message notification mechanism between them.
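During execution, a node is scheduled only after it has received messages saying that all the nodes it depends on have finished. When the node itself finishes, it sends a message to the nodes that depend on it, and this continues until the entire plan has been executed. The core of this mechanism is AsyncMsgNotifyBasedScheduler::runExecutor: it waits for the futures of all dependencies with folly::collect, runs the Executor, and then fulfils the promises that the dependent nodes are waiting on, through notifyOK on success or notifyError on failure.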
```cpp
void AsyncMsgNotifyBasedScheduler::runExecutor(
    std::vector<folly::Future<Status>>&& futures,
    Executor* exe,
    folly::Executor* runner,
    std::vector<folly::Promise<Status>>&& promises) const {
  // Wait until every dependency has sent its "done" message.
  folly::collect(futures).via(runner).thenTry(
      [exe, pros = std::move(promises), this](auto&& t) mutable {
        if (t.hasException()) {
          return notifyError(pros, Status::Error(t.exception().what()));
        }
        // A failed dependency is forwarded to the dependents without executing.
        auto status = std::move(t).value();
        auto depStatus = checkStatus(std::move(status));
        if (!depStatus.ok()) {
          return notifyError(pros, depStatus);
        }
        // Execute in current thread.
        std::move(execute(exe)).thenTry(
            [pros = std::move(pros), this](auto&& exeTry) mutable {
              if (exeTry.hasException()) {
                return notifyError(pros, Status::Error(exeTry.exception().what()));
              }
              auto exeStatus = std::move(exeTry).value();
              if (!exeStatus.ok()) {
                return notifyError(pros, exeStatus);
              }
              // Tell every dependent node that this node has finished.
              return notifyOK(pros);
            });
      });
}
```
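runExecutor depends on folly. As a rough analogue using only the C++ standard library, the sketch below gives each node one std::promise/std::shared_future pair; this is a simplification, since runExecutor fulfils a separate promise for each dependent. Status, Node and runPlan are hypothetical names, and for simplicity every node runs on its own std::thread instead of on an executor pool.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for Status: an empty message means OK.
struct Status {
  std::string error;
  bool ok() const { return error.empty(); }
};

// Hypothetical plan node: the work to run and the indices of its dependencies.
struct Node {
  std::function<Status()> work;
  std::vector<int> deps;
};

// Each node waits for the futures of all its dependencies, runs only if they
// all succeeded, then fulfils its own promise so its dependents can proceed.
std::vector<Status> runPlan(const std::vector<Node>& nodes) {
  std::vector<std::promise<Status>> promises(nodes.size());
  std::vector<std::shared_future<Status>> futures;
  for (auto& p : promises) futures.push_back(p.get_future().share());

  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    // Each thread gets its own copies of the shared_futures it waits on.
    std::vector<std::shared_future<Status>> depFutures;
    for (int d : nodes[i].deps) depFutures.push_back(futures[d]);
    threads.emplace_back([&nodes, &promises, i, depFutures] {
      for (const auto& f : depFutures) {
        Status depStatus = f.get();  // blocks until that dependency is done
        if (!depStatus.ok()) {
          promises[i].set_value(depStatus);  // forward the error, skip execution
          return;
        }
      }
      Status result;
      try {
        result = nodes[i].work();
      } catch (const std::exception& e) {
        result = Status{std::string("exception: ") + e.what()};
      }
      promises[i].set_value(result);  // notify every dependent node
    });
  }
  for (auto& t : threads) t.join();

  std::vector<Status> results;
  for (auto& f : futures) results.push_back(f.get());
  return results;
}
```

A node whose dependency failed forwards that error without running, which is the role notifyError plays above. The sketch assumes the plan is acyclic; with a cycle, the threads would wait on each other forever.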
Each Executor goes through four stages: create, open, execute, and close.
create
The Executor that matches the type of the PlanNode is created.
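The create stage amounts to a factory keyed on the node type. A minimal sketch, assuming a hypothetical Kind enum with only three node types and a toy Executor base class that records the stages it goes through; none of these names are Nebula's actual API.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node types; a real engine has many more.
enum class Kind { kStart, kProject, kFilter };

struct PlanNode {
  Kind kind;
};

// Toy Executor base: records the stages it passes through after create.
class Executor {
 public:
  explicit Executor(const PlanNode* node) : node_(node) {}
  virtual ~Executor() = default;
  virtual std::string name() const = 0;

  void open() { stages_.push_back("open"); }
  void execute() { stages_.push_back("execute:" + name()); }
  void close() { stages_.push_back("close"); }
  const std::vector<std::string>& stages() const { return stages_; }
  const PlanNode* node() const { return node_; }

 private:
  const PlanNode* node_;
  std::vector<std::string> stages_;
};

struct StartExecutor : Executor {
  using Executor::Executor;
  std::string name() const override { return "Start"; }
};
struct ProjectExecutor : Executor {
  using Executor::Executor;
  std::string name() const override { return "Project"; }
};
struct FilterExecutor : Executor {
  using Executor::Executor;
  std::string name() const override { return "Filter"; }
};

// create: pick the Executor subclass that matches the node type.
std::unique_ptr<Executor> create(const PlanNode* node) {
  switch (node->kind) {
    case Kind::kStart:
      return std::make_unique<StartExecutor>(node);
    case Kind::kProject:
      return std::make_unique<ProjectExecutor>(node);
    case Kind::kFilter:
      return std::make_unique<FilterExecutor>(node);
  }
  throw std::invalid_argument("unknown node kind");
}

// Drive one Executor through the remaining three stages in order.
void run(Executor& exe) {
  exe.open();
  exe.execute();
  exe.close();
}
```

Because each PlanNode gets its own Executor object, the scheduler can drive every node independently through open, execute, and close.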
open
Before the Executor actually runs, some initialization is done, and two checks are made: whether the query has been killed (for example, a slow query terminated by the user), and whether memory usage has reached the high watermark.
Nebula supports manually killing a running query statement, so the status of the current execution plan must be checked before each Executor runs. If the plan is marked as killed, execution is terminated.
Before each Query-type Executor runs, the engine also checks whether the memory used by the system has reached the high watermark, which is set by the system_memory_high_watermark_ratio flag as a fraction of total system memory. If it has, execution is terminated, which helps avoid OOM to some extent.
```cpp
Status Executor::open() {
  // Stop early if the query has been killed.
  if (qctx_->isKilled()) {
    VLOG(1) << "Execution is being killed. session: " << qctx()->rctx()->session()->id()
            << " ep: " << qctx()->plan()->id()
            << " query: " << qctx()->rctx()->query();
    return Status::Error("Execution had been killed");
  }
  // Only Query-type nodes are subject to the memory high-watermark check.
  auto status = MemInfo::make();
  NG_RETURN_IF_ERROR(status);
  auto mem = std::move(status).value();
  if (node_->isQueryNode() && mem->hitsHighWatermark(FLAGS_system_memory_high_watermark_ratio)) {
    return Status::Error(
        "Used memory(%ldKB) hits the high watermark(%lf) of total system memory(%ldKB).",
        mem->usedInKB(),
        FLAGS_system_memory_high_watermark_ratio,
        mem->totalInKB());
  }
  // Reset the statistics collected for this execution.
  numRows_ = 0;
  execTime_ = 0;
  totalDuration_.reset();
  return Status::OK();
}
```
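The two checks in open() can be isolated into one small function. In this hedged sketch, QueryState, MemSnapshot and checkBeforeOpen are hypothetical stand-ins for qctx_->isKilled() and MemInfo. The kill flag is a std::atomic<bool> because it is set by a different thread from the one running the Executor, and the watermark is assumed to be hit once used memory exceeds ratio × total (the exact comparison in Nebula may differ).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical snapshot of system memory, in KB.
struct MemSnapshot {
  std::uint64_t usedKB;
  std::uint64_t totalKB;
  // Assumption: "hit" means used memory exceeds ratio * total.
  bool hitsHighWatermark(double ratio) const {
    return static_cast<double>(usedKB) > ratio * static_cast<double>(totalKB);
  }
};

// Hypothetical per-query state; another thread sets `killed` on a kill request.
struct QueryState {
  std::atomic<bool> killed{false};
};

// Returns an empty string when the Executor may run, otherwise the error.
std::string checkBeforeOpen(const QueryState& q, bool isQueryNode,
                            const MemSnapshot& mem, double ratio) {
  if (q.killed.load(std::memory_order_acquire)) {
    return "Execution had been killed";
  }
  if (isQueryNode && mem.hitsHighWatermark(ratio)) {
    return "Used memory hits the high watermark";
  }
  return "";
}
```

The kill check comes first, so a killed query stops even when memory is fine, and non-Query nodes skip the memory check, mirroring the order of the checks in open().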
execute
Both the input and the output of a Query-type Executor are tables (DataSet).
Execution follows the iterator model: the Executor walks through the input table with its iterator, calling next() to advance row by row and computing on each row until the whole input table has been traversed.
The results form a new table, which is passed to the subsequent Executor as its input. For example, ProjectExecutor evaluates every column expression of the Project node on each input row:
```cpp
folly::Future<Status> ProjectExecutor::execute() {
  SCOPED_TIMER(&execTime_);
  auto* project = asNode<Project>(node());
  auto columns = project->columns()->columns();
  auto iter = ectx_->getResult(project->inputVar()).iter();
  DCHECK(!!iter);
  QueryExpressionContext ctx(ectx_);

  VLOG(1) << "input: " << project->inputVar();
  DataSet ds;
  ds.colNames = project->colNames();
  ds.rows.reserve(iter->size());
  for (; iter->valid(); iter->next()) {
    Row row;
    for (auto& col : columns) {
      Value val = col->expr()->eval(ctx(iter.get()));
      row.values.emplace_back(std::move(val));
    }
    ds.rows.emplace_back(std::move(row));
  }
  VLOG(1) << node()->outputVar() << ":" << ds;
  return finish(ResultBuilder().value(Value(std::move(ds))).finish());
}
```
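The iterator model can be reproduced without any Nebula types. The sketch below is a hypothetical simplification: DataSet holds rows of ints, Iterator exposes only valid(), next(), row() and size(), and each Column pairs an output name with a std::function that plays the role of col->expr()->eval(...).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified table: column names plus rows of ints.
using Row = std::vector<int>;
struct DataSet {
  std::vector<std::string> colNames;
  std::vector<Row> rows;
};

// Minimal iterator over a DataSet, mirroring the valid()/next() loop above.
class Iterator {
 public:
  explicit Iterator(const DataSet& ds) : ds_(ds) {}
  bool valid() const { return pos_ < ds_.rows.size(); }
  void next() { ++pos_; }
  const Row& row() const { return ds_.rows[pos_]; }
  std::size_t size() const { return ds_.rows.size(); }

 private:
  const DataSet& ds_;
  std::size_t pos_ = 0;
};

// A column is an output name plus an expression evaluated on one input row.
struct Column {
  std::string name;
  std::function<int(const Row&)> expr;
};

// Project: evaluate every column expression on every input row and collect
// the results into a new table for the next operator.
DataSet project(const DataSet& input, const std::vector<Column>& columns) {
  DataSet out;
  for (const auto& col : columns) out.colNames.push_back(col.name);
  Iterator iter(input);
  out.rows.reserve(iter.size());
  for (; iter.valid(); iter.next()) {
    Row row;
    for (const auto& col : columns) row.push_back(col.expr(iter.row()));
    out.rows.push_back(std::move(row));
  }
  return out;
}
```

Because every call returns a fresh table, the output of project() can be fed straight into the next operator, which is how Executors are chained.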
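If an input table of the current Executor is not used as input by any other Executor, it is dropped during the execution phase to free its memory. Each input variable keeps a user count (userCount); every Executor that consumes the variable decrements it, and the Executor that brings it to zero drops the result: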
```cpp
void Executor::drop() {
  for (const auto &inputVar : node()->inputVars()) {
    if (inputVar != nullptr) {
      // Make sure all uses of the variable happen-before the count is decremented
      if (inputVar->userCount.fetch_sub(1, std::memory_order_release) == 1) {
        // Make sure the drop happens-after the count decrement
        CHECK_EQ(inputVar->userCount.load(std::memory_order_acquire), 0);
        ectx_->dropResult(inputVar->name);
        VLOG(1) << "Drop variable " << inputVar->name;
      }
    }
  }
}
```
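The same counting scheme in isolation, as a sketch: Variable and ExecutionContext are hypothetical, and a single fetch_sub with std::memory_order_acq_rel stands in for the release decrement plus acquire load used above; both make every consumer's use of the result happen-before the drop. The map in ExecutionContext is not synchronized, so this sketch is only safe when drop() calls on the same context do not run concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variable: a name plus the number of Executors that still need it.
struct Variable {
  std::string name;
  std::atomic<int> userCount{0};
};

// Hypothetical, unsynchronized store of intermediate results by name.
struct ExecutionContext {
  std::unordered_map<std::string, std::vector<int>> results;
  void dropResult(const std::string& name) { results.erase(name); }
};

// Called by each consuming Executor once it is done with its inputs.
// Only the last user (the one that moves the count from 1 to 0) drops the result.
void drop(const std::vector<Variable*>& inputVars, ExecutionContext& ectx) {
  for (Variable* var : inputVars) {
    if (var == nullptr) continue;
    if (var->userCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      ectx.dropResult(var->name);
    }
  }
}
```

With two consumers, the first drop() call only decrements the count and the second one frees the table.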
close
After an Executor finishes, the execution information it has collected, such as its execution time and the number of rows in its output table, is added to the profiling stats.
Users can view these statistics in the execution plan that is displayed after running a statement with PROFILE, for example:
Execution Plan (optimize time 141 us)
-----+------------------+--------------+-----------------------------------------------------+--------------------------------------
| id | name | dependencies | profiling data | operator info |
-----+------------------+--------------+-----------------------------------------------------+--------------------------------------
| 2 | Project | 3 | ver: 0, rows: 56, execTime: 147us, totalTime: 160us | outputVar: [ |
| | | | | { |
| | | | | "colNames": [ |
| | | | | "VertexID", |
| | | | | "player.age" |
| | | | | ], |
| | | | | "name": "__Project_2", |
| | | | | "type": "DATASET" |
| | | | | } |
| | | | | ] |
| | | | | inputVar: __TagIndexFullScan_1 |
| | | | | columns: [ |
| | | | | "$-.VertexID AS VertexID", |
| | | | | "player.age" |
| | | | | ] |
-----+------------------+--------------+-----------------------------------------------------+--------------------------------------
| 3 | TagIndexFullScan | 0 | ver: 0, rows: 56, execTime: 0us, totalTime: 6863us | outputVar: [ |
| | | | | { |
| | | | | "colNames": [ |
| | | | | "VertexID", |
| | | | | "player.age" |
| | | | | ], |
| | | | | "name": "__TagIndexFullScan_1", |
| | | | | "type": "DATASET" |
| | | | | } |
| | | | | ] |
| | | | | inputVar: |
| | | | | space: 318 |
| | | | | dedup: false |
| | | | | limit: 9223372036854775807 |
| | | | | filter: |
| | | | | orderBy: [] |
| | | | | schemaId: 319 |
| | | | | isEdge: false |
| | | | | returnCols: [ |
| | | | | "_vid", |
| | | | | "age" |
| | | | | ] |
| | | | | indexCtx: [ |
| | | | | { |
| | | | | "columnHints": [], |
| | | | | "index_id": 325, |
| | | | | "filter": "" |
| | | | | } |
| | | | | ] |
-----+------------------+--------------+-----------------------------------------------------+--------------------------------------
| 0 | Start | | ver: 0, rows: 0, execTime: 1us, totalTime: 19us | outputVar: [ |
| | | | | { |
| | | | | "colNames": [], |
| | | | | "type": "DATASET", |
| | | | | "name": "__Start_0" |
| | | | | } |
| | | | | ] |
-----+------------------+--------------+-----------------------------------------------------+--------------------------------------
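The rows and execTime fields in the profiling data above are the kind of numbers an Executor accumulates while it runs. As a hedged sketch of how execTime can be collected, the RAII timer below follows the idea behind SCOPED_TIMER(&execTime_) in ProjectExecutor::execute: it adds the elapsed time of the enclosing scope to a counter. ProfilingStats, ScopedTimer and executeAndProfile are hypothetical names, and totalTime is not modeled.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical per-Executor profiling record.
struct ProfilingStats {
  std::int64_t rows = 0;
  std::int64_t execTimeUs = 0;
};

// Adds the elapsed time of the enclosing scope, in microseconds, to *sinkUs
// when the timer is destroyed.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::int64_t* sinkUs)
      : sink_(sinkUs), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    *sink_ += std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  std::int64_t* sink_;
  std::chrono::steady_clock::time_point start_;
};

// A toy execute() that emits n rows; the sleep is a stand-in for real work.
void executeAndProfile(int n, ProfilingStats& stats) {
  ScopedTimer timer(&stats.execTimeUs);
  for (int i = 0; i < n; ++i) ++stats.rows;
  std::this_thread::sleep_for(std::chrono::milliseconds(2));
}
```

steady_clock is used because it never jumps backwards, so the measured duration cannot be negative even if the wall clock is adjusted.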
This concludes our source code walkthrough of the Query Engine modules. Some specific features will be explained in later articles.
Want to discuss graph database technology? Fill in your Nebula card, and the Nebula assistant will add you to the Nebula discussion group.