Editor's note: This article explains in detail how Milvus 2.0 performs attribute filtering, covering query expressions, query syntax tree generation, and query execution.
Outline:
- Grammar rules for query expressions
- Generation of query syntax
- Interpretation and execution of syntax trees
Grammar rules for query expressions
Query expressions supported by Milvus
As shown in the figure below, Milvus describes its query expressions in EBNF (Extended Backus-Naur Form). The production rules and syntax diagrams together capture the overall rules of the query expressions that Milvus supports.
A LogicalExpr can take one of four forms, including two logical expressions joined by a binary logical operator, a logical expression preceded by a unary logical operator, or a plain Single Expr.
Because EBNF is itself recursive, a LogicalExpr can be the combination of these forms as a whole or a single node within one of them, and each node can in turn contain further LogicalExpr nodes. In other words, the expression rules supported by Milvus can be nested recursively to arbitrary depth. When many attributes need to be filtered, the required filtering conditions can be expressed by combining and nesting expressions in different ways.
Underlying operators and specific expressions
The figure above shows several of the expressions just mentioned. First, a unary logical operator can be placed in front of an expression. Currently Milvus supports "not", which negates the result of the expression that follows it. Second, the binary logical operators are "and" and "or", each of which joins two expressions. Finally, Single Expr currently has two implementations: Term and Compare.
In addition, basic arithmetic operations such as addition, subtraction, multiplication, and division are supported. The figure below shows operator precedence, ranked from 1 (highest) down to 9 (lowest).
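To make the recursive structure concrete, here is a minimal Python sketch of such an expression tree. It is an illustration only, not Milvus code: the class names, the fields, and the reading of Term as a membership test and Compare as a relational comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


# All names below are hypothetical; they only mirror the shapes described in the text.
@dataclass
class Term:
    """Assumed form of a Term: field in [v1, v2, ...]."""
    field: str
    values: List[int]


@dataclass
class Compare:
    """Assumed form of a Compare: field <op> value."""
    field: str
    op: str
    value: int


@dataclass
class Not:
    """Unary logical operator applied to another logical expression."""
    child: "LogicalExpr"


@dataclass
class Binary:
    """Binary logical operator ("and" / "or") joining two logical expressions."""
    op: str
    left: "LogicalExpr"
    right: "LogicalExpr"


# A logical expression is any of the four shapes, so nesting is unbounded.
LogicalExpr = Union[Term, Compare, Not, Binary]


def depth(expr: LogicalExpr) -> int:
    """Return how many levels of nesting the expression contains."""
    if isinstance(expr, Not):
        return 1 + depth(expr.child)
    if isinstance(expr, Binary):
        return 1 + max(depth(expr.left), depth(expr.right))
    return 1  # Term and Compare are leaves


# not (age > 30) and (city in [1, 2] or score < 5)
expr = Binary(
    "and",
    Not(Compare("age", ">", 30)),
    Binary("or", Term("city", [1, 2]), Compare("score", "<", 5)),
)
print(depth(expr))
```

Because every operand of Not and Binary is itself a LogicalExpr, the same four shapes are enough to express a filter of any depth.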
Generation of query syntax
Introduction to the open source tool ANTLR
ANTLR can be understood as a parser generator: a tool for reading, processing, executing, or translating structured text or binary files. Specifically, ANTLR generates a parser from a set of defined grammar rules, and that parser builds a parse tree from the input. ANTLR also provides a walker API that helps traverse the parse tree. For example, for the expression "SP =100;" in the figure, the lexer that ships with ANTLR produces four tokens, which the parser then turns into a parse tree.
One of ANTLR's more important features is this walker mechanism for the generated parse tree. While the walker traverses the tree, each node can be checked for validity, for example whether it complies with the grammar rules or whether it contains sensitive words. The parse tree traversal API listed on the right of the figure shows that ANTLR walks from the root node down to the last child node in depth-first order, so there is no need to handle n-ary trees by hand. Pre-order, in-order, and post-order access are all available directly through the API.
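The lexing step can be illustrated with a small hand-written tokenizer. This is a sketch built on Python's re module, not ANTLR's generated lexer, and the token names (ID, ASSIGN, INT, SEMI) are assumptions chosen for the example.

```python
import re

# Hypothetical token names; ANTLR would derive its own from the grammar file.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),  # identifiers such as SP
    ("INT", r"\d+"),          # integer literals such as 100
    ("ASSIGN", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),         # whitespace is matched but not emitted
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split the input into (token_type, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("SP =100;")
print(tokens)
# [('ID', 'SP'), ('ASSIGN', '='), ('INT', '100'), ('SEMI', ';')]
```

The four pairs correspond to the four tokens mentioned above; a parser would consume this flat list and arrange it into a tree.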
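A depth-first walk with enter and exit callbacks can be sketched in a few lines. The Node and TraceListener classes here are hypothetical stand-ins, loosely modeled on the listener style of tree walking, and are not ANTLR's actual classes.

```python
class Node:
    """Hypothetical parse-tree node with any number of children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


class TraceListener:
    """Records the order in which the walker enters and leaves nodes."""

    def __init__(self):
        self.events = []

    def enter(self, node):
        self.events.append(f"enter {node.label}")

    def exit(self, node):
        self.events.append(f"exit {node.label}")


def walk(node, listener):
    """Depth-first traversal: enter the node, walk its children, then exit."""
    listener.enter(node)
    for child in node.children:
        walk(child, listener)
    listener.exit(node)


tree = Node("and", [Node("not", [Node("a > 1")]), Node("b in [1, 2]")])
listener = TraceListener()
walk(tree, listener)
print("\n".join(listener.events))
```

Work done in enter corresponds to pre-order processing and work done in exit to post-order processing, so validity checks can be hooked in at either point without writing the traversal again.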
PlanAST generation
Milvus works in a similar way to ANTLR. ANTLR itself, however, is relatively low-level: fairly complex grammar rules have to be defined from scratch for each use case. Milvus instead adopts the general-purpose grammar rules commonly used for expressions, and relies on ant-expr, an open source tool on GitHub, to parse query expressions against that grammar.
First, the user passes in an expression. The Parse method of ant-parser (which is part of ant-expr) then produces a fairly raw, unresolved plan tree. This is a simple binary tree, the result of the analysis and parsing steps described above, and it is represented with ant-expr's internal structures. Next, an optimizer implemented by Milvus itself is applied to this plan tree. Much like the walker mechanism described earlier, it traverses the tree and applies optimizations to each node. Because the tree produced by ant-expr is already well optimized and convenient for later execution and analysis, the optimizer's work here is relatively simple.
Finally, the analyzer (the dotted-line node in the figure) recursively traverses the optimized plan tree. During this traversal of the binary tree, each node is mapped to the corresponding structure of the syntax tree defined in protobuf, and the result is a plan AST (abstract syntax tree) in protobuf form.
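The parse-then-analyze flow can be imitated with Python's standard ast module standing in for the parser. This is only a sketch under stated assumptions: Python's own expression grammar replaces the real one, plain dicts stand in for protobuf messages, and the keys (binary_expr, unary_expr, term_expr, compare_expr) are hypothetical names, not the actual Milvus schema.

```python
import ast

# Map Python comparison nodes to the operator text stored in the plan.
COMPARE_SYMBOLS = {
    ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<",
    ast.LtE: "<=", ast.Eq: "==", ast.NotEq: "!=",
}


def to_plan(node):
    """Recursively convert a parsed expression into a dict-shaped plan tree."""
    if isinstance(node, ast.Expression):
        return to_plan(node.body)
    if isinstance(node, ast.BoolOp):
        op = "and" if isinstance(node.op, ast.And) else "or"
        # Python groups "a and b and c" into one node; fold it into a binary tree.
        plan = to_plan(node.values[0])
        for operand in node.values[1:]:
            plan = {"binary_expr": {"op": op, "left": plan, "right": to_plan(operand)}}
        return plan
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return {"unary_expr": {"op": "not", "child": to_plan(node.operand)}}
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and isinstance(node.left, ast.Name)):
        value = ast.literal_eval(node.comparators[0])
        op_type = type(node.ops[0])
        if op_type is ast.In:
            return {"term_expr": {"field": node.left.id, "values": list(value)}}
        if op_type in COMPARE_SYMBOLS:
            return {"compare_expr": {"field": node.left.id,
                                     "op": COMPARE_SYMBOLS[op_type],
                                     "value": value}}
    raise ValueError(f"unsupported expression node: {type(node).__name__}")


plan = to_plan(ast.parse("not (a > 1) and b in [1, 2]", mode="eval"))
print(plan)
```

Each branch of to_plan plays the role the analyzer plays above: it inspects one node of the parsed tree and emits the matching node of the target structure, recursing into the children.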
Interpretation and execution of syntax trees
PlanAST & Expr definition
Milvus defines a proto structure to represent the plan AST (abstract syntax tree) mentioned above. The upper right corner of the figure shows the protobuf message definition: the query condition is carried as an Expr, and Expr has six possible variants, of which BinaryExpr and UnaryExpr recursively contain further LogicalExpr nodes.
The figure above is a UML diagram of the expression classes. It shows the C++ class hierarchy implemented from the proto structure, including the base class and the derived class for each Expr. Each class implements an accept method that takes a visitor as its parameter. This is the classic Visitor design pattern, and it is how the query syntax tree generated earlier is traversed and executed. The advantage of this pattern is that callers do not have to touch the original Expr classes: a visitor can work on, and even modify, specific classes and their members through its visit methods.
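The accept/visit double dispatch can be sketched as follows. The real classes are written in C++; this Python version uses simplified, hypothetical class and method names purely to show the shape of the pattern.

```python
class Expr:
    """Base class: every expression accepts a visitor."""

    def accept(self, visitor):
        raise NotImplementedError


class CompareExpr(Expr):
    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

    def accept(self, visitor):
        return visitor.visit_compare(self)


class UnaryExpr(Expr):
    def __init__(self, op, child):
        self.op, self.child = op, child

    def accept(self, visitor):
        return visitor.visit_unary(self)


class BinaryExpr(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def accept(self, visitor):
        return visitor.visit_binary(self)


class RenderVisitor:
    """One possible operation over the tree: rendering it back to text."""

    def visit_compare(self, expr):
        return f"{expr.field} {expr.op} {expr.value}"

    def visit_unary(self, expr):
        return f"{expr.op} ({expr.child.accept(self)})"

    def visit_binary(self, expr):
        return f"({expr.left.accept(self)}) {expr.op} ({expr.right.accept(self)})"


tree = BinaryExpr("and", UnaryExpr("not", CompareExpr("a", ">", 1)), CompareExpr("b", "<", 5))
rendered = tree.accept(RenderVisitor())
print(rendered)
# (not (a > 1)) and (b < 5)
```

Adding another operation, such as validation or execution, means writing another visitor class with the same three methods; the expression classes stay unchanged.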
PlanAST execution
The diagram above summarizes the workflow for executing the query syntax tree. First, the C++ side receives a PlanNode in proto form and converts it, through ProtoParse, into a PlanNode of the segcore type. A series of visitor classes is then passed in through the accept method to modify and execute the PlanNode's internal structure. Finally, each concrete ExecPlanNode is traversed recursively to produce the filtered result, Filtered_result, which takes the form of the Bitmap shown in the following figure.
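How a recursive traversal ends in a bitmap can be shown with a toy evaluator. This is a sketch under assumptions, not the segcore implementation: the plan is a nested dict with hypothetical keys, the data is a small in-memory set of columns, and the bitmap is a plain list of booleans with one entry per row.

```python
import operator

COMPARE_OPS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}


def execute(plan, columns):
    """Evaluate a dict-shaped plan over column data; return one bit per row."""
    kind, body = next(iter(plan.items()))
    if kind == "compare_expr":
        compare = COMPARE_OPS[body["op"]]
        return [compare(value, body["value"]) for value in columns[body["field"]]]
    if kind == "term_expr":
        allowed = set(body["values"])
        return [value in allowed for value in columns[body["field"]]]
    if kind == "unary_expr":  # only "not" is modeled here
        return [not bit for bit in execute(body["child"], columns)]
    if kind == "binary_expr":
        left = execute(body["left"], columns)
        right = execute(body["right"], columns)
        if body["op"] == "and":
            return [l and r for l, r in zip(left, right)]
        if body["op"] == "or":
            return [l or r for l, r in zip(left, right)]
    raise ValueError(f"unsupported plan node: {kind}")


# Hypothetical plan for: not (a > 1) and b in [1, 2]
plan = {
    "binary_expr": {
        "op": "and",
        "left": {"unary_expr": {"op": "not", "child": {"compare_expr": {"field": "a", "op": ">", "value": 1}}}},
        "right": {"term_expr": {"field": "b", "values": [1, 2]}},
    }
}
columns = {"a": [0, 2, 5, 1], "b": [1, 3, 2, 2]}
bitmap = execute(plan, columns)
print(bitmap)
# [True, False, False, True]
```

Leaf nodes produce a bitmap directly from the column data, and each logical node combines the bitmaps of its children, so the root returns a single bitmap marking the rows that pass the filter.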
For the full video walkthrough, see: https://www.bilibili.com/video/BV1h44y1v7S8/
If you have ideas for improvements or any suggestions while using Milvus, feel free to reach out to us on GitHub or through any of our official channels.