vivo Internet Server Team - Shuai Guangying
Exploring the Presto SQL Engine Series: the first article, "Exploring the Presto SQL Engine (1) - Using Antlr Skillfully", introduced the basic usage of Antlr and how to use Antlr4 to parse SQL and query CSV data. The second article, "Exploring the Presto SQL Engine (2) - Analysis of Join", combined the principles of Join with the ideas used in Presto.
This article is the third in the series. It introduces how where conditions are parsed with Antlr and compares the performance of two implementation approaches: direct interpretation and code generation. In our experiments, the implementation based on code generation performs about 3 times better than direct interpretation.
1. Background
In business development, filtering data (the where keyword) and associating data (the join keyword) with SQL are the most common and basic parts of writing SQL statements to meet business requirements.
Under the dual pressure of massive data volumes and response-time requirements, seemingly simple filtering and association run into many technical details during implementation. Researching and solving these problems has produced some very interesting data structures and optimization ideas, such as B-trees, LSM trees, columnar storage, and dynamic code generation.
For the Presto SQL engine, evaluating Boolean expressions is a very basic capability underlying the processing logic of where and join.
This article explores the implementation of the where keyword: the basic ideas behind how the where statement is handled internally and how it is optimized for performance. Take the where statement as an example: where filtering supports the three basic logical operators and, or and not; on top of these it supports custom precedence via parentheses, fields inside expressions, and function calls. It looks simple, but there is a lot going on underneath, and it is worth digging into.
2. Use Antlr to implement where conditional filtering
For the Presto query engine, its overall architecture is as follows:
Among them, Parser&Analyzer is where Antlr comes in: every SQL statement must go through the Parser&Analyzer step. The background and basic operations of Antlr were described in the article "Exploring the Application of Antlr in the Presto Engine" and will not be repeated here.
This article again follows the standard Antlr workflow to add support for where conditions to SQL statements.
For the where condition, first break it down into its simplest structure:
and and or are the basic structures for combining filter conditions;
the six comparison operators (greater than, less than, equal to, not equal to, greater than or equal to, less than or equal to) serve as the basic expressions.
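For example, a filter such as the following (written purely for illustration) is just three comparison expressions combined with and/or:
where a > 1 and (b <= 10 or c <> 100)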
The next step is to use the standard process of Antlr.
2.1 Defining grammar rules
Use Antlr to define the grammar rules as follows (the rules are trimmed down from the Presto SQL grammar; the complete definition can be found in Presto's SqlBase.g4 file):
querySpecification
: SELECT selectItem (',' selectItem)*
(FROM relation (',' relation)*)?
(WHERE where=booleanExpression)?
;
...
booleanExpression
: valueExpression predicate[$valueExpression.ctx]? #predicated
| NOT booleanExpression #logicalNot
| left=booleanExpression operator=AND right=booleanExpression #logicalBinary
| left=booleanExpression operator=OR right=booleanExpression #logicalBinary
;
predicate[ParserRuleContext value]
: comparisonOperator right=valueExpression #comparison
;
That is, the where condition is followed by a booleanExpression rule. The booleanExpression rule supports the basic valueExpression predicate plus and, or and not combinations. The purpose of this article is to explore the core ideas rather than build complete SQL filtering, so only the and and or conditions are handled, which simplifies things and keeps the focus on the core issues.
2.2 Generate parsing code
Referring to Antlr's official documentation, run the Antlr command against the g4 file to generate the parsing code.
antlr4 -package org.example.antlr -no-listener -visitor .\SqlBase.g4
2.3 Develop business code to process AST
2.3.1 Defining Syntax Tree Nodes
After understanding the composition of expressions, first define two basic SQL syntax tree nodes. The class diagram is as follows:
The two classes are structurally identical: a sub-expression on each side and an operator in the middle.
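Since the class diagram is not reproduced here, a minimal sketch of the two node classes (modeled on the shape of Presto's AST nodes, with source locations, equals/hashCode, accept() and other details omitted) looks roughly like this:
public class ComparisonExpression extends Expression
{
    public enum Operator { EQUAL, NOT_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, IS_DISTINCT_FROM }
    private final Operator operator;   // the comparison operator in the middle
    private final Expression left;     // e.g. an Identifier such as a
    private final Expression right;    // e.g. a LongLiteral such as 1
    public ComparisonExpression(Operator operator, Expression left, Expression right)
    {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }
    public Operator getOperator() { return operator; }
    public Expression getLeft() { return left; }
    public Expression getRight() { return right; }
}
LogicalBinaryExpression has exactly the same shape, except that its Operator enum contains only AND and OR.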
2.3.2 Building a Syntax Tree
In the AstBuilder implementation, parsing for the logicalBinary and comparison grammar rules is added. This work follows the same pattern as before and presents no difficulty.
@Override
public Node visitComparison(Select1Parser.ComparisonContext context)
{
return new ComparisonExpression(
getLocation(context.comparisonOperator()),
getComparisonOperator(((TerminalNode) context.comparisonOperator().getChild(0)).getSymbol()),
(Expression) visit(context.value),
(Expression) visit(context.right));
}
@Override
public Node visitLogicalBinary(Select1Parser.LogicalBinaryContext context)
{
return new LogicalBinaryExpression(
getLocation(context.operator),
getLogicalBinaryOperator(context.operator),
(Expression) visit(context.left),
(Expression) visit(context.right));
}
Through the above two steps, an SQL expression can be converted into a SQL syntax tree.
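For example (purely as an illustration, with constructor arguments simplified), the expression a>1 and b<2 used in the test below parses into a tree equivalent to:
LogicalBinaryExpression(AND,
    ComparisonExpression(GREATER_THAN, Identifier("a"), LongLiteral("1")),
    ComparisonExpression(LESS_THAN, Identifier("b"), LongLiteral("2")))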
2.3.3 Traversing the syntax tree
With the SQL syntax tree, the problem naturally emerges:
a) What is the use of this SQL syntax tree structure?
b) How to use this SQL syntax tree structure?
In fact, SQL syntax trees have common uses in daily development even outside the internals of a SQL engine, for example SQL statement formatting and SQL syntax checking.
How to use the SQL syntax tree can be explained by a simple example: SQL statement formatting.
In the article "Exploring the Application of Antlr in Presto Engine", in order to simplify the problem, the AST generated by antlr is directly disassembled to obtain the table name and field name in the SQL statement. The processing method is very simple and rude. In fact, there is a more elegant processing idea in presto: AstVisitor. That is, the visitor pattern in the design pattern.
The visitor pattern is defined as follows:
It encapsulates operations that act on the elements of a data structure, and allows new operations on these elements to be defined without changing the data structure itself.
Mapped onto the SQL syntax tree, this works as follows: each syntax tree node defines an accept method as the entry point for operations on the node (see the Node.accept() method), and an AstVisitor class standardizes how the node tree is visited; concrete implementations inherit from AstVisitor. Once this infrastructure is defined, the rest follows the same pattern.
The core framework code of the two classes is as follows:
public abstract class Node
{
/**
* Accessible for {@link AstVisitor}, use {@link AstVisitor#process(Node, Object)} instead.
*/
protected <R, C> R accept(AstVisitor<R, C> visitor, C context)
{
return visitor.visitNode(this, context);
}
}
public abstract class AstVisitor<R, C>
{
protected R visitStatement(Statement node, C context)
{
return visitNode(node, context);
}
protected R visitQuery(Query node, C context)
{
return visitStatement(node, context);
}
....
}
For example, for the most common SQL form such as select * from table where ..., the core structure of a query defined in the SqlBase.g4 file is as follows:
querySpecification
: SELECT setQuantifier? selectItem (',' selectItem)*
(FROM relation (',' relation)*)?
(WHERE where=booleanExpression)?
(GROUP BY groupBy)?
(HAVING having=booleanExpression)?
;
Taking SQL statement formatting as an example, Presto provides two implementation classes, SqlFormatter and ExpressionFormatter. The code that formats such a statement is as follows:
@Override
protected Void visitQuerySpecification(QuerySpecification node, Integer indent)
{
process(node.getSelect(), indent);
if (node.getFrom().isPresent()) {
append(indent, "FROM");
builder.append('\n');
append(indent, " ");
process(node.getFrom().get(), indent);
}
builder.append('\n');
if (node.getWhere().isPresent()) {
append(indent, "WHERE " + formatExpression(node.getWhere().get(), parameters))
.append('\n');
}
if (node.getGroupBy().isPresent()) {
append(indent, "GROUP BY " + (node.getGroupBy().get().isDistinct() ? " DISTINCT " : "") + formatGroupBy(node.getGroupBy().get().getGroupingElements())).append('\n');
}
if (node.getHaving().isPresent()) {
append(indent, "HAVING " + formatExpression(node.getHaving().get(), parameters))
.append('\n');
}
if (node.getOrderBy().isPresent()) {
process(node.getOrderBy().get(), indent);
}
if (node.getLimit().isPresent()) {
append(indent, "LIMIT " + node.getLimit().get())
.append('\n');
}
return null;
}
The code implementation logic is clear and readable.
Similarly, the core of implementing where condition evaluation lies in handling comparison expressions (visitComparisonExpression) and logical expressions (visitLogicalBinaryExpression). To keep the focus on the core flow, we only implement filtering on integer fields, such as a > 0 or b < 10.
Since and and or combinations form a tree structure, recursion is used: leaf nodes are evaluated first and their results are then aggregated upward layer by layer. The processing logic is shown in the following code:
/**
* Process a comparison expression
* @param node
* @param context
* @return
*/
@Override
protected Void visitComparisonExpression(ComparisonExpression node, Map<String,Long> context) {
Expression left = node.getLeft();
Expression right = node.getRight();
String leftKey = ((Identifier) left).getValue();
Long rightKey = ((LongLiteral) right).getValue();
Long leftVal = context.get(leftKey);
if(leftVal == null){
stack.push(false);
return null; // return early so the switch below does not unbox a null value
}
ComparisonExpression.Operator op = node.getOperator();
switch (op){
case EQUAL:
stack.push(leftVal.equals(rightKey));break;
case LESS_THAN:
stack.push( leftVal < rightKey);break;
case NOT_EQUAL:
stack.push( !leftVal.equals(rightKey));break;
case GREATER_THAN:
stack.push( leftVal>rightKey);break;
case LESS_THAN_OR_EQUAL:
stack.push( leftVal<=rightKey);break;
case GREATER_THAN_OR_EQUAL:
stack.push( leftVal>=rightKey);break;
case IS_DISTINCT_FROM:
default:
throw new UnsupportedOperationException("not supported");
}
return null;
}
The implementation here is very simple: the result computed at each leaf node (ComparisonExpression) is stored on a stack, and when recursion returns to a non-leaf node (LogicalBinaryExpression), the values at the top of the stack are popped and combined with and/or. One note: in a purely recursive implementation the value could simply be returned directly without a stack; the stack-based implementation is used here so that the structure matches the code-generation logic below, which makes the performance comparison fair.
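For completeness, the handling of non-leaf nodes in the direct interpretation can be sketched as follows (a minimal sketch consistent with the stack-based approach above; the visitor class and its stack field are this article's own code, not Presto's):
@Override
protected Void visitLogicalBinaryExpression(LogicalBinaryExpression node, Map<String,Long> context) {
    // visit both children first; each child pushes its boolean result onto the stack
    process(node.getLeft(), context);
    process(node.getRight(), context);
    boolean right = stack.pop();
    boolean left = stack.pop();
    switch (node.getOperator()){
        case AND:
            stack.push(left && right);break;
        case OR:
            stack.push(left || right);break;
        default:
            throw new UnsupportedOperationException("not supported");
    }
    return null;
}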
2.4 Verify expression execution
In order to verify the execution result of the above scheme, a simple filtering rule is defined, and a random number is generated to verify whether the logic of the expression can be judged.
// Parse the expression with the Antlr-based SqlParser to produce an Expression object
SqlParser sqlParser = new SqlParser();
Expression expression = sqlParser.createExpression("a>1 and b<2");
// The AstVisitor-based implementation
WhereExpFilter rowFilter = new WhereExpFilter(expression);
Random r = new Random();
for(int i=0;i<10;i++){
Map<String,Long> row = new HashMap<>();
row.put("a", (long) r.nextInt(10));
row.put("b", (long) r.nextInt(10));
System.out.println("exp: a>1 and b<2, param:"+row+", ret:"+rowFilter.filter(row));
}
// ====== The execution results are as follows
/**
exp: a>1 and b<2, param:{a=9, b=8}, ret:false
exp: a>1 and b<2, param:{a=7, b=3}, ret:false
exp: a>1 and b<2, param:{a=0, b=7}, ret:false
exp: a>1 and b<2, param:{a=6, b=0}, ret:true
exp: a>1 and b<2, param:{a=2, b=0}, ret:true
exp: a>1 and b<2, param:{a=9, b=0}, ret:true
exp: a>1 and b<2, param:{a=3, b=6}, ret:false
exp: a>1 and b<2, param:{a=8, b=7}, ret:false
exp: a>1 and b<2, param:{a=6, b=1}, ret:true
exp: a>1 and b<2, param:{a=4, b=6}, ret:false
*/
The above processing flow and the verification of its results show that where condition filtering can be implemented quite simply with Antlr, much like implementing basic arithmetic with Antlr. However, after reading the Presto source code and related documentation, I found that Presto actually implements conditional filtering and JOIN in another way. Why is that?
3. Problems with directly interpreting SQL conditions based on AstVisitor
Before getting to Presto's implementation approach, we need to cover two pieces of background knowledge: the CPU's pipelining and branch prediction, and the JVM's method inlining optimization.
3.1 CPU Pipelining and Branch Prediction
The execution of CPU instructions, as described in computer organization principles, is shown in the following figure:
In the early days the CPU executed instructions serially. To improve CPU throughput, RISC architectures introduced pipelining, a quasi-parallel technique in which the execution of multiple instructions overlaps. As the diagram shows, after adding a pipeline the number of instructions executed per unit time is doubled, i.e. performance is doubled.
Of course, this is an ideal situation, but in reality there are two types of problems:
1) The execution of the next instruction depends on the result of the execution of the previous instruction.
2) When encountering a branch, you must wait for the conditional calculation to complete before knowing whether the branch is executed.
For problem 1, out-of-order execution can improve performance by roughly 20%~30%. Problem 2 is handled with branch prediction.
There are two interesting cases of using branch prediction principles to improve performance.
Case 1:
There is a famous question on Stack Overflow: why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array. For the same traversal over a sorted versus an unsorted array, the execution time differs by roughly 2~3 times.
Running the case on the author's machine matches that description. Note that the measurement should use System.nanoTime(); System.currentTimeMillis() is not precise enough.
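For reference, the core of that case looks roughly like the following (a simplified sketch, not the original Stack Overflow code): summing only the elements that are >= 128 becomes several times faster once the array has been sorted, because the branch becomes predictable.
import java.util.Arrays;
import java.util.Random;
public class BranchPredictionDemo {
    public static void main(String[] args) {
        int[] data = new int[32768];
        Random random = new Random();
        for (int i = 0; i < data.length; i++) {
            data[i] = random.nextInt(256);
        }
        // Arrays.sort(data);  // uncommenting this line makes the loop below several times faster
        long start = System.nanoTime();
        long sum = 0;
        for (int k = 0; k < 100000; k++) {
            for (int i = 0; i < data.length; i++) {
                if (data[i] >= 128) {   // the branch the CPU tries to predict
                    sum += data[i];
                }
            }
        }
        System.out.println("sum=" + sum + ", cost=" + (System.nanoTime() - start) / 1_000_000 + "ms");
    }
}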
Case 2:
In the Dubbo source code, ChannelEventRunnable improves efficiency roughly 2 times by pulling the hottest branch out of the switch and handling it with an if check first.
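Paraphrased as a simplified sketch (the idea only, not Dubbo's actual code), the optimization checks the overwhelmingly common state with an if so that most calls never enter the switch at all:
enum ChannelState { CONNECTED, DISCONNECTED, SENT, RECEIVED, CAUGHT }
void dispatch(ChannelState state) {
    if (state == ChannelState.RECEIVED) {   // hot path: the vast majority of events are RECEIVED
        System.out.println("received");
        return;
    }
    switch (state) {                        // cold paths
        case CONNECTED:    System.out.println("connected");    break;
        case DISCONNECTED: System.out.println("disconnected"); break;
        case SENT:         System.out.println("sent");         break;
        case CAUGHT:       System.out.println("caught");       break;
        default: break;
    }
}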
To briefly summarize: branch logic in code affects performance, and certain optimizations (such as sorting the data or moving hot code to the front) can improve the branch prediction hit rate and thereby improve program execution efficiency.
3.2 JVM method inlining optimization
The JVM executes instructions with a stack-based model. Besides the cost of executing its own logic, a method call carries the extra overhead of maintaining the execution context: generating a stack frame, pushing argument values, popping the stack frame, and jumping the instruction address. JVM inlining optimization therefore has a large impact on performance.
Here is a small experiment: for the same piece of code, running normally versus with inlining disabled (-XX:CompileCommand=dontinline,test/TestInline.addOp), the performance gap is almost 6 times.
The code sample and data are as follows:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
public class TestInline {
public int addOp(int a,int b){
return a+b;
}
@Benchmark
public int testAdd(){
int sum=0;
for(int i=0;i<100000;i++){
sum=addOp(sum,i);
}
return sum;
}
public static void main(String[] args) throws RunnerException {
Options options = new OptionsBuilder()
.warmupIterations(2).measurementIterations(2)
.forks(1).build();
new Runner(options).run();
}
}
// The execution results are as follows:
/**
Benchmark Mode Cnt Score Error Units
TestInline.testAdd thrpt 2 18588.318 ops/s (normal execution)
TestInline.testAdd thrpt 2 3131.466 ops/s (inlining disabled)
**/
For the Java language, method inlining also has a cost. Therefore, usually only hot code, methods with small bodies, and methods declared private, static or final are likely to be inlined. Oversized method bodies, as well as object-oriented inheritance and polymorphism, can prevent method inlining and thus hurt performance.
For the where and join statements that are so common in a SQL execution engine, data types and operator types must be checked during execution, so the handling of almost every row of data exercises the CPU's branch prediction, and every operator needs its own processing logic. Directly interpreting the SQL statement therefore has a large impact on both branch prediction and method inlining. To improve performance by reducing the overhead of branch mispredictions and method calls, dynamic code generation solutions emerged.
4. Implementing where condition filtering based on dynamic code generation
Before showing how dynamic code generation implements where condition filtering, we need to introduce the background of bytecode generation technology, the particular advantages the Java language offers here, and the related basic operations.
4.1 The method of bytecode generation
The Java Virtual Machine specification has two key points: platform independence and language independence.
Platform independence achieves the goal of write once, run anywhere: programs are not tied to Windows or Linux as the operating system.
Language independence means that the languages running on the JVM are not limited to Java; Groovy, Scala and JRuby, for example, have become part of the JVM ecosystem. The foundation of both platform independence and language independence is the virtual machine's stack-based instruction execution and the bytecode storage format.
For any programming language, program analysis, program generation and program transformation techniques are widely used in development, usually in the following scenarios:
- Program analysis: Based on syntax and semantic analysis, identify potential bugs and useless code, or perform reverse engineering to study the internal principles of software (such as software cracking or developing crawlers)
- Program generation: such as traditional compilers, stub or skeleton compilers for distributed systems, or JIT compilers, etc.
- Program transformation: optimize or obfuscate code, insert debugging code, performance monitoring, etc.
For the Java programming language, since code exists at three levels (Java source code, bytecode, and machine code), program analysis, generation and transformation have two possible entry points: the Java source code or the compiled Class bytecode. Choosing the compiled Class bytecode has the following advantages:
- No source code required. This is also very convenient for closed-source commercial software to achieve cross-platform performance monitoring and other requirements.
- Runtime analysis, generation, transformation. As long as the class bytecode is processed before being loaded into the virtual machine, the entire processing flow is transparent to the user.
Program generation technology usually has another name in Java: bytecode generation technology. This also shows that the entry point chosen by the Java language is the compiled Class bytecode.
Bytecode generation technology is also widely used in the Java technology stack, for example Spring's AOP, the various ORM frameworks, and hot deployment in Tomcat. Java has many bytecode manipulation frameworks, typically ASM, Javassist, Byte Buddy, JNIF, etc.
ASM is usually the most widely used, for performance reasons. Using ASM directly requires an understanding of JVM instructions, so the learning curve is relatively steep. Facebook built a layer of encapsulation on top of ASM: the airlift.bytecode library. The dynamic code generation it provides is also an important tool behind Presto's performance, and it lets users avoid writing JVM instructions directly. However, the framework has little documentation, and its usage can usually only be learned from its test cases and the Presto source code. This section briefly summarizes the basic usage of airlift.bytecode for generating code.
Generally, once we understand variables, arrays, control logic, loops, and how to call external methods, we can work in a programming language; the role of the core library is simply to help us develop more efficiently. Likewise, for the airlift.bytecode framework, understanding the common operations of defining classes and defining methods (branches, loops and method calls) is enough to cover most business needs:
Case 1: Defining a class
private static final AtomicLong CLASS_ID = new AtomicLong();
private static final DateTimeFormatter TIMESTAMP_FORMAT = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"); // yyyy = calendar year (YYYY would be the week-based year)
private String clazzName;
private ClassDefinition classDefinition;
public ByteCodeGenDemo(String clazzName){
this.clazzName=clazzName;
}
public static ParameterizedType makeClassName(String baseName, Optional<String> suffix)
{
String className = baseName
+ "_" + suffix.orElseGet(() -> Instant.now().atZone(UTC).format(TIMESTAMP_FORMAT))
+ "_" + CLASS_ID.incrementAndGet();
return typeFromJavaClassName("org.shgy.demo.$gen." + toJavaIdentifierString(className));
}
public void buildClass(){
ClassDefinition classDefinition = new ClassDefinition(
a(PUBLIC, FINAL),
makeClassName(clazzName,Optional.empty()),
type(Object.class));
this.classDefinition=classDefinition;
}
The above code defines a public final class and ensures that class names generated at runtime will not collide.
Case 2: Defining a Method - If Control Logic
/**
* Generates the if-branch code:
* if(a<0){
* System.out.println(a +" a<0");
* }else{
* System.out.println(a +" a>=0");
* }
* @param methodName
*/
public void buildMethod1(String methodName){
Parameter argA = arg("a", int.class);
MethodDefinition method = classDefinition.declareMethod(
a(PUBLIC, STATIC),
methodName,
type(void.class),
ImmutableList.of(argA));
BytecodeExpression out = getStatic(System.class, "out");
IfStatement ifStatement = new IfStatement();
ifStatement.condition(lessThan(argA,constantInt(0)))
.ifTrue(new BytecodeBlock()
.append(out.invoke("print", void.class, argA))
.append(out.invoke("println", void.class, constantString(" a<0")))
)
.ifFalse(new BytecodeBlock()
.append(out.invoke("print", void.class, argA))
.append(out.invoke("println", void.class, constantString(" a>=0")))
);
method.getBody().append(ifStatement).ret();
}
Case 3: Defining a Method - Switch Control Logic
/**
* Generates the switch-branch code:
* switch (a){
* case 1:
* System.out.println("a=1");
* break;
* case 2:
* System.out.println("a=2");
* break;
* default:
* System.out.println("a=others");
* }
* @param methodName
*/
public void buildMethod2(String methodName){
Parameter argA = arg("a", int.class);
MethodDefinition method = classDefinition.declareMethod(
a(PUBLIC, STATIC),
methodName,
type(void.class),
ImmutableList.of(argA));
SwitchStatement.SwitchBuilder switchBuilder = new SwitchStatement.SwitchBuilder().expression(argA);
switchBuilder.addCase(1, BytecodeExpressions.print(BytecodeExpressions.constantString("a=1")));
switchBuilder.addCase(2,BytecodeExpressions.print(BytecodeExpressions.constantString("a=2")));
switchBuilder.defaultCase(invokeStatic(ByteCodeGenDemo.class,"defaultCase", void.class));
method.getBody().append(switchBuilder.build()).ret();
}
public static void defaultCase(){
System.out.println("a=others");
}
Case 4: Defining a Method - For Loop Logic
/**
* Generates the loop code:
* int sum=0;
* for(int i=s;i<=e;i++){
* sum+=i;
* System.out.println("i="+i+",sum="+sum);
* }
* @param methodName
*/
public void buildMethodLoop(String methodName){
Parameter argS = arg("s", int.class);
Parameter argE = arg("e", int.class);
MethodDefinition method = classDefinition.declareMethod(
a(PUBLIC, STATIC),
methodName,
type(int.class),
ImmutableList.of(argS, argE));
Scope scope = method.getScope();
Variable i = scope.declareVariable(int.class,"i");
Variable sum = scope.declareVariable(int.class,"sum");
BytecodeExpression out = getStatic(System.class, "out");
ForLoop loop = new ForLoop()
.initialize(i.set(argS))
.condition(lessThanOrEqual(i, argE))
.update(incrementVariable(i,(byte)1))
.body(new BytecodeBlock()
.append(sum.set(add(sum,i)))
.append(out.invoke("print", void.class, constantString("i=")))
.append(out.invoke("print", void.class, i))
.append(out.invoke("print", void.class, constantString(",sum=")))
.append(out.invoke("println", void.class,sum))
);
method.getBody().initializeVariable(i).putVariable(sum,0).append(loop).append(sum).retInt();
}
Case 5: Generating the Class and Executing a Method
public void executeLoop(String methodName) throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
// invoke
Class<?> clazz = classGenerator(new DynamicClassLoader(this.getClass().getClassLoader())).defineClass(this.classDefinition,Object.class);
Method loopMethod = clazz.getMethod(methodName, int.class,int.class);
loopMethod.invoke(null,1,10);
}
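Putting the pieces together, a typical call sequence for the demo class above might look like this (the class and method names are the ones defined in this article's examples):
public static void main(String[] args) throws Exception {
    ByteCodeGenDemo demo = new ByteCodeGenDemo("LoopDemo");
    demo.buildClass();                // Case 1: define the class skeleton
    demo.buildMethodLoop("sumLoop");  // Case 4: add the loop method to the class definition
    demo.executeLoop("sumLoop");      // Case 5: load the generated class and invoke sumLoop(1, 10)
}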
Case 6: Manipulating Data Structures - Getting a Value from a Map
public void buildMapGetter(String methodName){
Parameter argRow = arg("row", Map.class);
MethodDefinition method = classDefinition.declareMethod(
a(PUBLIC, STATIC),
methodName,
type(void.class),
of(argRow));
BytecodeExpression out = getStatic(System.class, "out");
Scope scope = method.getScope();
Variable a = scope.declareVariable(int.class,"a");
// Get the value corresponding to key "aa" from the map
method.getBody().append(out.invoke("print", void.class, argRow.invoke("get",Object.class,constantString("aa").cast(Object.class)))).ret();
}
// Execute the generated code
public void executeMapOp(String methodName) throws NoSuchMethodException, InvocationTargetException, IllegalAccessException {
// invoke
Class<?> clazz = classGenerator(new DynamicClassLoader(this.getClass().getClassLoader())).defineClass(this.classDefinition,Object.class);
Method loopMethod = clazz.getMethod(methodName, Map.class);
Map<String,Integer> map = Maps.newHashMap();
map.put("aa",111);
loopMethod.invoke(null,map);
}
Through the above cases, we have learned the basic usage of the airlift.bytecode framework. To study it more deeply you will need to read ASM-related material, since airlift.bytecode is built on top of ASM; for this exploration, however, this is enough.
4.2 Implement where condition filtering based on dynamic code generation
Now that we are familiar with the basic usage of the dynamic code generation framework, we can use it to implement the concrete business logic. As before, we generate the bytecode for where condition filtering based on an AstVisitor implementation.
The overall code structure is consistent with the previous implementation; the key to the problem lies in the bytecode generation logic. The where condition of a query is essentially a binary tree, and recursion is the easiest way to traverse a binary tree. To a certain extent, recursion maps directly onto operations on a stack.
For the implementation of where condition filtering code generation, the implementation logic is described as follows:
Input: the Expression generated by Antlr. Output: a Class generated with airlift.bytecode.
s1: define the basic configuration of the generated class: class name, modifiers and other information
s2: Define a stack to store the result of the comparison operation (ComparisonExpression)
s3: traverse expression recursively
s4: For the leaf node (ComparisonExpression), the code generation logic is as follows: take the corresponding value from the parameters defined by the method, generate the calculation code according to the comparison symbol, and push the calculation result to the stack
s5: For non-leaf nodes (LogicalBinaryExpression), the code generation logic is as follows: take out the two values at the top of the stack, perform an and or or operation, and push the calculation result to the stack
s6: When recursing back to the root node, take the value at the top of the stack as the final result of the calculation
s7: Generate Class based on class and method definitions
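Before looking at the per-node code generation, the scaffolding for s1, s2, s6 and s7 can be sketched roughly as follows. This is a sketch under assumptions rather than the exact implementation: RowFilter is assumed to be this article's own interface with a boolean filter(Map row) method, makeClassName is the helper from Case 1, and the compiler class is assumed to extend AstVisitor<Void, MethodDefinition> so that process() drives steps s3~s5.
public Class<?> compile(Expression expression) {
    // s1: class name, modifiers, superclass and the RowFilter interface
    ClassDefinition classDefinition = new ClassDefinition(
            a(PUBLIC, FINAL),
            makeClassName("RowFilterGen", Optional.empty()),
            type(Object.class),
            type(RowFilter.class));
    classDefinition.declareDefaultConstructor(a(PUBLIC));
    Parameter argRow = arg("row", Map.class);
    MethodDefinition method = classDefinition.declareMethod(
            a(PUBLIC), "filter", type(boolean.class), ImmutableList.of(argRow));
    // s2: a java.util.Stack variable holding intermediate results as boxed Booleans
    Variable stack = method.getScope().declareVariable(Stack.class, "stack");
    method.getBody().append(stack.set(newInstance(Stack.class)));
    // s3~s5: recursively visit the expression; each node appends its bytecode (see below)
    process(expression, method);
    // s6: the value left on top of the stack is the final result
    // (a boolean is returned with an int-style return at the bytecode level)
    method.getBody()
            .append(stack.invoke("pop", Object.class).cast(Boolean.class).invoke("booleanValue", boolean.class))
            .retInt();
    // s7: load the generated bytecode as a Class
    return classGenerator(new DynamicClassLoader(getClass().getClassLoader()))
            .defineClass(classDefinition, Object.class);
}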
The code that generates the bytecode for comparison nodes (step s4) is implemented as follows:
/**
* Generates the comparison condition code
**/
@Override
protected Void visitComparisonExpression(ComparisonExpression node, MethodDefinition context) {
ComparisonExpression.Operator op = node.getOperator();
Expression left = node.getLeft();
Expression right = node.getRight();
if(left instanceof Identifier && right instanceof LongLiteral){
String leftKey = ((Identifier) left).getValue();
Long rightKey = ((LongLiteral) right).getValue();
Parameter argRow = context.getParameters().get(0);
Variable stack = context.getScope().getVariable("stack");
BytecodeBlock body = context.getBody();
BytecodeExpression leftVal = argRow.invoke("get", Object.class,constantString(leftKey).cast(Object.class)).cast(long.class);
BytecodeExpression cResult;
switch (op){
case EQUAL:
cResult = equal(leftVal,constantLong(rightKey));
break;
case LESS_THAN:
cResult = lessThan(leftVal,constantLong(rightKey));
break;
case GREATER_THAN:
cResult =greaterThan(leftVal,constantLong(rightKey));
break;
case NOT_EQUAL:
cResult = notEqual(leftVal,constantLong(rightKey));
break;
case LESS_THAN_OR_EQUAL:
cResult = lessThanOrEqual(leftVal,constantLong(rightKey));
break;
case GREATER_THAN_OR_EQUAL:
cResult = greaterThanOrEqual(leftVal,constantLong(rightKey));
break;
default:
throw new UnsupportedOperationException("not implemented");
}
body.append(stack.invoke("push",Object.class, cResult.cast(Object.class)));
return null;
}else{
throw new UnsupportedOperationException("not implemented");
}
}
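The corresponding code generation for non-leaf nodes (step s5) can be sketched as follows. Again this is a sketch under assumptions: and/or and createTempVariable come from the airlift.bytecode API, and both operands are popped into temporary variables first so that the generated and/or (which may short-circuit) cannot leave a stale value on the runtime stack.
@Override
protected Void visitLogicalBinaryExpression(LogicalBinaryExpression node, MethodDefinition context) {
    // generate the children's code first; each child leaves its boxed result on the Stack object
    process(node.getLeft(), context);
    process(node.getRight(), context);
    Scope scope = context.getScope();
    Variable stack = scope.getVariable("stack");
    BytecodeBlock body = context.getBody();
    // pop both operands into local variables so that both pops always execute
    Variable right = scope.createTempVariable(boolean.class);
    Variable left = scope.createTempVariable(boolean.class);
    body.append(right.set(stack.invoke("pop", Object.class).cast(Boolean.class).invoke("booleanValue", boolean.class)));
    body.append(left.set(stack.invoke("pop", Object.class).cast(Boolean.class).invoke("booleanValue", boolean.class)));
    BytecodeExpression combined;
    switch (node.getOperator()){
        case AND:
            combined = and(left, right);
            break;
        case OR:
            combined = or(left, right);
            break;
        default:
            throw new UnsupportedOperationException("not implemented");
    }
    // push the boxed result back, mirroring what the comparison nodes do
    body.append(stack.invoke("push", Object.class, combined.cast(Object.class)));
    return null;
}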
After the code is implemented, to verify that the processing logic is correct, you can run the same test case against both implementations and check that the same where expression evaluated with the same parameters produces the same results.
To compare the performance of the two implementations, the JMH framework is introduced, and the performance test code is written with JMH:
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(value = Scope.Benchmark)
public class RowFilterBenchmark {
private RowFilter filter1;
private RowFilter filter2;
private List<Map<String,Long>> dataSet = Lists.newArrayListWithCapacity(100000);
@Setup
public void init(){
// Parse the expression with the Antlr-based SqlParser to produce an Expression object
SqlParser sqlParser = new SqlParser();
Expression expression = sqlParser.createExpression("a>5 and b<5");
// Implementation 1: direct interpretation based on AstVisitor
this.filter1 = new WhereExpFilter(expression);
// Implementation 2: code generation from the same Expression
ExpressionCodeCompiler compiler = new ExpressionCodeCompiler();
Class clazz = compiler.compile(expression);
try {
this.filter2 = (RowFilter) clazz.newInstance();
} catch (InstantiationException e) {
e.printStackTrace();
} catch (IllegalAccessException e) {
e.printStackTrace();
}
Random r = new Random();
for(int i=0;i<100000;i++){
Map<String,Long> row = new HashMap<>();
row.put("a", (long) r.nextInt(10));
row.put("b", (long) r.nextInt(10));
dataSet.add(row);
}
}
@Benchmark
public int testAstDirect() {
int cnt =0;
for(Map<String,Long> row:dataSet){
boolean ret = filter1.filter(row);
if(ret){
cnt++;
}
}
return cnt;
}
@Benchmark
public int testAstCompile() {
int cnt =0;
for(Map<String,Long> row:dataSet){
boolean ret = filter2.filter(row);
if(ret){
cnt++;
}
}
return cnt;
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(RowFilterBenchmark.class.getSimpleName())
.build();
new Runner(opt).run();
}
}
Using a dataset of the order of 100,000, the results of the performance verification are as follows:
Benchmark Mode Cnt Score Error Units
RowFilterBenchmark.testAstCompile thrpt 5 211.298 ± 30.832 ops/s
RowFilterBenchmark.testAstDirect thrpt 5 62.254 ± 8.269 ops/s
Based on the above data, a preliminary conclusion can be drawn: for simple comparison expressions, the code-generation approach performs about 3 times better than direct traversal. Compared with implementing where filtering directly on top of AstVisitor, the generated code does not have to re-examine the operators in the expression for every row; the code is generated once from the expression, which cuts out many judgment branches.
5. Summary
This article explores how where expressions are implemented in a SQL engine and builds two implementations based on Antlr:
one directly traverses the Expression produced by parsing the expression;
the other uses airlift.bytecode to dynamically generate bytecode from that Expression.
It also gives a preliminary analysis of the motivation and background for Presto's use of code generation to implement this logic, and uses JMH for performance testing. The results show that, for the same implementation idea, the code-generation approach performs about 3 times better than the direct implementation.
In practice, the way Presto uses code generation is much more complex than what is described in this article and differs from this article's implementation. The exploration here is about studying the basic ideas rather than re-creating Presto.
Although the performance gains from dynamic code generation are obvious, in business practice you need to weigh the ROI of using it: logic implemented through code generation is much harder to read and maintain than directly written code, and development is correspondingly more complex. Much like embedding assembly in C, using code generation in business development requires careful consideration. Used properly, it can achieve twice the result with half the effort; used improperly or abused, it plants unpredictable time bombs in the project.
References:
- "Introduction to Computer Science"
- "In-depth understanding of the Java virtual machine"
- "asm4-guide"