Author: Sun Jian
Aikesheng R&D engineer, responsible for high-availability building and SQL audit related development.
Source of this article: original submission
* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.
This article mainly introduces how to use TiDB SQL to parse custom generated SQL fingerprints, using a different from pt-fingerprint
( https://www.percona.com/doc/percona-toolkit/3.0/pt-fingerprint.html) the way.
What is SQL fingerprint
SQL fingerprint refers to replacing the literal value in a SQL with other fixed symbols. It can be used for SQL desensitization or SQL classification.
E.g:
select * from t1 where id = 100;
Into:
select * from t1 where id = ?;
Implementation of pt-fingerprint
Judging from pt-fingerprint
, it mainly uses regular matching of SQL strings to replace corresponding characters. The code has more than 2,000 lines, and the complete string parsing will make the code extremely complex and difficult to read. The advantage is that there is no need to care about SQL semantics.
Implementation based on TiDB SQL parser
The function of TiDB SQL parser is to parse SQL statements according to SQL grammar rules and convert the text into an abstract syntax tree. In addition, TiDB SQL parser supports converting the syntax tree into SQL text, so you can modify the SQL text by modifying the syntax tree structure. .
1. Parse SQL into syntax tree through TiDB SQL parser
The parsed syntax tree is roughly as follows, where "..." means that there are multiple levels before.
&ast.SelectStmt {
Fields:
... &ast.WildCard
From:
... &ast.TableName
... "t1"
Where: &ast.BinaryOperationExpr
L: &ast.ColumnNameExpr
... "id"
R:&ast.ValueExpr
... 100
}
2. Modify the value corresponding to the node on the syntax tree
The TiDB syntax parser code implements a set of visitor design patterns, which can traverse the syntax tree Visitor
According to the syntax tree structure in 1, we only need to ast.ValueExpr
his specific value with ?
Visitor
interface:
// Visitor visits a Node.
type Visitor interface {
Enter(n Node) (node Node, skipChildren bool)
Leave(n Node) (node Node, ok bool)
}
Realize the Visitor
interface:
//此处省略N行代码
// 定义一个 FingerprintVisitor 使其实现 Visitor 接口
type FingerprintVisitor struct{}
func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
// 当访问到ValueExpr 时,只需要将ValueExpr的值替换掉就行
if v, ok := n.(*driver.ValueExpr); ok {
v.Type.Charset = ""
v.SetValue([]byte("?"))
}
return n, false
}
func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
return n, true
}
3. Restore the syntax tree to SQL
TiDB SQL parser has provided interface Restore(ctx *RestoreCtx) error
since v3, which supports converting syntax tree into SQL text
Complete code
package main
import (
"bytes"
"fmt"
"github.com/pingcap/parser"
"github.com/pingcap/parser/ast"
"github.com/pingcap/parser/format"
driver "github.com/pingcap/tidb/types/parser_driver"
)
// 定义一个 FingerprintVisitor 使其实现 Visitor 接口
type FingerprintVisitor struct{}
func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
// 当访问到ValueExpr 时,只需要将ValueExpr的值替换掉就行
if v, ok := n.(*driver.ValueExpr); ok {
v.Type.Charset = ""
v.SetValue([]byte("?"))
}
return n, false
}
func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
return n, true
}
func main() {
sql := "select * from t1 where id = 100;"
p := parser.New()
stmt, err := p.ParseOneStmt(sql, "", "")
if err != nil {
// 省略错误处理
return
}
stmt.Accept(&FingerprintVisitor{})
buf := new(bytes.Buffer)
restoreCtx := format.NewRestoreCtx(format.RestoreKeyWordUppercase|format.RestoreNameBackQuotes, buf)
err = stmt.Restore(restoreCtx)
if nil != err {
// 省略错误处理
return
}
fmt.Println(buf.String())
// SELECT * FROM `t1` WHERE `id`=?
}
to sum up
- Using TiDB SQL parser can quickly and accurately implement SQL fingerprints, which reduces the complexity of reading compared to string parsing;
- In addition, you need to spend time to understand the structure of the TiDB syntax tree.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。