Author: Sun Jian

Aikesheng R&D engineer, responsible for high-availability building and SQL audit related development.

Source of this article: original submission

* Produced by the Aikesheng open source community, original content is not allowed to be used without authorization, please contact the editor and indicate the source for reprinting.


This article mainly introduces how to use TiDB SQL to parse custom generated SQL fingerprints, using a different from pt-fingerprint ( https://www.percona.com/doc/percona-toolkit/3.0/pt-fingerprint.html) the way.

What is SQL fingerprint

SQL fingerprint refers to replacing the literal value in a SQL with other fixed symbols. It can be used for SQL desensitization or SQL classification.
E.g:

select * from t1 where id = 100;

Into:

select * from t1 where id = ?;

Implementation of pt-fingerprint

Judging from pt-fingerprint , it mainly uses regular matching of SQL strings to replace corresponding characters. The code has more than 2,000 lines, and the complete string parsing will make the code extremely complex and difficult to read. The advantage is that there is no need to care about SQL semantics.

Implementation based on TiDB SQL parser

The function of TiDB SQL parser is to parse SQL statements according to SQL grammar rules and convert the text into an abstract syntax tree. In addition, TiDB SQL parser supports converting the syntax tree into SQL text, so you can modify the SQL text by modifying the syntax tree structure. .

1. Parse SQL into syntax tree through TiDB SQL parser

The parsed syntax tree is roughly as follows, where "..." means that there are multiple levels before.

&ast.SelectStmt {
    Fields:
        ... &ast.WildCard
    From: 
        ... &ast.TableName
            ... "t1"
    Where: &ast.BinaryOperationExpr
        L: &ast.ColumnNameExpr
            ... "id"
        R:&ast.ValueExpr
            ... 100
}                

2. Modify the value corresponding to the node on the syntax tree

The TiDB syntax parser code implements a set of visitor design patterns, which can traverse the syntax tree Visitor According to the syntax tree structure in 1, we only need to ast.ValueExpr his specific value with ?

Visitor interface:

// Visitor visits a Node.
type Visitor interface {
    Enter(n Node) (node Node, skipChildren bool)
    Leave(n Node) (node Node, ok bool)
}

Realize the Visitor interface:

//此处省略N行代码

// 定义一个 FingerprintVisitor 使其实现 Visitor 接口
type FingerprintVisitor struct{}

func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
    // 当访问到ValueExpr 时,只需要将ValueExpr的值替换掉就行
    if v, ok := n.(*driver.ValueExpr); ok {
        v.Type.Charset = ""
        v.SetValue([]byte("?"))
    }
    return n, false
}

func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
    return n, true
}

3. Restore the syntax tree to SQL

TiDB SQL parser has provided interface Restore(ctx *RestoreCtx) error since v3, which supports converting syntax tree into SQL text

Complete code

package main

import (
    "bytes"
    "fmt"

    "github.com/pingcap/parser"
    "github.com/pingcap/parser/ast"
    "github.com/pingcap/parser/format"
    driver "github.com/pingcap/tidb/types/parser_driver"
)

// 定义一个 FingerprintVisitor 使其实现 Visitor 接口
type FingerprintVisitor struct{}

func (f *FingerprintVisitor) Enter(n ast.Node) (node ast.Node, skipChildren bool) {
    // 当访问到ValueExpr 时,只需要将ValueExpr的值替换掉就行
    if v, ok := n.(*driver.ValueExpr); ok {
        v.Type.Charset = ""
        v.SetValue([]byte("?"))
    }
    return n, false
}

func (f *FingerprintVisitor) Leave(n ast.Node) (node ast.Node, ok bool) {
    return n, true
}

func main() {
    sql := "select * from t1 where id = 100;"
    p := parser.New()
    stmt, err := p.ParseOneStmt(sql, "", "")
    if err != nil {
        // 省略错误处理
        return
    }
    stmt.Accept(&FingerprintVisitor{})

    buf := new(bytes.Buffer)
    restoreCtx := format.NewRestoreCtx(format.RestoreKeyWordUppercase|format.RestoreNameBackQuotes, buf)
    err = stmt.Restore(restoreCtx)
    if nil != err {
        // 省略错误处理
        return
    }
    fmt.Println(buf.String())
    // SELECT * FROM `t1` WHERE `id`=?
}

to sum up

  1. Using TiDB SQL parser can quickly and accurately implement SQL fingerprints, which reduces the complexity of reading compared to string parsing;
  2. In addition, you need to spend time to understand the structure of the TiDB syntax tree.

爱可生开源社区
426 声望211 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。