It is often said that reading source code is the only way for every outstanding development engineer, but when faced with a complex system like TiDB, source code reading is a very huge project. For some TiDB users, starting from the daily problems they encounter, reading the source code in turn is a good starting point, so we planned the "Reading Source Code with Questions" series of articles.

This article is the second in the series. It takes an example of Power BI Desktop's abnormal performance on TiDB, and introduces the process from the discovery and positioning of the problem to the issue of raising issues and writing PR through the open source community to solve the problem, from the implementation of the code. To do trouble shooting from a perspective, I hope to help you better understand the TiDB source code.

First, let's reproduce the failed scenario (TiDB 5.1.1 on MacOS) and create a simple table with only one field:

CREATE TABLE test(name VARCHAR(1) PRIMARY KEY);

Yes on MySQL, but not on TiDB, an error is reported

DataSource.Error: An error happened while reading data from the provider: 'Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.'
Details:
DataSourceKind=MySql
DataSourcePath=localhost:4000;test

Look at the last SQL run on the general log TiDB:

select COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, case when NUMERIC_PRECISION is null then null when DATA_TYPE in ('FLOAT', 'DOUBLE') then 2 else 10 end AS NUMERIC_PRECISION_RADIX, NUMERIC_PRECISION, NUMERIC_SCALE,            CHARACTER_MAXIMUM_LENGTH, COLUMN_DEFAULT, COLUMN_COMMENT AS DESCRIPTION, COLUMN_TYPE  from INFORMATION_SCHEMA.COLUMNS  where table_schema = 'test' and table_name = 'test';

We use tiup to start a TiDB cluster, and use tiup client to execute this command, the tiup client will also report an error:

error: mysql: sql: Scan error on column index 4, name "NUMERIC_PRECISION_RADIX": converting NULL to int64 is unsupported

Then our attention is focused on solving the problem of this sentence, let's first look at what the error reported by the tiup client means. The tiup client uses the golang xo/usql library, but in the xo/usql library, we cannot find the corresponding error message. The grep converting keyword returns extremely limited and irrelevant content. Let's take a look at the mysql driver of xo/usql go-sql-driver/mysql , download its code and grep converting, only a piece of information in the changelog is returned, and there is a high probability that the place where the error is reported is not in this library. go-sql-driver/mysql look at database/sql and find that it depends on 061b01fbbe61f3, then let’s take a look at the content of database/sql database/sql is the standard library of golang, so we need to download the source code of golang. In the database directory of golang grep converting, I quickly found the content that matched the error message:

go/src/database/sql/convert.go

case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        if src == nil {
                return fmt.Errorf("converting NULL to %s is unsupported", dv.Kind())
        }
        s := asString(src)
        i64, err := strconv.ParseInt(s, 10, dv.Type().Bits())
        if err != nil {
                err = strconvErr(err)
                return fmt.Errorf("converting driver.Value type %T (%q) to a %s: %v", src, s, dv.Kind(), err)
        }
        dv.SetInt(i64)
        return nil

Let's trace this snippet again to see how the types here come from, and eventually we will return to go-sql-driver/mysql:

mysql/fields.go

        case fieldTypeLongLong:
                if mf.flags&flagNotNULL != 0 {
                        if mf.flags&flagUnsigned != 0 {
                                return scanTypeUint64
                        }
                        return scanTypeInt64
                }
                return scanTypeNullInt

column definition in the return body of the parsing statement, converted to the type in golang. We can use mysql --host 127.0.0.1 --port 4000 -u root --column-type-info to view the column metadata returned by the problematic SQL after connecting:

MySQL

Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: `` 
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 3
Max_length: 0
Decimals: 0
Flags: BINARY NUM

TiDB

Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 2
Max_length: 0
Decimals: 0
Flags: NOT_NULL BINARY NUM

It can be clearly seen that NUMERIC_PRECISION_RADIX field in the error message of the tiup client has obvious problems on TiDB. This field is marked as NOT_NULL in the return body of TiDB. Obviously this is unreasonable because of this field. Obviously it can be NULL , and the return value of MySQL also reflects this. So xo/usql reported an error when processing the returned body. At this point, we have discovered why the client side reported an error. Next, we need to find why TiDB returns an incorrect column definition.

By TiDB Dev Guide we can know in a general process execution TiDB DQL statements, from the entrance of server/conn.go#clientConn.Run looked down, all the way through server/conn.go#clientConn.dispatch , server/conn.go#clientConn.handleQuery , server/conn.go#clientConn.handleStmt , server/driver_tidb.go#TiDBContext.ExecuteStmt , session/session.go#session.ExecuteStmt , executor/compiler.go#Compiler.Compile , planner/optimize.go#Optimize , planner/optimize.go#optimize , planner/core/planbuilder.go#PlanBuilder.Build , planner/core/logical_plan_builder.go#PlanBuilder.buildSelect , in buildSelect , we can see a series of processing of query statements by TiDB planner, and then we can go to planner/core/expression_rewriter.go#PlanBuilder.rewriteWithPreprocess and planner/core/expression_rewriter.go#PlanBuilder.rewriteExprNode . In rewriteExprNode , the problematic field NUMERIC_PRECISION_RADIX be parsed, and finally this CASE expression will be parsed. In expression/builtin_control.go#caseWhenFunctionClass.getFunction , we finally came to the place to calculate the column definition returned by the CASE expression (this depends on the AST parsed by the traversal compiler):

    for i := 1; i < l; i += 2 {       
        fieldTps = append(fieldTps, args[i].GetType())
        decimal = mathutil.Max(decimal, args[i].GetType().Decimal)
        if args[i].GetType().Flen == -1 {
            flen = -1
        } else if flen != -1 {
            flen = mathutil.Max(flen, args[i].GetType().Flen)
        }
        isBinaryStr = isBinaryStr || types.IsBinaryStr(args[i].GetType())
        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[i].GetType())
    }
    if l%2 == 1 {
        fieldTps = append(fieldTps, args[l-1].GetType())
        decimal = mathutil.Max(decimal, args[l-1].GetType().Decimal)
        if args[l-1].GetType().Flen == -1 {
            flen = -1
        } else if flen != -1 {
            flen = mathutil.Max(flen, args[l-1].GetType().Flen)
        }
        isBinaryStr = isBinaryStr || types.IsBinaryStr(args[l-1].GetType())
        isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[l-1].GetType())
    }


    fieldTp := types.AggFieldType(fieldTps)
    // Here we turn off NotNullFlag. Because if all when-clauses are false,
    // the result of case-when expr is NULL.
    types.SetTypeFlag(&fieldTp.Flag, mysql.NotNullFlag, false)
    tp := fieldTp.EvalType()


    if tp == types.ETInt {
        decimal = 0
    }
    fieldTp.Decimal, fieldTp.Flen = decimal, flen
    if fieldTp.EvalType().IsStringKind() && !isBinaryStr {
        fieldTp.Charset, fieldTp.Collate = DeriveCollationFromExprs(ctx, args...)
        if fieldTp.Charset == charset.CharsetBin && fieldTp.Collate == charset.CollationBin {
            // When args are Json and Numerical type(eg. Int), the fieldTp is String.
            // Both their charset/collation is binary, but the String need a default charset/collation.
            fieldTp.Charset, fieldTp.Collate = charset.GetDefaultCharsetAndCollate()
        }
    } else {
        fieldTp.Charset, fieldTp.Collate = charset.CharsetBin, charset.CollationBin
    }
    if isBinaryFlag {
        fieldTp.Flag |= mysql.BinaryFlag
    }
    // Set retType to BINARY(0) if all arguments are of type NULL.
    if fieldTp.Tp == mysql.TypeNull {
        fieldTp.Flen, fieldTp.Decimal = 0, types.UnspecifiedLength
        types.SetBinChsClnFlag(fieldTp)
    }

Looking at the code for calculating the column definition flag above, we can find that no matter what the CASE expression is, the NOT_NULL flag bit will definitely be set to false , so the problem does not appear here! At this time, we can only look back along the code path above to see if the column definition generated above has been modified in the future. Finally in server/conn.go#clientConn.handleStmt , it was found that it called server/conn.go#clientConn.writeResultSet , and then successively called server/conn.go#clientConn.writeChunks , server/conn.go#clientConn.writeColumnInfo , server/column.go#ColumnInfo.Dump and server/column.go#dumpFlag . In dumpFlag, the previously generated column definition flag was modified:

func dumpFlag(tp byte, flag uint16) uint16 {
    switch tp {
    case mysql.TypeSet:
        return flag | uint16(mysql.SetFlag)
    case mysql.TypeEnum:
        return flag | uint16(mysql.EnumFlag)
    default:
        if mysql.HasBinaryFlag(uint(flag)) {
            return flag | uint16(mysql.NotNullFlag)
        }
        return flag
    }
}

Finally, we found the reason why TiDB returned the wrong column definition! In fact, this bug has been fixed in the latest TiDB version 5.2.0: *: fix some problems related to notNullFlag by wjhuang2016 · Pull Request #27697 · pingcap/tidb .

Finally, in the above process of reading the code, it is actually best for us to see what the AST parsed by TiDB looks like, so that we will not be blinded during the final traversal of the AST. parser chapter in the TiDB dev guide that explains how to debug the parser, parser/quickstart.md at master · pingcap/parser There is also a sample output generated AST, but simply outputting davecgh/go-spew basically has no direct effect. Output the node generated by the parser, so that you can get a comprehensible tree:

package main

import (
        "fmt"
        "github.com/pingcap/parser"
        "github.com/pingcap/parser/ast"
        _ "github.com/pingcap/parser/test_driver"
        "github.com/davecgh/go-spew/spew"
)

func parse(sql string) (*ast.StmtNode, error) {
        p := parser.New()
        stmtNodes, _, err := p.Parse(sql, "", "")
        if err != nil {
                return nil, err
        }
        return &stmtNodes[0], nil
}

func main() {
        spew.Config.Indent = "    "
        astNode, err := parse("SELECT a, b FROM t")
        if err != nil {
                fmt.Printf("parse error: %v\n", err.Error())
                return
        }
        fmt.Printf("%s\n", spew.Sdump(*astNode))
}
(*ast.SelectStmt)(0x140001dac30)({
    dmlNode: (ast.dmlNode) {
        stmtNode: (ast.stmtNode) {
            node: (ast.node) {
                text: (string) (len=18) "SELECT a, b FROM t"
            }
        }
    },
    resultSetNode: (ast.resultSetNode) {
        resultFields: ([]*ast.ResultField) <nil>
    },
    SelectStmtOpts: (*ast.SelectStmtOpts)(0x14000115bc0)({
        Distinct: (bool) false,
        SQLBigResult: (bool) false,
        SQLBufferResult: (bool) false,
        SQLCache: (bool) true,
        SQLSmallResult: (bool) false,
        CalcFoundRows: (bool) false,
        StraightJoin: (bool) false,
        Priority: (mysql.PriorityEnum) 0,
        TableHints: ([]*ast.TableOptimizerHint) <nil>
    }),
    Distinct: (bool) false,
    From: (*ast.TableRefsClause)(0x140001223c0)({
        node: (ast.node) {
            text: (string) ""
        },
        TableRefs: (*ast.Join)(0x14000254100)({
            node: (ast.node) {
                text: (string) ""
            },
            resultSetNode: (ast.resultSetNode) {
                resultFields: ([]*ast.ResultField) <nil>
            },
            Left: (*ast.TableSource)(0x14000156480)({
                node: (ast.node) {
                    text: (string) ""
                },
                Source: (*ast.TableName)(0x1400013a370)({
                    node: (ast.node) {
                        text: (string) ""
                    },
                    resultSetNode: (ast.resultSetNode) {
                        resultFields: ([]*ast.ResultField) <nil>
                    },
                    Schema: (model.CIStr) ,
                    Name: (model.CIStr) t,
                    DBInfo: (*model.DBInfo)(<nil>),
                    TableInfo: (*model.TableInfo)(<nil>),
                    IndexHints: ([]*ast.IndexHint) <nil>,
                    PartitionNames: ([]model.CIStr) {
                    }
                }),
                AsName: (model.CIStr)
            }),
            Right: (ast.ResultSetNode) <nil>,
            Tp: (ast.JoinType) 0,
            On: (*ast.OnCondition)(<nil>),
            Using: ([]*ast.ColumnName) <nil>,
            NaturalJoin: (bool) false,
            StraightJoin: (bool) false
        })
    }),
    Where: (ast.ExprNode) <nil>,
    Fields: (*ast.FieldList)(0x14000115bf0)({
        node: (ast.node) {
            text: (string) ""
        },
        Fields: ([]*ast.SelectField) (len=2 cap=2) {
            (*ast.SelectField)(0x140001367e0)({
                node: (ast.node) {
                    text: (string) (len=1) "a"
                },
                Offset: (int) 7,
                WildCard: (*ast.WildCardField)(<nil>),
                Expr: (*ast.ColumnNameExpr)(0x14000254000)({
                    exprNode: (ast.exprNode) {
                        node: (ast.node) {
                            text: (string) ""
                        },
                        Type: (types.FieldType) unspecified,
                        flag: (uint64) 8
                    },
                    Name: (*ast.ColumnName)(0x1400017dc70)(a),
                    Refer: (*ast.ResultField)(<nil>)
                }),
                AsName: (model.CIStr) ,
                Auxiliary: (bool) false
            }),
            (*ast.SelectField)(0x14000136840)({
                node: (ast.node) {
                    text: (string) (len=1) "b"
                },
                Offset: (int) 10,
                WildCard: (*ast.WildCardField)(<nil>),
                Expr: (*ast.ColumnNameExpr)(0x14000254080)({
                    exprNode: (ast.exprNode) {
                        node: (ast.node) {
                            text: (string) ""
                        },
                        Type: (types.FieldType) unspecified,
                        flag: (uint64) 8
                    },
                    Name: (*ast.ColumnName)(0x1400017dce0)(b),
                    Refer: (*ast.ResultField)(<nil>)
                }),
                AsName: (model.CIStr) ,
                Auxiliary: (bool) false
            })
        }
    }),
    GroupBy: (*ast.GroupByClause)(<nil>),
    Having: (*ast.HavingClause)(<nil>),
    WindowSpecs: ([]ast.WindowSpec) <nil>,
    OrderBy: (*ast.OrderByClause)(<nil>),
    Limit: (*ast.Limit)(<nil>),
    LockTp: (ast.SelectLockType) none,
    TableHints: ([]*ast.TableOptimizerHint) <nil>,
    IsAfterUnionDistinct: (bool) false,
    IsInBraces: (bool) false,
    QueryBlockOffset: (int) 0,
    SelectIntoOpt: (*ast.SelectIntoOption)(<nil>)
})

PingCAP
1.9k 声望4.9k 粉丝

PingCAP 是国内开源的新型分布式数据库公司,秉承开源是基础软件的未来这一理念,PingCAP 持续扩大社区影响力,致力于前沿技术领域的创新实现。其研发的分布式关系型数据库 TiDB 项目,具备「分布式强一致性事务...