It is often said that reading source code is the only way for every outstanding development engineer, but when faced with a complex system like TiDB, source code reading is a very huge project. For some TiDB users, starting from the daily problems they encounter, reading the source code in turn is a good starting point, so we planned the "Reading Source Code with Questions" series of articles.
This article is the second in the series. It takes an example of Power BI Desktop's abnormal performance on TiDB, and introduces the process from the discovery and positioning of the problem to the issue of raising issues and writing PR through the open source community to solve the problem, from the implementation of the code. To do trouble shooting from a perspective, I hope to help you better understand the TiDB source code.
First, let's reproduce the failed scenario (TiDB 5.1.1 on MacOS) and create a simple table with only one field:
CREATE TABLE test(name VARCHAR(1) PRIMARY KEY);
Yes on MySQL, but not on TiDB, an error is reported
DataSource.Error: An error happened while reading data from the provider: 'Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.'
Details:
DataSourceKind=MySql
DataSourcePath=localhost:4000;test
Look at the last SQL run on the general log TiDB:
select COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, case when NUMERIC_PRECISION is null then null when DATA_TYPE in ('FLOAT', 'DOUBLE') then 2 else 10 end AS NUMERIC_PRECISION_RADIX, NUMERIC_PRECISION, NUMERIC_SCALE, CHARACTER_MAXIMUM_LENGTH, COLUMN_DEFAULT, COLUMN_COMMENT AS DESCRIPTION, COLUMN_TYPE from INFORMATION_SCHEMA.COLUMNS where table_schema = 'test' and table_name = 'test';
We use tiup to start a TiDB cluster, and use tiup client to execute this command, the tiup client will also report an error:
error: mysql: sql: Scan error on column index 4, name "NUMERIC_PRECISION_RADIX": converting NULL to int64 is unsupported
Then our attention is focused on solving the problem of this sentence, let's first look at what the error reported by the tiup client means. The tiup client uses the golang xo/usql
library, but in the xo/usql
library, we cannot find the corresponding error message. The grep converting keyword returns extremely limited and irrelevant content. Let's take a look at the mysql driver of xo/usql
go-sql-driver/mysql
, download its code and grep converting, only a piece of information in the changelog is returned, and there is a high probability that the place where the error is reported is not in this library. go-sql-driver/mysql
look at database/sql
and find that it depends on 061b01fbbe61f3, then let’s take a look at the content of database/sql
database/sql
is the standard library of golang, so we need to download the source code of golang. In the database directory of golang grep converting, I quickly found the content that matched the error message:
go/src/database/sql/convert.go
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
if src == nil {
return fmt.Errorf("converting NULL to %s is unsupported", dv.Kind())
}
s := asString(src)
i64, err := strconv.ParseInt(s, 10, dv.Type().Bits())
if err != nil {
err = strconvErr(err)
return fmt.Errorf("converting driver.Value type %T (%q) to a %s: %v", src, s, dv.Kind(), err)
}
dv.SetInt(i64)
return nil
Let's trace this snippet again to see how the types here come from, and eventually we will return to go-sql-driver/mysql:
mysql/fields.go
case fieldTypeLongLong:
if mf.flags&flagNotNULL != 0 {
if mf.flags&flagUnsigned != 0 {
return scanTypeUint64
}
return scanTypeInt64
}
return scanTypeNullInt
column definition in the return body of the parsing statement, converted to the type in golang. We can use mysql --host 127.0.0.1 --port 4000 -u root --column-type-info
to view the column metadata returned by the problematic SQL after connecting:
MySQL
Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 3
Max_length: 0
Decimals: 0
Flags: BINARY NUM
TiDB
Field 5: `NUMERIC_PRECISION_RADIX`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 2
Max_length: 0
Decimals: 0
Flags: NOT_NULL BINARY NUM
It can be clearly seen that NUMERIC_PRECISION_RADIX
field in the error message of the tiup client has obvious problems on TiDB. This field is marked as NOT_NULL in the return body of TiDB. Obviously this is unreasonable because of this field. Obviously it can be NULL
, and the return value of MySQL also reflects this. So xo/usql
reported an error when processing the returned body. At this point, we have discovered why the client side reported an error. Next, we need to find why TiDB returns an incorrect column definition.
By TiDB Dev Guide we can know in a general process execution TiDB DQL statements, from the entrance of server/conn.go#clientConn.Run
looked down, all the way through server/conn.go#clientConn.dispatch
, server/conn.go#clientConn.handleQuery
, server/conn.go#clientConn.handleStmt
, server/driver_tidb.go#TiDBContext.ExecuteStmt
, session/session.go#session.ExecuteStmt
, executor/compiler.go#Compiler.Compile
, planner/optimize.go#Optimize
, planner/optimize.go#optimize
, planner/core/planbuilder.go#PlanBuilder.Build
, planner/core/logical_plan_builder.go#PlanBuilder.buildSelect
, in buildSelect
, we can see a series of processing of query statements by TiDB planner, and then we can go to planner/core/expression_rewriter.go#PlanBuilder.rewriteWithPreprocess
and planner/core/expression_rewriter.go#PlanBuilder.rewriteExprNode
. In rewriteExprNode
, the problematic field NUMERIC_PRECISION_RADIX
be parsed, and finally this CASE
expression will be parsed. In expression/builtin_control.go#caseWhenFunctionClass.getFunction
, we finally came to the place to calculate the column definition returned by the CASE expression (this depends on the AST parsed by the traversal compiler):
for i := 1; i < l; i += 2 {
fieldTps = append(fieldTps, args[i].GetType())
decimal = mathutil.Max(decimal, args[i].GetType().Decimal)
if args[i].GetType().Flen == -1 {
flen = -1
} else if flen != -1 {
flen = mathutil.Max(flen, args[i].GetType().Flen)
}
isBinaryStr = isBinaryStr || types.IsBinaryStr(args[i].GetType())
isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[i].GetType())
}
if l%2 == 1 {
fieldTps = append(fieldTps, args[l-1].GetType())
decimal = mathutil.Max(decimal, args[l-1].GetType().Decimal)
if args[l-1].GetType().Flen == -1 {
flen = -1
} else if flen != -1 {
flen = mathutil.Max(flen, args[l-1].GetType().Flen)
}
isBinaryStr = isBinaryStr || types.IsBinaryStr(args[l-1].GetType())
isBinaryFlag = isBinaryFlag || !types.IsNonBinaryStr(args[l-1].GetType())
}
fieldTp := types.AggFieldType(fieldTps)
// Here we turn off NotNullFlag. Because if all when-clauses are false,
// the result of case-when expr is NULL.
types.SetTypeFlag(&fieldTp.Flag, mysql.NotNullFlag, false)
tp := fieldTp.EvalType()
if tp == types.ETInt {
decimal = 0
}
fieldTp.Decimal, fieldTp.Flen = decimal, flen
if fieldTp.EvalType().IsStringKind() && !isBinaryStr {
fieldTp.Charset, fieldTp.Collate = DeriveCollationFromExprs(ctx, args...)
if fieldTp.Charset == charset.CharsetBin && fieldTp.Collate == charset.CollationBin {
// When args are Json and Numerical type(eg. Int), the fieldTp is String.
// Both their charset/collation is binary, but the String need a default charset/collation.
fieldTp.Charset, fieldTp.Collate = charset.GetDefaultCharsetAndCollate()
}
} else {
fieldTp.Charset, fieldTp.Collate = charset.CharsetBin, charset.CollationBin
}
if isBinaryFlag {
fieldTp.Flag |= mysql.BinaryFlag
}
// Set retType to BINARY(0) if all arguments are of type NULL.
if fieldTp.Tp == mysql.TypeNull {
fieldTp.Flen, fieldTp.Decimal = 0, types.UnspecifiedLength
types.SetBinChsClnFlag(fieldTp)
}
Looking at the code for calculating the column definition flag above, we can find that no matter what the CASE
expression is, the NOT_NULL
flag bit will definitely be set to false
, so the problem does not appear here! At this time, we can only look back along the code path above to see if the column definition generated above has been modified in the future. Finally in server/conn.go#clientConn.handleStmt
, it was found that it called server/conn.go#clientConn.writeResultSet
, and then successively called server/conn.go#clientConn.writeChunks
, server/conn.go#clientConn.writeColumnInfo
, server/column.go#ColumnInfo.Dump
and server/column.go#dumpFlag
. In dumpFlag, the previously generated column definition flag
was modified:
func dumpFlag(tp byte, flag uint16) uint16 {
switch tp {
case mysql.TypeSet:
return flag | uint16(mysql.SetFlag)
case mysql.TypeEnum:
return flag | uint16(mysql.EnumFlag)
default:
if mysql.HasBinaryFlag(uint(flag)) {
return flag | uint16(mysql.NotNullFlag)
}
return flag
}
}
Finally, we found the reason why TiDB returned the wrong column definition! In fact, this bug has been fixed in the latest TiDB version 5.2.0: *: fix some problems related to notNullFlag by wjhuang2016 · Pull Request #27697 · pingcap/tidb .
Finally, in the above process of reading the code, it is actually best for us to see what the AST parsed by TiDB looks like, so that we will not be blinded during the final traversal of the AST. parser chapter in the TiDB dev guide that explains how to debug the parser, parser/quickstart.md at master · pingcap/parser There is also a sample output generated AST, but simply outputting davecgh/go-spew
basically has no direct effect. Output the node generated by the parser, so that you can get a comprehensible tree:
package main
import (
"fmt"
"github.com/pingcap/parser"
"github.com/pingcap/parser/ast"
_ "github.com/pingcap/parser/test_driver"
"github.com/davecgh/go-spew/spew"
)
func parse(sql string) (*ast.StmtNode, error) {
p := parser.New()
stmtNodes, _, err := p.Parse(sql, "", "")
if err != nil {
return nil, err
}
return &stmtNodes[0], nil
}
func main() {
spew.Config.Indent = " "
astNode, err := parse("SELECT a, b FROM t")
if err != nil {
fmt.Printf("parse error: %v\n", err.Error())
return
}
fmt.Printf("%s\n", spew.Sdump(*astNode))
}
(*ast.SelectStmt)(0x140001dac30)({
dmlNode: (ast.dmlNode) {
stmtNode: (ast.stmtNode) {
node: (ast.node) {
text: (string) (len=18) "SELECT a, b FROM t"
}
}
},
resultSetNode: (ast.resultSetNode) {
resultFields: ([]*ast.ResultField) <nil>
},
SelectStmtOpts: (*ast.SelectStmtOpts)(0x14000115bc0)({
Distinct: (bool) false,
SQLBigResult: (bool) false,
SQLBufferResult: (bool) false,
SQLCache: (bool) true,
SQLSmallResult: (bool) false,
CalcFoundRows: (bool) false,
StraightJoin: (bool) false,
Priority: (mysql.PriorityEnum) 0,
TableHints: ([]*ast.TableOptimizerHint) <nil>
}),
Distinct: (bool) false,
From: (*ast.TableRefsClause)(0x140001223c0)({
node: (ast.node) {
text: (string) ""
},
TableRefs: (*ast.Join)(0x14000254100)({
node: (ast.node) {
text: (string) ""
},
resultSetNode: (ast.resultSetNode) {
resultFields: ([]*ast.ResultField) <nil>
},
Left: (*ast.TableSource)(0x14000156480)({
node: (ast.node) {
text: (string) ""
},
Source: (*ast.TableName)(0x1400013a370)({
node: (ast.node) {
text: (string) ""
},
resultSetNode: (ast.resultSetNode) {
resultFields: ([]*ast.ResultField) <nil>
},
Schema: (model.CIStr) ,
Name: (model.CIStr) t,
DBInfo: (*model.DBInfo)(<nil>),
TableInfo: (*model.TableInfo)(<nil>),
IndexHints: ([]*ast.IndexHint) <nil>,
PartitionNames: ([]model.CIStr) {
}
}),
AsName: (model.CIStr)
}),
Right: (ast.ResultSetNode) <nil>,
Tp: (ast.JoinType) 0,
On: (*ast.OnCondition)(<nil>),
Using: ([]*ast.ColumnName) <nil>,
NaturalJoin: (bool) false,
StraightJoin: (bool) false
})
}),
Where: (ast.ExprNode) <nil>,
Fields: (*ast.FieldList)(0x14000115bf0)({
node: (ast.node) {
text: (string) ""
},
Fields: ([]*ast.SelectField) (len=2 cap=2) {
(*ast.SelectField)(0x140001367e0)({
node: (ast.node) {
text: (string) (len=1) "a"
},
Offset: (int) 7,
WildCard: (*ast.WildCardField)(<nil>),
Expr: (*ast.ColumnNameExpr)(0x14000254000)({
exprNode: (ast.exprNode) {
node: (ast.node) {
text: (string) ""
},
Type: (types.FieldType) unspecified,
flag: (uint64) 8
},
Name: (*ast.ColumnName)(0x1400017dc70)(a),
Refer: (*ast.ResultField)(<nil>)
}),
AsName: (model.CIStr) ,
Auxiliary: (bool) false
}),
(*ast.SelectField)(0x14000136840)({
node: (ast.node) {
text: (string) (len=1) "b"
},
Offset: (int) 10,
WildCard: (*ast.WildCardField)(<nil>),
Expr: (*ast.ColumnNameExpr)(0x14000254080)({
exprNode: (ast.exprNode) {
node: (ast.node) {
text: (string) ""
},
Type: (types.FieldType) unspecified,
flag: (uint64) 8
},
Name: (*ast.ColumnName)(0x1400017dce0)(b),
Refer: (*ast.ResultField)(<nil>)
}),
AsName: (model.CIStr) ,
Auxiliary: (bool) false
})
}
}),
GroupBy: (*ast.GroupByClause)(<nil>),
Having: (*ast.HavingClause)(<nil>),
WindowSpecs: ([]ast.WindowSpec) <nil>,
OrderBy: (*ast.OrderByClause)(<nil>),
Limit: (*ast.Limit)(<nil>),
LockTp: (ast.SelectLockType) none,
TableHints: ([]*ast.TableOptimizerHint) <nil>,
IsAfterUnionDistinct: (bool) false,
IsInBraces: (bool) false,
QueryBlockOffset: (int) 0,
SelectIntoOpt: (*ast.SelectIntoOption)(<nil>)
})
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。