0 Origin

In September of this year, the departmental platform project I am responsible for released a new version that introduced a new feature, which is essentially a scheduled task. Everything was normal on the first day, but on the second day a very small number of tasks misbehaved: tasks that had been paused kept executing, while tasks that should have run did not.

The first reaction from me and a colleague was that the scheduled-task execution logic was at fault. But after spending a lot of time debugging and testing, we found that the root cause was not in the feature logic at all, but in some underlying shared code that had been running in production for a year without being touched. The core of that code is gob, the protagonist of this article, and the source of the problem is a Go language feature: the zero value.

In the rest of this article, I will use a simplified example to describe this bug.

1 gob and zero value

Let me briefly introduce gob and the zero value.

1.1 Zero value

The zero value is a feature of the Go language. Simply put, a variable that is declared without being explicitly assigned holds the zero value of its type. For example:

package main

import (
    "fmt"
)

type person struct {
    name   string
    gender int
    age    int
}

func main() {
    p := person{}
    var list []byte
    var f float32
    var s string
    var m map[string]int
    
    fmt.Println(list, f, s, m)
    fmt.Printf("%+v", p)
}

/* Output
[] 0  map[]
{name: gender:0 age:0}
*/

The zero value is convenient for developers in many cases, but those who dislike it argue that it makes the language less rigorous and introduces some uncertainty. The problem I describe in detail later is one example.

1.2 gob

gob is a package in Go's standard library, found at encoding/gob. The name is short for "Go binary", so we can guess from the name alone that gob has something to do with binary data.

In fact, gob is a Go-specific binary format for serializing and deserializing data, similar to pickle in Python. A common use is to serialize an object (a struct), store it in a file on disk, and then read the file and deserialize it when needed, which achieves object persistence.
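For readers who have never used gob, here is a minimal sketch of the serialize-then-deserialize round trip described above. It works on an in-memory buffer rather than a file, and the point type and the roundTrip helper are hypothetical names invented for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// point is a hypothetical type used only for this illustration.
// Its fields are exported so that gob can see them.
type point struct {
	X, Y  int
	Label string
}

// roundTrip encodes p into an in-memory buffer with gob
// and decodes it back into a new value.
func roundTrip(p point) (point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return point{}, err
	}
	var out point
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return point{}, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(point{X: 3, Y: 4, Label: "a"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", out) // prints {X:3 Y:4 Label:a}
}
```

Replacing the buffer with a file gives the persistence pattern mentioned above.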

This article is not a gob tutorial, so I will not walk through its usage in detail. Readers who are not familiar with gob can read the Example section of its official documentation, or go straight to the example used in the problem described below.

2 The problem

2.1 Requirements

At the beginning of this article, I briefly described the origin of the problem, and here I use a simpler model to expand the description.

First, we define a struct, person:

type person struct {
    // As with the json package, a field must start with an uppercase letter (be exported) to be serialized
    ID     int
    Name   string // name
    Gender int    // gender: male 1, female 0
    Age    int    // age
}

Based on this struct, we will record several people's information, each as a person object. For various reasons, we have to use gob to persist this information to local disk instead of using a database such as MySQL.

Next, we have this requirement:

Traverse and deserialize the gob files stored locally, then determine each person's gender and count the number of males and females.

2.2 Code

Based on the requirement and background above, the code is as follows (to save space, boilerplate such as package, import, and init() is omitted):

  • defines.go
// Directory containing the .gob files
const DIR = "./persons"

type person struct {
    // As with the json package, a field must start with an uppercase letter (be exported) to be serialized
    ID     int
    Name   string // name
    Gender int    // gender: male 1, female 0
    Age    int    // age
}

// Objects to be persisted
var persons = []person{
    {0, "Mia", 0, 21},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}
  • serializer.go
// serialize encodes a person object and stores it in a file
// named ./persons/${p.ID}.gob
func serialize(p person) {
    filename := filepath.Join(DIR, fmt.Sprintf("%d.gob", p.ID))
    buffer := new(bytes.Buffer)
    encoder := gob.NewEncoder(buffer)
    _ = encoder.Encode(p)
    _ = ioutil.WriteFile(filename, buffer.Bytes(), 0644)
}

// unserialize decodes a .gob file into the pointer argument
func unserialize(path string, p *person) {
    raw, _ := ioutil.ReadFile(path)
    buffer := bytes.NewBuffer(raw)
    decoder := gob.NewDecoder(buffer)
    _ = decoder.Decode(p)
}
  • main.go
func main() {
    storePersons()
    countGender()
}

func storePersons() {
    for _, p := range persons {
        serialize(p)
    }
}

func countGender() {
    counter := make(map[int]int)
    // Use one temporary pointer as the carrier for the objects in the files, to save the cost of creating new objects.
    tmpP := &person{}
    for _, p := range persons {
        // For convenience, iterate over persons directly, but use only the ID to read the file
        id := p.ID
        filename := filepath.Join(DIR, fmt.Sprintf("%d.gob", id))
        // Deserialize the object into tmpP
        unserialize(filename, tmpP)
        // Count by gender
        counter[tmpP.Gender]++
    }
    fmt.Printf("Female: %+v, Male: %+v\n", counter[0], counter[1])
}

After executing the code, we got this result:

// The objects
var persons = []person{
    {0, "Mia", 0, 21},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}

// Output
Female: 1, Male: 4

Hm? 1 female and 4 males? A bug has appeared: this result obviously does not match our preset data. What went wrong?

2.3 Locating the problem

We add a print statement to the for loop in countGender() to show the person object read on each iteration, and get this result:

// Added line
fmt.Printf("%+v\n", tmpP)

// Output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:3 Name:Jenny Gender:1 Age:16}
&{ID:4 Name:Marry Gender:1 Age:30}

Good grief, Jenny and Marry have become men! The strange thing is that every field other than Gender is correct. Seeing this result, if you are like me and often deal with configuration files such as JSON and YAML, you may take it for granted that the read is working correctly and that the problem must lie in how the data was stored.

But a gob file is binary, so it is hard to verify by eye the way a JSON file can be. Even with xxd on Linux, you only get output that is hard to interpret:

>$ xxd persons/1.gob 
0000000: 37ff 8103 0101 0670 6572 736f 6e01 ff82  7......person...
0000010: 0001 0401 0249 4401 0400 0104 4e61 6d65  .....ID.....Name
0000020: 010c 0001 0647 656e 6465 7201 0400 0103  .....Gender.....
0000030: 4167 6501 0400 0000 0eff 8201 0201 034a  Age............J
0000040: 696d 0102 0124 00                        im...$.

>$ xxd persons/0.gob 
0000000: 37ff 8103 0101 0670 6572 736f 6e01 ff82  7......person...
0000010: 0001 0401 0249 4401 0400 0104 4e61 6d65  .....ID.....Name
0000020: 010c 0001 0647 656e 6465 7201 0400 0103  .....Gender.....
0000030: 4167 6501 0400 0000 0aff 8202 034d 6961  Age..........Mia
0000040: 022a 00                                  .*.

Maybe we could parse these binary files by hand and compare the differences between them, or serialize two objects that are identical except for Gender into gob files and compare them. If you are interested, you can try it. At the time, due to time constraints and other reasons, we did not take this approach; instead we modified the data and continued testing.

2.4 The pattern

Since both of the affected records are women, my programmer's intuition told me this might not be a coincidence. So I reordered the data to completely separate the men from the women, and tested again:

// Group 1: women first, then men
var persons = []person{
    {0, "Mia", 0, 21},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
}

// Output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:3 Name:Jenny Gender:0 Age:16}
&{ID:4 Name:Marry Gender:0 Age:30}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
// Group 2: men first, then women
var persons = []person{
    {1, "Jim", 1, 18},
    {2, "Bob", 1, 25},
    {0, "Mia", 0, 21},
    {3, "Jenny", 0, 16},
    {4, "Marry", 0, 30},
}

// Output
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:2 Name:Mia Gender:1 Age:21}
&{ID:3 Name:Jenny Gender:1 Age:16}
&{ID:4 Name:Marry Gender:1 Age:30}

Something strange appears. When the women come first and the men after, everything is normal; when the men come first and the women after, the men are normal and the women are all wrong. Even Mia's ID, originally 0, has become 2!

After repeated tests and observation of the results, we found a pattern: all the male records are normal, and all the problems are in the female records!

Stated more generally: if a field in the previous record is non-zero and the same field in the following record is 0, the 0 is overwritten by the non-zero value before it.

3 The answer

Reviewing the code again, I noticed this line:

// Use one temporary pointer as the carrier for the objects in the files, to save the cost of creating new objects.
tmpP := &person{}

To save the overhead of creating new objects, I used the same variable to load the data from every file in the loop and check the gender. Combined with the pattern we found earlier, the answer seems to be here: the later 0 is overwritten by the earlier non-zero value probably because the same object is used to load each file, so data from the previous record remains.

Verifying this is simple. Just move that shared object into the for loop, so that each iteration creates a new object to load the file data into, cutting off the influence of the previous data.

Let's modify the code (redundant parts are omitted):

for _, p := range persons {
    // ...
    tmpP := &person{}
    // ...
}

// Output
&{ID:0 Name:Mia Gender:0 Age:21}
&{ID:1 Name:Jim Gender:1 Age:18}
&{ID:2 Name:Bob Gender:1 Age:25}
&{ID:3 Name:Jenny Gender:0 Age:16}
&{ID:4 Name:Marry Gender:0 Age:30}
Female: 3, Male: 2

Correct!

The result is as we expected: leftover data is the cause. But this raises another question: why does the old approach read the data correctly when 0 comes first and non-zero after (women first)? And why is only 0 affected, while other numbers (such as age) are not?

All the questions now seem to point to the special number 0.

Only at this point did we finally notice the zero value. I quickly went through the official documentation of the gob package and found this sentence:

If a field has the zero value for its type (except for arrays; see above), it is omitted from the transmission.

In other words:

If a struct field holds the zero value of its type (arrays excepted), it is left out of the encoded data.

The text surrounding this sentence is about structs, so "field" here refers to a struct field, which matches our example.

Based on the conclusions we reached earlier and the official documentation, we can now draw a complete conclusion:

When encoding data, the gob package omits fields that hold the zero value (arrays excepted). Our code used a single shared object to load the data from every file. Since zero values are not transmitted, the zero-valued fields of the original data are never written during decoding, so what we see in those fields is actually the non-zero value left over from the previous object.
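The conclusion can be reproduced without touching the disk. The following is a minimal sketch that reuses the article's person struct; the encode and decodeInto helpers are hypothetical names for this illustration, standing in for serialize and unserialize.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

type person struct {
	ID     int
	Name   string
	Gender int
	Age    int
}

// encode returns the gob encoding of p produced by a fresh encoder.
func encode(p person) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		log.Fatal(err)
	}
	return buf.Bytes()
}

// decodeInto decodes data into dst with a fresh decoder.
// It does not clear dst first, just like the article's unserialize.
func decodeInto(data []byte, dst *person) {
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(dst); err != nil {
		log.Fatal(err)
	}
}

func main() {
	jim := encode(person{1, "Jim", 1, 18})
	jenny := encode(person{3, "Jenny", 0, 16})

	// Shared destination: Jenny's Gender is 0, so it is absent from the
	// encoding and Jim's Gender value is left in place.
	reused := &person{}
	decodeInto(jim, reused)
	decodeInto(jenny, reused)
	fmt.Printf("reused: %+v\n", *reused) // reused: {ID:3 Name:Jenny Gender:1 Age:16}

	// Fresh destination: the absent field simply stays at its zero value.
	fresh := &person{}
	decodeInto(jenny, fresh)
	fmt.Printf("fresh:  %+v\n", *fresh) // fresh:  {ID:3 Name:Jenny Gender:0 Age:16}
}
```

Age is never affected in this data set because no record has an age of 0, so that field is always present in the encoding and always overwritten.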

The solution is also simple: do what I did above and stop loading into a shared object.

4 Review
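If you still want to keep a single shared pointer, one alternative is to reset it to the zero value before every decode. This is my own assumption for illustration, not something the original project did, and decodeReset and encode are hypothetical helper names. A minimal sketch:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

type person struct {
	ID     int
	Name   string
	Gender int
	Age    int
}

// encode returns the gob encoding of p produced by a fresh encoder.
func encode(p person) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		log.Fatal(err)
	}
	return buf.Bytes()
}

// decodeReset clears dst to its zero value and then decodes data into it,
// so fields omitted from the encoding end up as zero instead of stale values.
func decodeReset(data []byte, dst *person) {
	*dst = person{}
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(dst); err != nil {
		log.Fatal(err)
	}
}

func main() {
	records := [][]byte{
		encode(person{1, "Jim", 1, 18}),
		encode(person{3, "Jenny", 0, 16}),
	}

	counter := make(map[int]int)
	tmpP := &person{} // shared across iterations, as in the original code
	for _, data := range records {
		decodeReset(data, tmpP)
		counter[tmpP.Gender]++
	}
	fmt.Printf("Female: %d, Male: %d\n", counter[0], counter[1]) // Female: 1, Male: 1
}
```

Either way, the point is the same: the destination must hold zero values before gob decodes into it.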

In the project bug I described at the beginning of the article, I used 0 and 1 to represent the status of a scheduled task (paused, running). Just like person.Gender above, different tasks interfered with each other because of the zero value, which made tasks execute abnormally, while fields that never held a zero value were all fine. Although this was an online production environment, fortunately the problem was discovered early and dealt with in time, without causing a production incident. But the whole process and the final answer are deeply imprinted in my mind.

Later, my colleague and I briefly discussed why gob chooses to omit zero values. My guess is that it is to save space. And the code we wrote at the beginning shared one object to save the cost of creating new ones. In the end, the two space-saving decisions collided and produced a hidden bug.


程序员小杜

Writes Python and Go; currently learning Rust.