
Background

In addition to generics, the headline feature of Go 1.18, the Go team also added fuzzing to the Go 1.18 toolchain.

The main developers of Go fuzzing are Katie Hockman, Jay Conrod and Roland Shoemaker.

Editor's Note: Katie Hockman left Google on February 19, 2022, and Jay Conrod left Google in October 2021.

What is Fuzzing

Fuzzing, or fuzz testing, is an automated testing technique that randomly generates test inputs and then calls the code under test with them to check whether it behaves as expected.

Fuzz testing is a complement to unit testing, not a replacement for unit testing.

A unit test checks whether the output for a specified input matches the expected output, so its test data set is relatively limited.

Fuzzing generates random test data, which lets it reach scenarios that unit tests do not cover and uncover potential bugs and security vulnerabilities in the program.

How to use Go Fuzzing

Fuzzing is not new to Go. Before the Go team released Go Fuzzing, a similar fuzzing tool, go-fuzz, already existed on GitHub.

The Go team's fuzzing implementation draws on the design ideas of go-fuzz.

Go 1.18 integrates Fuzzing into the go test toolchain and testing package.

Example

Here is an example to illustrate how Fuzzing is used.

Consider the following string reversal function, Reverse. Can you spot the potential problem in this code?

 // main.go
package fuzz

func Reverse(s string) string {
    bs := []byte(s)
    length := len(bs)
    for i := 0; i < length/2; i++ {
        bs[i], bs[length-i-1] = bs[length-i-1], bs[i]
    }
    return string(bs)
}

Writing Fuzz Tests

If you did not spot a bug in the code above, writing a fuzz test is a good way to find it.

The syntax of a Go fuzz test is as follows:

  • A fuzz test is defined in an xxx_test.go file, just like Go's existing unit tests and benchmarks.
  • The function name begins with Fuzz and its parameter is of type *testing.F. The testing.F type has two important methods, Add and Fuzz.
  • The Add method adds seed corpus data, from which the fuzzing engine automatically generates random test data.
  • The Fuzz method takes a function as its argument. This function is the fuzz target. Its first parameter must be of type *testing.T, and the types of its remaining parameters must match, in order, the types of the arguments passed to Add. For example, with f.Add(5, "hello"), the first argument 5 corresponds to i int and the second argument "hello" corresponds to s string in func(t *testing.T, i int, s string){...}.
  • The fuzzing engine generates random test data from the seed corpus specified with Add. In the f.Add(5, "hello") case, it derives new random values from 5 and hello, assigns them to i and s, and then repeatedly calls the function passed to f.Fuzz, that is, func(t *testing.T, i int, s string){...}.

After knowing the above rules, let's write a fuzzing function as follows for the Reverse function.

 // fuzz_test.go
package fuzz

import (
    "testing"
    "unicode/utf8"
)

func FuzzReverse(f *testing.F) {
    str_slice := []string{"abc", "bb"}
    for _, v := range str_slice {
        f.Add(v)
    }
    f.Fuzz(func(t *testing.T, str string) {
        rev_str1 := Reverse(str)
        rev_str2 := Reverse(rev_str1)
        if str != rev_str2 {
            t.Errorf("fuzz test failed. str:%s, rev_str1:%s, rev_str2:%s", str, rev_str1, rev_str2)
        }
        if utf8.ValidString(str) && !utf8.ValidString(rev_str1) {
            t.Errorf("reverse result is not utf8. str:%s, len: %d, rev_str1:%s", str, len(str), rev_str1)
        }
    })
}

Run Fuzzing Tests

Go 1.18beta1 or later is required. Run the fuzz test with the following command; the output looks like this:

 $ go1.18beta1 test -v -fuzz=Fuzz
fuzz: elapsed: 0s, gathering baseline coverage: 0/111 completed
fuzz: minimizing 60-byte failing input file
fuzz: elapsed: 0s, gathering baseline coverage: 5/111 completed
--- FAIL: FuzzReverse (0.04s)
    --- FAIL: FuzzReverse (0.00s)
        fuzz_test.go:20: reverse result is not utf8. str:æ, len: 2, rev_str1:��
    
    Failing input written to testdata/fuzz/FuzzReverse/ce9e8c80e2c2de2c96ab9e63b1a8cf18cea932b7d8c6c9c207d5978e0f19027a
    To re-run:
    go test -run=FuzzReverse/ce9e8c80e2c2de2c96ab9e63b1a8cf18cea932b7d8c6c9c207d5978e0f19027a
FAIL
exit status 1
FAIL    example/fuzz    0.179s

Focus on this line: fuzz_test.go:20: reverse result is not utf8. str:æ, len: 2, rev_str1:��

In this example, the randomly generated string æ is a valid UTF-8 string made up of 2 bytes. Reversing it with the Reverse function yields a string, ��, that is not valid UTF-8.

So the Reverse function, which reverses a string byte by byte, has a bug. It correctly reverses strings that consist only of ASCII characters, but reversing the bytes of non-ASCII characters can produce an invalid string.

If you are interested, try calling the Reverse function on another non-ASCII string, for example the Chinese character 吃 ("eat"), and see what the result is.

Note: If Go Fuzzing finds a bug while running, it writes the corresponding input data to the testdata/fuzz/FuzzXXX directory. In the example above, the output of go1.18beta1 test -v -fuzz=Fuzz contains the line Failing input written to testdata/fuzz/FuzzReverse/ce9e8c80e2c2de2c96ab9e63b1a8cf18cea932b7d8c6c9c207d5978e0f19027a, which means the failing input was written to that corpus file.
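
A corpus file is plain text. As far as I understand the version 1 encoding, it consists of a header line followed by one Go-style literal per fuzz argument. The sketch below renders such an entry for a single string argument; corpusEntry is a hypothetical helper, and the exact layout should be checked against a file that go test itself wrote.

```go
// corpus_entry.go (illustrative sketch)
package main

import "fmt"

// corpusEntry renders one string argument the way I assume a version 1
// corpus file looks: a header line, then string("...") with Go quoting.
func corpusEntry(s string) string {
	return fmt.Sprintf("go test fuzz v1\nstring(%q)\n", s)
}

func main() {
	// Print the assumed file content for the failing input from the run above.
	fmt.Print(corpusEntry("æ"))
}
```

Saving text in this form as a file under testdata/fuzz/FuzzReverse/ is a way to add a seed input by hand, next to the ones given with f.Add.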

The underlying mechanism of Go Fuzzing

When go test runs, it first compiles a test executable for each package under test, and then runs that executable to get the results of the package's TestXXX and BenchmarkXXX functions. Go Fuzzing works in a similar way, with a few differences.

When go test is run with the -fuzz flag, it compiles the test executable with coverage instrumentation for fuzzing. Most of the fuzzing logic is implemented in internal/fuzz.

Once go test has built the executable, it runs it, and the resulting process is called the coordinator process. The coordinator is started with most of the flags passed to go test, including -fuzz=pattern, which identifies the fuzz test to run.

Currently, each go test -fuzz=pattern invocation can fuzz only one fuzz test. If the pattern matches more than one FuzzXXX function, the following error is reported:

 $ go1.18beta1 test -v -fuzz=Fuzz
testing: will not fuzz, -fuzz matches more than one fuzz test: [FuzzReverse FuzzReverse2]
FAIL
exit status 1
FAIL    example/fuzz    0.752s
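
The reason -fuzz=Fuzz matches both functions is that the pattern is a regular expression and is not anchored. The sketch below approximates the matching with the regexp package; matchingFuzzTests is a hypothetical helper, not the code go test uses, and it ignores details such as sub-test names.

```go
// match.go (illustrative sketch)
package main

import (
	"fmt"
	"regexp"
)

// matchingFuzzTests returns the names that an unanchored pattern matches.
func matchingFuzzTests(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, name := range names {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	names := []string{"FuzzReverse", "FuzzReverse2"}
	for _, pattern := range []string{"Fuzz", "^FuzzReverse$"} {
		matched, err := matchingFuzzTests(pattern, names)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("-fuzz=%s matches %v\n", pattern, matched)
	}
}
```

Anchoring the pattern, as in -fuzz='^FuzzReverse$', is therefore one way to select a single fuzz test when several share a prefix.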

After the coordinator process starts, its main logic lives in fuzz.CoordinateFuzzing, which initializes the fuzzing system and starts the coordinator event loop.

The coordinator process starts multiple worker processes, each of which runs the same executable as the coordinator. The actual fuzzing is done by the worker processes. A worker process is started with the -test.fuzzworker flag, which marks it as a worker. The number of worker processes equals GOMAXPROCS.
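
The "same executable, different role" idea can be sketched in a few lines. This is a toy model, not the code in internal/fuzz: the flag name -sketch.worker is made up to stand in for -test.fuzzworker, and the workers only print their process ID.

```go
// roles.go (illustrative sketch)
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// workerFlag is a hypothetical flag standing in for -test.fuzzworker.
const workerFlag = "-sketch.worker"

// isWorker reports whether the process was started in the worker role.
func isWorker(args []string) bool {
	for _, a := range args {
		if a == workerFlag {
			return true
		}
	}
	return false
}

// workerCommand builds a command that re-runs the same executable with
// the coordinator's arguments plus the worker flag.
func workerCommand(self string, args []string) *exec.Cmd {
	workerArgs := append(append([]string{}, args...), workerFlag)
	cmd := exec.Command(self, workerArgs...)
	cmd.Stdout = os.Stdout
	return cmd
}

func main() {
	if isWorker(os.Args[1:]) {
		fmt.Println("worker running, pid", os.Getpid())
		return
	}
	self, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Coordinator role: start one worker per GOMAXPROCS, then wait for all.
	var workers []*exec.Cmd
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		cmd := workerCommand(self, os.Args[1:])
		if err := cmd.Start(); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		workers = append(workers, cmd)
	}
	for _, cmd := range workers {
		_ = cmd.Wait()
	}
}
```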

As an example, while go test -fuzz=pattern is running, you can execute ps aux | grep fuzz to view the fuzzing-related processes.

 $ ps aux | grep fuzz
xxx    13913  84.3  1.0  5219184  85124 s001  R+   10:12下午   0:03.90 /var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1953131131/b001/fuzz.test -test.fuzzworker -test.paniconexit0 -test.fuzzcachedir=/Users/xxx/Library/Caches/go-build/fuzz/example/fuzz -test.timeout=10m0s -test.fuzz=Fuzz
xxx    13910  81.9  1.0  5221180  86200 s001  R+   10:12下午   0:03.94 /var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1953131131/b001/fuzz.test -test.fuzzworker -test.paniconexit0 -test.fuzzcachedir=/Users/xxx/Library/Caches/go-build/fuzz/example/fuzz -test.timeout=10m0s -test.fuzz=Fuzz
xxx    13912  78.3  1.0  5219964  84984 s001  R+   10:12下午   0:03.86 /var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1953131131/b001/fuzz.test -test.fuzzworker -test.paniconexit0 -test.fuzzcachedir=/Users/xxx/Library/Caches/go-build/fuzz/example/fuzz -test.timeout=10m0s -test.fuzz=Fuzz
xxx    13911  74.5  1.0  5219184  85132 s001  R+   10:12下午   0:03.76 /var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1953131131/b001/fuzz.test -test.fuzzworker -test.paniconexit0 -test.fuzzcachedir=/Users/xxx/Library/Caches/go-build/fuzz/example/fuzz -test.timeout=10m0s -test.fuzz=Fuzz
xxx    13907  43.3  2.3  5944576 191172 s001  R+   10:12下午   0:01.90 /var/folders/pv/_x849j6n22x37xxd9cstgwkr0000gn/T/go-build1953131131/b001/fuzz.test -test.paniconexit0 -test.fuzzcachedir=/Users/xxx/Library/Caches/go-build/fuzz/example/fuzz -test.timeout=10m0s -test.fuzz=Fuzz
xxx    13923   0.0  0.0  4268176    420 s000  R+   10:12下午   0:00.00 grep fuzz
xxx    13891   0.0  0.2  5014396  16868 s001  S+   10:12下午   0:00.52 /Users/xxx/sdk/go1.18beta2/bin/go test -fuzz=Fuzz
xxx    13890   0.0  0.0  4989312   4008 s001  S+   10:12下午   0:00.01 go1.18beta2 test -fuzz=Fuzz

If a worker process crashes while fuzzing, the coordinator process can record the test data that caused the crash. If the coordinator ran the fuzzing itself, an input that crashes the program would crash the coordinator too, leaving no way to record the failing input. The process model of Go Fuzzing looks like this:

Diagram showing the relationship between fuzzing processes. At the top is a box showing "go test (cmd/go)". An arrow points downward to a box labelled "coordinator (test binary)". From that, three arrows point downward to three boxes labelled "worker (test binary)".

The coordinator process and the worker process communicate through a pair of pipes, using the JSON-based RPC communication protocol. This protocol is very streamlined because we don't need a complex RPC protocol like gRPC, and we don't want to introduce any new dependencies to the Go standard library.

Each worker process keeps its state in a memory-mapped (mmap) file that is shared with the coordinator process. In most cases, this shared memory records only the number of iterations and the state of the random number generator. If the worker process crashes, the coordinator can recover its state from shared memory without the worker having to send a message through the pipe.

The entire Fuzzing process is divided into 3 stages:

Diagram showing communication between coordinator and worker. Two arrows point down: the left is labelled "coordinator", the right is labelled "worker". Three pairs of horizontal arrows point from the coordinator to the worker and back. The top pair is labelled "baseline coverage", the middle is labelled "fuzz", the bottom is labelled "minimize".

Stage 1: Baseline coverage

When the coordinator process starts, it launches the worker processes. The coordinator then sends them the seed corpus (the test data added with f.Add and the test inputs in the testdata/fuzz directory) and the cache corpus (located in a subdirectory of $GOCACHE).

Each worker process runs the specified input, and then reports a snapshot of its coverage counter to the coordinator process, and the coordinator will combine the collected coverage data of the workers into a coverage array.

This phase is called the baseline coverage collection phase. The workers will only run the specified input sent to them by the coordinator, and will not generate random test data.

Stage 2: Fuzzing

At this stage, the coordinator process sends the seed corpus and the cache corpus to the worker processes again, this time for actual fuzzing.

Each worker process receives an input and a copy of the baseline coverage array from the coordinator. The worker then randomly mutates the input to obtain new test data. There are many kinds of mutation: flipping a bit (changing 0 to 1 or 1 to 0), deleting bytes, adding bytes, and so on. The mutated data is then passed as arguments to the fuzz target function.

To reduce communication overhead between the coordinator and the workers, each worker keeps mutating inputs and calling the fuzz target function for 100ms without needing further input from the coordinator.

After each call to the fuzz target function on the generated random data, the worker process checks two scenarios:

  • Whether new coverage data was found compared to the baseline coverage array.
  • Whether an error occurred, that is, the code called T.Fail or T.FailNow. Note: T.Error and T.Errorf automatically call T.Fail, and T.Fatal and T.Fatalf automatically call T.FailNow.

If either condition is met, the worker process immediately sends the input data to the coordinator process.

Stage 3: Minimization

If the input sent by the worker falls under scenario 1, that is, it produces new coverage, the coordinator compares the worker's coverage data with the current combined coverage array.

Another worker may already have found an input that provides the same coverage, in which case the coordinator simply ignores the input. If the new input does provide new coverage, the coordinator sends it to a worker (possibly a different one) for minimization.
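
The coordinator's duplicate check can be sketched with two byte slices of equal length, one per coverage counter. This is a simplification under my own assumptions: the function names are hypothetical, a counter is treated as hit or not hit, and the combined array is updated at the moment the input is accepted.

```go
// coverage.go (illustrative sketch)
package main

import "fmt"

// hasNewCoverage reports whether snapshot hits a counter that combined has
// not seen yet. Both slices are assumed to have the same length.
func hasNewCoverage(combined, snapshot []byte) bool {
	for i := range snapshot {
		if snapshot[i] != 0 && combined[i] == 0 {
			return true
		}
	}
	return false
}

// merge records every counter hit in snapshot in the combined array.
func merge(combined, snapshot []byte) {
	for i := range snapshot {
		if snapshot[i] != 0 {
			combined[i] = 1
		}
	}
}

// accept returns true if the input behind snapshot is worth keeping, and
// false if another worker already contributed the same coverage.
func accept(combined, snapshot []byte) bool {
	if !hasNewCoverage(combined, snapshot) {
		return false
	}
	merge(combined, snapshot)
	return true
}

func main() {
	combined := []byte{1, 0, 0}
	fmt.Println(accept(combined, []byte{1, 1, 0})) // second counter is new
	fmt.Println(accept(combined, []byte{0, 1, 0})) // already covered
	fmt.Println(combined)
}
```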

Minimization is a bit like fuzzing, but workers mutate randomly to create a smaller input that still yields new coverage. Smaller inputs generally make fuzzing faster, so it's worth spending time up front to make the fuzzing process faster. The worker process will report to the coordinator when it finishes minimizing, even if it fails to find a smaller input. The coordinator process will add this minimized input to the cache corpus and continue fuzzing. Later, the coordinator may send this minimized input to all workers for further fuzzing. This is how the fuzzing system automatically adjusts to find new coverage.

If the input sent by the worker falls under scenario 2, that is, it is an input that triggers an error, the coordinator process sends it to a worker again for minimization. In this scenario, the worker tries to find a smaller input that still triggers an error, although not necessarily the same error. Once the input has been minimized, the coordinator stores it under testdata/fuzz/$FuzzTarget, gracefully shuts down all worker processes, and exits with a non-zero status.

If the worker process crashes during fuzzing, the coordinator process can use the input sent to the worker, the worker's RNG state, and the number of iterations (left in shared memory) to recover the input that caused the worker process to crash. The input to a crash is usually not minimized, because minimizing is a highly stateful process, and every crash destroys this state. Minimizing the crash-causing input is theoretically possible, but has not been implemented yet.

Fuzzing keeps running until one of the following happens:

  • Fuzzing finds an error, that is, an input triggers a failure condition in your fuzz test
  • The user presses Ctrl-C to interrupt the program
  • The running time has reached the set time of -fuzztime

The fuzzing engine handles interrupts gracefully, regardless of whether the interrupt is sent to the coordinator process or the worker process. For example, if the worker process is interrupted while minimizing input, the coordinator process will save the input that was not minimized.

Precautions

  • The implementation of FuzzXXX also goes in a Go file whose name ends with _test.go.
  • Seed corpus: contains both the inputs specified with f.Add and the inputs in files under the testdata/fuzz/$FuzzTarget directory.
  • Without the -fuzz flag, go test runs both TestXXX and FuzzXXX functions by default. For FuzzXXX functions it uses only the inputs from the seed corpus and does not generate random data. To generate random inputs, use go test -fuzz=pattern.

Open source address

Articles and sample code are open sourced on GitHub: Beginner, Intermediate, and Advanced Tutorials in Go .

Official account: coding advanced. Follow the official account to get the latest Go interview questions and technology stacks.

Personal website: Jincheng's Blog .
