Search for [脑子进煎鱼了] to follow this hard-working fried fish. This article is included in the GitHub repository github.com/eddycjy/blog, along with my series of articles, materials, and open source Go books.

Hello everyone, I'm 煎鱼 (fried fish).

A few days ago, after I shared an article prompted by a question from a friend in a Go group, some readers commented that they hoped I would explain goroutine leaks, a topic they are often asked about in interviews.

Today's protagonist is the goroutine, a signature feature of the Go language and a powerful weapon that lets you start hundreds of thousands of concurrent tasks at will:

    for {
        go func() {}()
    }

This article walks through N ways goroutines can leak and explains each one in detail.

Why ask

Why would interviewers ask about something as peculiar as goroutine (coroutine) leaks?

My guess:

  • The barrier to using goroutines is very low. You can start one with almost no effort, so they are often misused. For example: concurrent map access.
  • Goroutines are used widely in the Go standard library, composite types, and underlying source code. For example: the HTTP server runs each request in a goroutine.

When a Go project has a production incident, goroutines are usually involved. Everyone turns firefighter: checking metrics, reading logs, and collecting goroutine profiles with PProf.

Naturally, the goroutine is also the most watched "star", so the chance of being asked about it in interviews is extremely high.

Goroutine leaks

Now that we know why everyone loves to ask, let's study the N ways goroutines leak, so that we understand the principles and avoid the "pits" our predecessors fell into.

Most leaks come down to one of these causes:

  • A goroutine performs a channel or mutex operation (read, write, lock) and, because of a logic problem, blocks forever in some cases.
  • The business logic in a goroutine enters an endless loop, so its resources are never released.
  • The business logic in a goroutine enters a long wait while new goroutines keep arriving and waiting too.
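The endless-loop cause in the list above can be sketched as follows. This is only an illustration: the names worker and runWorkers are hypothetical, and context cancellation is one common way to give such a loop an exit, not the only one.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// worker is a hypothetical background loop. Without the ctx.Done() case
// it would never return, and its goroutine would live forever.
func worker(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return // exit path: the caller cancelled the context
		case <-ticker.C:
			// do some periodic work...
		}
	}
}

// runWorkers starts n workers, cancels them, and reports how many
// goroutines remain compared with the count before they were started.
func runWorkers(n int) int {
	before := runtime.NumGoroutine()
	ctx, cancel := context.WithCancel(context.Background())
	for i := 0; i < n; i++ {
		go worker(ctx)
	}
	time.Sleep(50 * time.Millisecond)
	cancel()
	time.Sleep(50 * time.Millisecond) // give the workers time to return
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leftover goroutines:", runWorkers(5))
}
```

If the ctx.Done() case is removed, the five workers keep looping after runWorkers returns, and the reported difference stays at 5.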

Next, I'll walk through some goroutine leak examples collected from around the Internet (the sources are referenced at the end of the article).

Improper use of channel

Goroutine plus channel is the classic combination, so many leaks show up here.

The most typical case is the logic problem in channel reads and writes mentioned above.

Send without receive

The first example:

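package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "time"
)

func main() {
    for i := 0; i < 4; i++ {
        queryAll()
        fmt.Printf("goroutines: %d\n", runtime.NumGoroutine())
    }
}

// queryAll starts three queries but receives only the first result.
func queryAll() int {
    ch := make(chan int)
    for i := 0; i < 3; i++ {
        go func() { ch <- query() }()
    }
    return <-ch
}

// query simulates a call that takes a random amount of time.
func query() int {
    n := rand.Intn(100)
    time.Sleep(time.Duration(n) * time.Millisecond)
    return n
}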

Output result:

goroutines: 3
goroutines: 5
goroutines: 7
goroutines: 9

In this example, we call queryAll several times. Inside its for loop, queryAll starts goroutines that call query. The key point is that each query result is written to the channel ch, and queryAll returns as soon as it receives the first value from ch.

In the output, the goroutine count keeps growing by 2 each time. In other words, every call to queryAll leaks goroutines.

The reason is that three values are sent on the channel per call, but the receiver takes only one (queryAll receives from ch once). The other two senders block forever, which is the goroutine leak.
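One possible fix, sketched here under the assumption that only the first result matters, is to give the channel a buffer as large as the number of senders, so the late senders can finish instead of blocking. The names queryAllBuffered and leftover are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"time"
)

func query() int {
	n := rand.Intn(100)
	time.Sleep(time.Duration(n) * time.Millisecond)
	return n
}

// queryAllBuffered is a variant of queryAll whose channel has room for
// every sender, so no goroutine stays blocked after the first receive.
func queryAllBuffered() int {
	const workers = 3
	ch := make(chan int, workers)
	for i := 0; i < workers; i++ {
		go func() { ch <- query() }()
	}
	return <-ch
}

// leftover calls queryAllBuffered several times and reports how many
// extra goroutines are still alive once the slowest query has finished.
func leftover(calls int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < calls; i++ {
		queryAllBuffered()
	}
	time.Sleep(200 * time.Millisecond) // longer than the slowest query
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leftover goroutines:", leftover(4))
}
```

The unread values are simply garbage collected together with the channel. Other designs are possible, such as receiving all three results or cancelling the remaining queries through a context.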

Receive without send

The second example:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    ch := make(chan struct{})
    go func() {
        <-ch // nothing is ever sent, so this receive blocks forever
    }()

    time.Sleep(time.Second)
}

Output result:

goroutines:  2

This example is the opposite of the previous one: a goroutine waits to receive from the channel, but nothing is ever sent, so it also blocks forever.

Real business scenarios are usually more complicated: somewhere in a large body of business logic, one channel read or write goes wrong, and the goroutine ends up blocked.

nil channel

The third example:

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    var ch chan int // nil channel: declared but never initialized
    go func() {
        <-ch
    }()

    time.Sleep(time.Second)
}

Output result:

goroutines:  2

This example shows that if you forget to initialize a channel, both reads and writes on it block forever.

The correct way to initialize it is:

    ch := make(chan int)
    go func() {
        <-ch
    }()
    ch <- 0
    time.Sleep(time.Second)

That is, call the make function to initialize the channel.
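To see that both directions block on a nil channel without hanging the program, a small sketch can wrap each operation in a select with a timeout. The helper name nilChannelBlocks is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// nilChannelBlocks reports whether a receive and a send on a nil channel
// were still blocked when the wait period expired.
func nilChannelBlocks(wait time.Duration) (recvBlocked, sendBlocked bool) {
	var ch chan int // nil: declared but never initialized with make

	select {
	case <-ch: // never ready on a nil channel
	case <-time.After(wait):
		recvBlocked = true
	}

	select {
	case ch <- 1: // never ready on a nil channel
	case <-time.After(wait):
		sendBlocked = true
	}
	return recvBlocked, sendBlocked
}

func main() {
	r, s := nilChannelBlocks(50 * time.Millisecond)
	fmt.Println("receive blocked:", r, "send blocked:", s)
	// prints "receive blocked: true send blocked: true"
}
```

The same select-with-timeout pattern is also a way to keep a goroutine from waiting forever on a channel that might never become ready.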

Strangely slow waits

The fourth example:

package main

import (
    "fmt"
    "net/http"
    "runtime"
    "time"
)

func main() {
    for {
        go func() {
            // http.Get uses the default client, which has no timeout.
            _, err := http.Get("https://www.xxx.com/")
            if err != nil {
                fmt.Printf("http.Get err: %v\n", err)
            }
            // do something...
        }()

        time.Sleep(time.Second * 1)
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }
}

Output result:

goroutines:  5
goroutines:  9
goroutines:  13
goroutines:  17
goroutines:  21
goroutines:  25
...

This example shows a classic incident scenario in Go: an application calls the interface of a third-party service.

However, third-party interfaces are sometimes very slow and do not return a response for a long time. And it just so happens that the default http.Client in Go has no timeout set.

As a result, the call blocks indefinitely. The goroutine count keeps climbing and leaking until resources are exhausted and an incident follows.

In Go projects, we generally recommend setting a timeout on http.Client:

    httpClient := http.Client{
        Timeout: time.Second * 15,
    }
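To check that the timeout actually cuts a slow call short, here is a minimal sketch that simulates the slow third-party service with a local net/http/httptest server. The helper name timedOut and the durations are assumptions made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// timedOut reports whether a request to a deliberately slow local server
// fails with a timeout when the client has Timeout set.
func timedOut(clientTimeout, serverDelay time.Duration) bool {
	// Stand-in for a slow third-party service.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(serverDelay):
		case <-r.Context().Done(): // the client gave up
		}
	}))
	defer srv.Close()

	client := http.Client{Timeout: clientTimeout}
	resp, err := client.Get(srv.URL)
	if err != nil {
		var netErr net.Error
		return errors.As(err, &netErr) && netErr.Timeout()
	}
	resp.Body.Close()
	return false
}

func main() {
	fmt.Println("timed out:", timedOut(100*time.Millisecond, time.Second))
}
```

With the timeout in place, the calling goroutine gets an error back after roughly 100 ms and can return, instead of waiting for as long as the remote side takes.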

In addition, take measures such as rate limiting and circuit breaking, so that a sudden traffic spike does not bring down your dependencies and still leave you with a P0 incident.
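As one simple form of limiting, the number of goroutines in flight can be capped with a buffered channel used as a counting semaphore. This is only a sketch of bounded concurrency, not a full rate limiter or circuit breaker, and the name peakConcurrency is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakConcurrency runs `tasks` jobs with at most `limit` of them running
// at the same time and returns the highest concurrency it observed.
func peakConcurrency(tasks, limit int) int64 {
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	var current, peak int64

	for i := 0; i < tasks; i++ {
		sem <- struct{}{} // blocks while `limit` jobs are already running
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			n := atomic.AddInt64(&current, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for the real call
			atomic.AddInt64(&current, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", peakConcurrency(20, 3))
}
```

Because the slot is acquired before the goroutine is started, a burst of work makes the producer wait rather than piling up an unbounded number of goroutines.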

Forgot to unlock the mutex

The fifth example:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func main() {
    total := 0
    defer func() {
        time.Sleep(time.Second)
        fmt.Println("total: ", total)
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    var mutex sync.Mutex
    for i := 0; i < 10; i++ {
        go func() {
            mutex.Lock() // never unlocked: every later Lock call blocks
            total += 1
        }()
    }
}

Output result:

total:  1
goroutines:  10

In this example, the first goroutine locks the sync.Mutex, but it may still be busy with business logic, or it may simply have forgotten to call Unlock.

As a result, every later goroutine that tries to lock the sync.Mutex blocks, because the lock is never released. In Go projects, we generally recommend writing it this way:

    var mutex sync.Mutex
    for i := 0; i < 10; i++ {
        go func() {
            mutex.Lock()
            defer mutex.Unlock()
            total += 1
        }()
    }

Improper use of sync.WaitGroup

The sixth example:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func handle(v int) {
    var wg sync.WaitGroup
    wg.Add(5) // fixed count that does not depend on v
    for i := 0; i < v; i++ {
        fmt.Println("脑子进煎鱼了")
        wg.Done()
    }
    wg.Wait()
}

func main() {
    defer func() {
        fmt.Println("goroutines: ", runtime.NumGoroutine())
    }()

    go handle(3)
    time.Sleep(time.Second)
}

In this example, we use sync.WaitGroup and simulate a loop whose iteration count is passed in from outside.

However, because the count passed to wg.Add does not match the number of wg.Done calls, wg.Wait blocks forever.

In Go projects, we suggest writing it like this:

    var wg sync.WaitGroup
    for i := 0; i < v; i++ {
        wg.Add(1) // one Add per unit of work, matched by one Done
        go func() {
            defer wg.Done()
            fmt.Println("脑子进煎鱼了")
        }()
    }
    wg.Wait()

Troubleshooting method

We can call runtime.NumGoroutine to get the number of goroutines currently running. Comparing the count before and after an operation tells us whether anything leaked.
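A before/after comparison might look like the sketch below. The helper leaked and the two sample functions leaky and clean are hypothetical names, and the settle delay is an assumption: it only needs to be long enough for well-behaved goroutines to finish.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leaked runs f, waits for short-lived goroutines to finish, and reports
// how many more goroutines exist afterwards than before.
func leaked(f func(), settle time.Duration) int {
	before := runtime.NumGoroutine()
	f()
	time.Sleep(settle)
	return runtime.NumGoroutine() - before
}

// leaky starts a sender on an unbuffered channel that nobody reads.
func leaky() {
	ch := make(chan int)
	go func() { ch <- 1 }() // blocks forever
}

// clean does the same with a buffered channel, so the sender can finish.
func clean() {
	ch := make(chan int, 1)
	go func() { ch <- 1 }()
}

func main() {
	fmt.Println("leaky:", leaked(leaky, 50*time.Millisecond))
	fmt.Println("clean:", leaked(clean, 50*time.Millisecond))
}
```

The same idea works in unit tests: record the count at the start of a test and fail if it is higher at the end.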

However, for running business services, most goroutine leaks show up in production and test environments, so PProf is the more common tool:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof handlers
)

func main() {
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}

When we request http://localhost:6060/debug/pprof/goroutine?debug=1, PProf returns a list of all goroutines with stack traces.
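When exposing an HTTP port is not convenient, a similar text report can be produced in-process with the runtime/pprof package. The sketch below uses pprof.Lookup("goroutine") with debug level 1. The helper name goroutineDump is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"time"
)

// goroutineDump returns a text report of all goroutines with their stack
// traces, similar to what /debug/pprof/goroutine?debug=1 serves.
func goroutineDump() string {
	var buf bytes.Buffer
	// Debug level 1 asks for the human-readable text format.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return ""
	}
	return buf.String()
}

func main() {
	ch := make(chan int)
	go func() { <-ch }() // deliberately blocked, so it shows up in the dump
	time.Sleep(10 * time.Millisecond)
	fmt.Print(goroutineDump())
}
```

In the report, a leaked goroutine typically appears as a stack that is parked on a channel operation or a lock, which points directly at the blocking line.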

You can also use PProf's other features for a fuller analysis. For that, see my earlier article "Performance Analysis of Go Killer PProf", which is arguably the most comprehensive tutorial around.

Summary

In today's article, we analyzed N common ways goroutines leak. They all look like fairly basic scenarios.

However, buried in real business code, a single detail like these in a large pile of logic is enough to bring everything down. I hope these cases help everyone stay vigilant.

Interviewers love to ask about this, probably because they have stepped into plenty of these pits themselves and hope that new colleagues have similar experience.

They want reliable engineers, not just engineers who recite stock answers.

If you have any questions, feel free to leave feedback in the comments. The best relationships are mutually rewarding, and your likes are my greatest motivation to keep writing. Thanks for your support.


煎鱼