
Hello everyone, I am fried fish.

Some time ago, I shared the article "Go program crashed? Fried Fish teaches you to use the pprof tool to fight the fire!", but sometimes pprof alone is not enough to fully observe and diagnose a problem, because a real program also has many hidden behaviors, such as:

  • What does a Goroutine do while it is executing?
  • How long has a Goroutine been executing or blocked?
  • When was a syscall blocked, and where?
  • Who locked/unlocked a Goroutine?
  • How does GC affect the execution of Goroutines?

These questions are hard to answer with pprof, but the protagonist of this chapter, go tool trace, can open the door to a new world for you.

Let's happily set off together.

Initial understanding

package main

import (
	"os"
	"runtime/trace"
)

func main() {
	// Write trace events to stderr so they can be redirected to a file.
	if err := trace.Start(os.Stderr); err != nil {
		panic(err)
	}
	defer trace.Stop()

	ch := make(chan string)
	go func() {
		ch <- "Go语言编程之旅"
	}()

	<-ch
}

Generate the trace file:

$ go run main.go 2> trace.out

Start the visual interface:

$ go tool trace trace.out
2019/06/22 16:14:52 Parsing trace...
2019/06/22 16:14:52 Splitting trace...
2019/06/22 16:14:52 Opening browser. Trace viewer is listening on http://127.0.0.1:57321

View the visual interface:

[image]

  • View trace: the interactive trace timeline
  • Goroutine analysis: per-function Goroutine analysis
  • Network blocking profile: time blocked on network I/O
  • Synchronization blocking profile: time blocked on synchronization primitives
  • Syscall blocking profile: time blocked in system calls
  • Scheduler latency profile: scheduler latency
  • User defined tasks: user-defined tasks
  • User defined regions: user-defined regions
  • Minimum mutator utilization: minimum mutator utilization

Scheduler latency overview

When first looking at a problem, you shouldn't dive into the details right away unless the cause is obvious.

Therefore, we generally check the "Scheduler latency profile" first; its graph shows the overall scheduling overhead, as follows:

[image]

The demo program is quite simple, so there are only two pieces here: one is trace itself, and the other is sending and receiving on the channel.

Goroutine analysis

The second step is to look at "Goroutine analysis". Through this view we can see how many Goroutines ran in each function block over the entire run, and observe at which stage each Goroutine's time is spent, as follows:

[image]

From the above figure, we can see that there are 3 goroutines in total, namely:

  • runtime.main
  • runtime/trace.Start.func1
  • main.main.func1

What do they do? We can click on a specific item to find out, as follows:

[image]

At the same time, you can see each Goroutine's share of the total execution time, as well as overheads such as GC sweeping and GC pause waits.

If that is not enough, you can download the chart for offline analysis, which is equivalent to breaking down the Goroutine's entire runtime. This lets us pinpoint where the Goroutine's time goes, learn what is slow, and then decide the next direction of investigation.

as follows:

  • Execution Time (execution time): 3140ns
  • Network Wait Time (network wait time): 0ns
  • Sync Block Time (synchronization block time): 0ns
  • Blocking Syscall Time (blocking syscall time): 0ns
  • Scheduler Wait Time (scheduler wait time): 14ns
  • GC Sweeping (GC sweeping): 0ns
  • GC Pause (GC pause): 0ns

View trace

Now that we have a preliminary picture of the program's Goroutine distribution, we use "View trace" to look at the correlations, as follows:

[image]

At first glance this trace diagram may look confusing; we can walk through it piece by piece according to the annotations, as follows:

  1. Timeline: shows the time units of execution; the interval can be adjusted by time dimension. For details, press shift + ? to view the help manual.
  2. Heap: shows memory allocation and release during execution.
  3. Goroutines: shows how many goroutines are in each state at each point of execution, including GC waiting (GCWaiting), runnable (Runnable), and running (Running).
  4. OS Threads: shows how many threads are running during execution, including the in-syscall (InSyscall) and running (Running) states.
  5. Virtual processors: one row per virtual processor; the number of virtual processors generally defaults to the number of system cores.
  6. Goroutines and events: shows which Goroutine is running on each virtual processor; connecting lines represent event correlations.

[image]

After clicking a specific Goroutine behavior, you can see its detailed information. This is straightforward once you try it yourself; the fields are as follows:

  • Start: start time
  • Wall Duration: wall-clock duration
  • Self Time: execution time
  • Start Stack Trace: stack trace at the start
  • End Stack Trace: stack trace at the end
  • Incoming flow: incoming event flow
  • Outgoing flow: outgoing event flow
  • Preceding events: preceding events
  • Following events: following events
  • All connected: all connected events

View events

We can view the application's event flow at runtime by clicking View Options and enabling Flow events, Following events, and so on, as follows:

[image]

By analyzing the flow of events on the graph, we can know:

  • This program runs starting from G1 runtime.main.
  • 2 Goroutines are created at runtime:
    • first, G18 runtime/trace.Start.func1;
    • then, G19 main.main.func1.

At the same time, we can understand the call type through the Goroutine name, as follows:

[image]

  • runtime/trace.Start.func1 means that main.main called the runtime/trace.Start method.
  • That method then started a goroutine running the closure func1.

Looking back at the code at the beginning, the remaining goroutine is obviously the send and receive on ch.

Hands-on drill

At three o'clock in the morning, a problem suddenly breaks out in the production environment. Fortunately, you had already planted the magic tool _ "net/http/pprof" in advance.

Woken up by the on-call phone, you stumble over and run the following commands in the standard way:

$ curl http://127.0.0.1:6060/debug/pprof/trace\?seconds\=20 > trace.out
$ go tool trace trace.out

View trace

You quickly see the familiar list page, and then open the View trace interface, as follows:

[image]

Completely dazed, you steady yourself and press the w shortcut key over the appropriate area to keep zooming in on the timeline, as follows:

[image]

After a preliminary look, you find that most of the Gs above are actually google.golang.org/grpc.(*Server).Serve.func, and a large number of the related actions are also triggered by Serve.

[image]

At this point, the experienced you already have a preliminary conclusion; you could continue digging deeper into the View trace.

But it is recommended to take a bird's-eye view of the whole picture first, so let's look at the information provided by the "Network blocking profile" and "Syscall blocking profile".

Network blocking overview

[image]

System call blocking overview

[image]

Through the analysis of the above three views, together with the leak, the blocking durations, and the names of the internal methods involved, it becomes obvious who forgot to close the client connection again.

At this time, we can proceed to the next step of investigation and modification.
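The class of bug found here, a client connection that is never closed, can be sketched with the standard net package; query and the echo server below are made-up stand-ins, since the post doesn't show the real gRPC code:

```go
package main

import (
	"fmt"
	"net"
)

// query is a hypothetical client call; without the deferred Close, every
// call leaks a connection, which is the kind of slow buildup the
// "Network blocking profile" and "Syscall blocking profile" expose.
func query(addr string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close() // the line that was forgotten in the drill above

	fmt.Fprintf(conn, "ping\n")
	buf := make([]byte, 64)
	n, _ := conn.Read(buf)
	return string(buf[:n]), nil
}

func main() {
	// Start a tiny echo server so the example is self-contained.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func(c net.Conn) {
				defer c.Close()
				buf := make([]byte, 64)
				n, _ := c.Read(buf)
				c.Write(buf[:n])
			}(c)
		}
	}()

	reply, err := query(ln.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Print(reply)
}
```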

Summary

Through this article, we have learned about go tool trace, which can trace and capture all kinds of runtime events, such as:

  • Goroutine creation/blocking/unblocking.
  • Syscall entry/exit/blocking, and GC events.
  • Changes in heap size.
  • Processor start/stop, etc.

I hope you can make good use of pprof + trace, Go's two killer tools. They make a good pair; their value depends on who wields them, even if they are not all-powerful.

If you have any questions, feedback and discussion in the comments section are welcome. The best relationship is mutual achievement. Thank you for your support.

This article is archived on GitHub at github.com/eddycjy/blog; a Star is welcome.
