Hello everyone, I am fried fish.
Some time ago, I shared the " Go program crashed?" Fried fish teaches you to use PProf tool to fight fire! ", but sometimes using pprof alone may not be enough to fully observe and solve the problem, because the real program also contains many hidden actions, such as:
- What does Goroutine do when it is executed?
- How long has the Goroutine been executed/blocked?
- When was Syscall blocked? Where was it blocked?
- Who locked/unlocked Goroutine again?
- How does GC affect the execution of Goroutine?
These things are difficult to analyze with pprof, but if you want to know the answers above, you can use the protagonist go tool trace
in this chapter to open the door to a new world.
Let's happily start the road of fish-sucking together.
Initial understanding
import (
"os"
"runtime/trace"
)
func main() {
trace.Start(os.Stderr)
defer trace.Stop()
ch := make(chan string)
go func() {
ch <- "Go语言编程之旅"
}()
<-ch
}
Generate the trace file:
$ go run main.go 2> trace.out
Start the visual interface:
$ go tool trace trace.out
2019/06/22 16:14:52 Parsing trace...
2019/06/22 16:14:52 Splitting trace...
2019/06/22 16:14:52 Opening browser. Trace viewer is listening on http://127.0.0.1:57321
View the visual interface:
- View trace: View trace
- Goroutine analysis: Goroutine analysis
- Network blocking profile: network blocking profile
- Synchronization blocking profile: Synchronization blocking profile
- Syscall blocking profile: system call blocking profile
- Scheduler latency profile: Scheduler latency profile
- User defined tasks: user-defined tasks
- User defined regions: User defined regions
- Minimum mutator utilization: minimum Mutator utilization
Overview of scheduling delays
When you first look at the problem, you shouldn’t get caught up in the details at the beginning, unless it’s obvious.
Therefore, we generally check the "Scheduler latency profile" first, and we can see the overall call overhead through Graph, as follows:
The demo program is relatively simple, so here are two pieces, one is trace
itself, and the other is the transceiver channel
Goroutine analysis
The second step is to look at "Goroutine analysis". Through this function, we can see how many Goroutines are running in each function block during the entire running process.
Observe at which stage the running overhead of each Goroutine is spent. as follows:
From the above figure, we can see that there are 3 goroutines in total, namely:
runtime.main
。runtime/trace.Start.func1
。main.main.func1
。
What do they do? We can click on the specific items to observe. as follows:
At the same time, you can also see the current proportion of Goroutine in the total call time, as well as some overheads of GC cleaning and GC pause waiting.
If you think it is not enough, you can download the chart and analyze it, which is equivalent to breaking the whole Goroutine runtime to see. This piece can help us to the Goroutine runtime by 16148155d3fb22, and we can know what is slow. , And then decide the next investigation direction .
as follows:
name | meaning | time consuming |
---|---|---|
Execution Time | execution time | 3140ns |
Network Wait Time | Network latency | 0ns |
Sync Block Time | Synchronous blocking time | 0ns |
Blocking Syscall Time | Call blocking time | 0ns |
Scheduler Wait Time | Scheduling wait time | 14ns |
GC Sweeping | GC cleaning | 0ns |
GC Pause | GC pause | 0ns |
View tracking
After we have a preliminary understanding of the current program's Goroutine running distribution, we will use "View Tracking" to see the correlation, as follows:
A cursory look at this tracking diagram, I believe some friends will be more awkward, we can view it piece by piece according to the annotations, as follows:
- Timeline: Display the time unit of execution. The interval can be adjusted according to the time dimension. For details, you can execute
shift
+?
view the help manual. - Heap: Shows the memory allocation and release during execution.
- Coroutine: Shows how many coroutines are running in each Goroutine running phase during execution, which includes three states: GC waiting (GCWaiting), runnable (Runnable), and running (Running).
- OS Thread: Shows how many threads are running during execution, including two states: Syscall (InSyscall) and Running.
- Virtual processor: Each virtual processor displays one line, and the number of virtual processors generally defaults to the number of system cores.
- Coroutines and events: Shows what Goroutine is running on each virtual processor, and the connection behavior represents event correlation.
After clicking on the specific Goroutine behavior, you can see its related detailed information. This is very simple, and you can understand it after actual operation. The text is explained as follows:
- Start: start time
- Wall Duration: Duration
- Self Time: execution time
- Start Stack Trace: Stack information at the beginning
- End Stack Trace: Stack information at the end
- Incoming flow: input flow
- Outgoing flow: output flow
- Preceding events: previous events
- Following events: Following events
- All connected: all connected events
View event
We can view the event flow of our application in operation by clicking View Options-Flow events, Following events, etc. as follows:
By analyzing the flow of events on the graph, we can know:
- This program runs from
G1 runtime.main
. - 2 Goroutines are created at runtime:
- First, create
G18 runtime/trace.Start.func1
. - Then create
G19 main.main.func1
.
At the same time, we can understand its call type through its Goroutine Name. as follows:
runtime/trace.Start.func1
is a program inmain.main
calledruntime/trace.Start
method.- Then the method uses the coroutine to create a closure
func1
to call.
Here we look at the code at the beginning, it is obviously the input and output process of ch
Actual combat drill
At three o'clock in the morning, suddenly there was a problem in the production environment. You have already buried the magical tool _ "net/http/pprof"
When you were woken up by the alarm phone, you stumbled and executed the following commands in a specific way:
$ curl http://127.0.0.1:6060/debug/pprof/trace\?seconds\=20 > trace.out
$ go tool trace trace.out
View tracking
You quickly saw the familiar List interface, and then opened the View trace interface, as follows:
You who are completely W
, hold steady, and execute the shortcut key 06148155d400c1 to the appropriate area to continuously zoom in the timeline, as follows:
After preliminary investigation, you find that most of the above Gs are actually google.golang.org/grpc.(*Server).Serve.func
, and a large Serve
of related actions are also related actions triggered by 06148155d400e5.
At this time, the experienced you already have a preliminary conclusion, you can continue to trace the View trace and go deep into it.
But it is recommended to take a bird's eye view of the whole picture first, so we will look down at the information provided by "Network blocking profile" and "Syscall blocking profile".
Network congestion overview
System call blocking overview
Through the tracking and analysis of the above three items, plus the leak, the time-consuming blocking, and the name of the internal method involved, it is obvious that who forgot to close the client connection again.
At this time, we can proceed to the next step of investigation and modification.
Summarize
Through this article, we have learned the go tool trace
, which can track and capture various execution events, such as:
- Goroutine creation/blocking/unblocking.
- Syscall entry/exit/block, GC event.
- The size of Heap changes.
- Processor start/stop, etc.
I hope you can make good use of the pprof + trace combination of Go’s two major killers. This is a good partner, who knows who uses it, even if he is not absolutely omnipotent.
If you have any questions, welcome feedback and communication in the comment area. The best relationship between . Your is fried fish . Thank you for your support.
The article is continuously updated. You can read it through search [My brain is fried fish], and reply [16148155d4027e 000 ] I have prepared the first-line interview algorithm questions and information.
This article GitHub github.com/eddycjy/blog has been included, welcome Star to remind you.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。