
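  What do you do when a Go program misbehaves? Profile it with pprof. But what if the problem is a bug in the code? Tracking down a bug often means following the execution, so you add logs; yet some failures are very hard to reproduce once the service has been restarted. In that case you can use a debugger and set breakpoints, which lets you follow every line of code as it executes and inspect every variable. C programs are usually debugged with GDB; Go has a dedicated debugger, dlv. This article introduces the basics of using dlv.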

dlv overview

  dlv, short for Delve, is easy to install with go install:

// download & install
$ git clone https://github.com/go-delve/delve
$ cd delve
$ go install github.com/go-delve/delve/cmd/dlv

// Go 1.16 and later
# Install at a specific version or pseudo-version:
$ go install github.com/go-delve/delve/cmd/dlv@v1.7.3

# On macOS, make sure you also install the command line developer tools:
$ xcode-select --install

  dlv can trace your Go program in several ways; run the help command to see them:

dlv help

// passing arguments to the program
Pass flags to the program you are debugging using `--`, for example:
`dlv exec ./hello -- server --config conf/config.toml`

Usage:
  dlv [command]

Available Commands:
  // commonly used to debug a misbehaving running process
  attach      Attach to running process and begin debugging.
  // start and debug a precompiled binary
  exec        Execute a precompiled binary, and begin a debug session.
  debug       Compile and begin debugging main package in current directory, or the package specified.
  ......
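  The help text says that everything after `--` is passed to the program being debugged. As a sketch of the receiving side, here is a hypothetical `hello` program (the name comes from the help example; `parseArgs` and the default value `config.toml` are illustrative assumptions) that would see `server --config conf/config.toml` in `os.Args[1:]` when started with `dlv exec ./hello -- server --config conf/config.toml`:

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// parseArgs is a hypothetical parser for a command line of the form
// "server --config conf/config.toml": a subcommand followed by flags.
func parseArgs(args []string) (cmd, config string, err error) {
	if len(args) == 0 {
		return "", "", errors.New("missing subcommand")
	}
	cmd = args[0]
	fs := flag.NewFlagSet(cmd, flag.ContinueOnError)
	// The flag package accepts both -config and --config.
	fs.StringVar(&config, "config", "config.toml", "path to the config file")
	if err = fs.Parse(args[1:]); err != nil {
		return "", "", err
	}
	return cmd, config, nil
}

func main() {
	// Started as "dlv exec ./hello -- server --config conf/config.toml",
	// os.Args[1:] is []string{"server", "--config", "conf/config.toml"}.
	args := os.Args[1:]
	if len(args) == 0 {
		// No arguments given: fall back to the example from dlv's help text.
		args = []string{"server", "--config", "conf/config.toml"}
	}
	cmd, config, err := parseArgs(args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("cmd=%s config=%s\n", cmd, config)
}
```

Once the session is running, you can inspect `args`, `cmd` and `config` with print to confirm what arrived.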

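  dlv is quite similar to GDB: it can print variable values, set breakpoints, single-step and show the call stack, and in addition it can list all goroutines and threads of the Go process. The commonly used features (commands) are: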

Running the program:
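    // run until a breakpoint is hit or the program terminates
    continue (alias: c) --------- Run until breakpoint or program termination.
    // single-step to the next source line, stepping over function calls
    next (alias: n) ------------- Step over to next source line.
    // restart the process
    restart (alias: r) ---------- Restart process.
    // step into functions (with next, a function call is a single line and is stepped over)
    step (alias: s) ------------- Single step through program.
    // step out of the current function
    stepout (alias: so) --------- Step out of the current function.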

Manipulating breakpoints:
    // set a breakpoint
    break (alias: b) ------- Sets a breakpoint.
    // list all breakpoints
    breakpoints (alias: bp)  Print out info for active breakpoints.
    // delete a breakpoint
    clear ------------------ Deletes breakpoint.
    // delete all breakpoints
    clearall --------------- Deletes multiple breakpoints.

Viewing program variables and memory:
    // print function arguments
    args ----------------- Print function arguments.
    // print local variables
    locals --------------- Print local variables.
    // evaluate an expression, e.g. print a variable
    print (alias: p) ----- Evaluate an expression.
    // print the contents of the CPU registers
    regs ----------------- Print contents of CPU registers.
    // change the value of a variable
    set ------------------ Changes the value of a variable.

Listing and switching between threads and goroutines:
    // show the current goroutine or switch to the specified goroutine
    goroutine (alias: gr) -- Shows or changes current goroutine
    // list all goroutines
    goroutines (alias: grs)  List program goroutines.
    // switch to the specified thread
    thread (alias: tr) ----- Switch to the specified thread.
    // list all traced threads
    threads ---------------- Print out info for every traced thread.

Viewing the call stack and selecting frames:
    // print the call stack
    stack (alias: bt)  Print stack trace.

Other commands:
    // disassemble the program
    disassemble (alias: disass)  Disassembler.
    // show source code
    list (alias: ls | l) ------- Show source code.

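  dlv has quite a few commands, but only a handful are used regularly. Knowing how to set breakpoints, single-step, and print variables and the call stack is usually enough for basic debugging.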

dlv in practice

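  Let's write a small program and debug it with dlv, reviewing the channel reads/writes and the scheduler flow covered in earlier articles. Note that Go programs run on multiple threads and many goroutines, so the actual execution can be fairly complex, and I have also left out part of the debugging session; even if you follow the steps exactly, your results may differ. The program: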

package main

import (
    "fmt"
    "time"
)

func main() {
    queue := make(chan int, 1)
    go func() {
        for {
            data := <-queue
            fmt.Print(data, " ")
        }
    }()

    for i := 0; i < 10; i++ {
        queue <- i
    }
    time.Sleep(time.Second * 1000)
}

  Compile the Go program and start it with dlv:

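// note the -N -l flags: -N disables optimizations and -l disables inlining
// (they apply only to the packages being built; -gcflags 'all=-N -l' also covers dependencies such as the runtime)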
go build -gcflags '-N -l' test.go

dlv exec test
Type 'help' for list of commands.
(dlv)

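  Now you can enter any of the debugging commands introduced above and start debugging. Earlier articles covered how channels and the Go scheduler are implemented: channel reads and writes are implemented by runtime.chanrecv and runtime.chansend, and the scheduler's main logic is runtime.schedule. You should also know that our main goroutine, i.e. the main function, is compiled into the function main.main. Set a breakpoint on each of these functions: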

// sometimes a function name alone is ambiguous, so you may need to include the package name, e.g. runtime.chansend
(dlv) b chansend
Breakpoint 1 set at 0x1003f0a for runtime.chansend() /go1.18/src/runtime/chan.go:159
(dlv) b chanrecv
Breakpoint 2 set at 0x1004c2f for runtime.chanrecv() /go1.18/src/runtime/chan.go:455
(dlv) b schedule
Breakpoint 3 set at 0x1037aea for runtime.schedule() /go1.18/src/runtime/proc.go:3111
(dlv) b main.main
Breakpoint 4 set at 0x1089a0a for main.main() ./test.go:8

  The continue command (alias c) runs until a breakpoint is hit:

(dlv) c
> runtime.schedule() /go1.18/src/runtime/proc.go:3111 (hits total:1) (PC: 0x1037aea)

=>3111:    func schedule() {
  3112:        _g_ := getg()
  3113:
  3114:        if _g_.m.locks != 0 {
  3115:            throw("schedule: holding locks")
  3116:        }

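  => marks the line currently being executed. Surprisingly, the first stop is in runtime.schedule rather than in main. Remember that main itself runs as the main goroutine and has to be scheduled, so it cannot be the first code to run: before the main goroutine can be scheduled, the runtime needs a thread, has to create the main goroutine, run the scheduling logic, and so on. So what is the first code a Go program executes? Let's look at the call stack: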

(dlv) bt
0  0x0000000001037aea in runtime.schedule
   at /go1.18/src/runtime/proc.go:3111
1  0x000000000103444d in runtime.mstart1
   at /go1.18/src/runtime/proc.go:1425
2  0x000000000103434c in runtime.mstart0
   at /go1.18/src/runtime/proc.go:1376
3  0x00000000010585e5 in runtime.mstart
   at /go1.18/src/runtime/asm_amd64.s:368
4  0x0000000001058571 in runtime.rt0_go
   at /go1.18/src/runtime/asm_amd64.s:331

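  The first code of a Go program lives in runtime/asm_amd64.s, and the entry function is runtime.rt0_go; it is all assembly, if you are curious. If you keep continuing with c, you will find that the program still stops in runtime.schedule, or even in runtime.chanrecv, because a lot of initialization work (which uses these functions) has to happen before the main goroutine is scheduled. So the usual approach is to set only the main.main breakpoint first, continue to it, and only then set the other breakpoints. Let's restart the program, delete all breakpoints, set a breakpoint on main.main again, and continue to it: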

(dlv) r
Process restarted with PID 57676

(dlv) clearall

(dlv) b main.main
Breakpoint 5 set at 0x1089a0a for main.main() ./test.go:8

(dlv) c
> main.main() ./test.go:8 (hits goroutine(1):1 total:1) (PC: 0x1089a0a)

=>   8:    func main() {
     9:        queue := make(chan int, 1)
    10:        go func() {

  This time the program finally stops at main.main. Next, set breakpoints on the channel read/write functions and continue:

(dlv) b chansend
Breakpoint 6 set at 0x1003f0a for runtime.chansend() /go1.18/src/runtime/chan.go:159
(dlv) b chanrecv
Breakpoint 7 set at 0x1004c2f for runtime.chanrecv() /go1.18/src/runtime/chan.go:455

(dlv) c
> runtime.chansend() /go1.18/src/runtime/chan.go:159 (hits goroutine(1):1 total:1) (PC: 0x1003f0a)

=> 159:    func chansend(c *hchan, ep unsafe.Pointer, block bool, callerpc uintptr) bool {
   160:        if c == nil {
   161:            if !block {
   162:                return false
   163:            }

  The program stops in runtime.chansend, which should correspond to the line "queue <- i". Use bt to check the stack frames and confirm:

(dlv) bt
0  0x0000000001003f0a in runtime.chansend
   at /go1.18/src/runtime/chan.go:159
1  0x0000000001003edd in runtime.chansend1
   at /go1.18/src/runtime/chan.go:144
2  0x0000000001089aa9 in main.main
   at ./test.go:18

// view the arguments
(dlv) args
c = (*runtime.hchan)(0xc00005a070)
ep = unsafe.Pointer(0xc000070f58)
block = true    // the send may block the goroutine
callerpc = 17341097
~r0 = (unreadable empty OP stack)

// the first value written in the loop should be 0; the x command examines memory
(dlv) x 0xc000070f58
0xc000070f58:   0x00

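  Here the args command shows the input arguments. block is true, meaning the current goroutine will block if the channel cannot accept the value; ep is an address pointing at the data to be written, and examining it with x shows the value 0.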

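  Remember how channels are implemented? A buffered channel maintains a circular queue under the hood, and a send mainly goes through these steps: 1) if the channel is nil, block the current goroutine (when block=true); 2) if the channel is closed, panic; 3) if a goroutine is waiting to receive, hand the data directly to that goroutine and wake it up; 4) if the buffer still has room, write the data into it; 5) if the buffer is full, block the current goroutine (when block=true). In steps 1) and 5), a non-blocking send (block=false) returns false instead of blocking.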
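  Several of these send rules can be observed from ordinary Go code. A minimal sketch, assuming the Go 1.18 runtime used in this article: a select with a single send case and a default clause is compiled into runtime.selectnbsend, which calls chansend with block=false, so the nil-channel, room-in-buffer and full-buffer cases show up as a true/false result instead of a blocked goroutine, while the closed-channel case shows up as a recoverable panic. The waiting-receiver case is left out because it depends on goroutine timing. `trySend` and `sendOnClosedPanics` are illustrative names, not runtime functions.

```go
package main

import "fmt"

// trySend performs a non-blocking send: instead of parking the goroutine,
// it reports whether the value was accepted by the channel.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// sendOnClosedPanics reports whether sending on a closed channel panics.
func sendOnClosedPanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics with "send on closed channel"
	return false
}

func main() {
	var nilCh chan int
	fmt.Println("nil channel:", trySend(nilCh, 0))     // false: a nil channel is never ready
	queue := make(chan int, 1)
	fmt.Println("buffer has room:", trySend(queue, 0)) // true: value stored in the buffer
	fmt.Println("buffer full:", trySend(queue, 1))     // false: would have to block
	fmt.Println("closed channel panics:", sendOnClosedPanics()) // true
}
```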

  Next, you can single-step through the send path. This part is simple and repetitive, so I won't go through it in detail; here is just one intermediate step:

(dlv) n
1 > runtime.chansend() /go1.18/src/runtime/chan.go:208 (PC: 0x10040e0)
Warning: debugging optimized function
   203:        if c.closed != 0 {
   204:            unlock(&c.lock)
   205:            panic(plainError("send on closed channel"))
   206:        }
   207:
=> 208:        if sg := c.recvq.dequeue(); sg != nil {
   209:            // Found a waiting receiver. We pass the value we want to send
   210:            // directly to the receiver, bypassing the channel buffer (if any).
   211:            send(c, sg, ep, func() { unlock(&c.lock) }, 3)
   212:            return true
   213:        }

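  While single-stepping, you may notice that a goroutine that has to block is switched out by the gopark function, which hands control over to the scheduler loop. Set breakpoints on runtime.schedule and runtime.gopark to observe the goroutine switch: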

(dlv) b schedule
Breakpoint 8 set at 0x1037aea for runtime.schedule() /go1.18/src/runtime/proc.go:3111
(dlv) b gopark
Breakpoint 9 set at 0x1031aca for runtime.gopark() /go1.18/src/runtime/proc.go:344

(dlv) c
> runtime.gopark() /go1.18/src/runtime/proc.go:344 (hits goroutine(1):2 total:2) (PC: 0x1031aca)

=> 344:    func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
   345:        if reason != waitReasonSleep {
   346:            checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
   347:        }
   348:        mp := acquirem()
   349:        gp := mp.curg

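  runtime.gopark essentially switches to the scheduler stack and runs the scheduler, runtime.schedule, which looks for a runnable goroutine and runs it (the call goes through runtime.mcall and runtime.park_m, as the stack below shows). So continuing again stops at the runtime.schedule breakpoint: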

(dlv) c
> [b] runtime.schedule() /go1.18/src/runtime/proc.go:3111 (hits total:19) (PC: 0x1037aea)

=>3111:    func schedule() {
  3112:        _g_ := getg()


(dlv) bt
0  0x0000000001037aea in runtime.schedule
   at /Users/lile/Documents/go1.18/src/runtime/proc.go:3111
1  0x000000000103826d in runtime.park_m
   at /Users/lile/Documents/go1.18/src/runtime/proc.go:3336
2  0x0000000001058663 in runtime.mcall
   at /Users/lile/Documents/go1.18/src/runtime/asm_amd64.s:425

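  The bt output shows that the frame at the bottom of the stack is runtime.mcall. Why is the call stack so short, and where is runtime.gopark? Because the stack has been switched: execution moved from the user goroutine's stack to the scheduler stack, so the call chain is different and the frames on the user stack are no longer visible. runtime.mcall is the function that performs this stack switch.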
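  A related effect can be seen at the goroutine level without dlv: each goroutine runs on its own stack, so walking the stack with runtime.Callers from inside a new goroutine shows that goroutine's own frames and not the function that started it. This is an analogy to, not a reproduction of, the switch to the scheduler stack done by mcall; `stackFuncs`, `worker`, `spawner` and `hasSuffix` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// stackFuncs returns the function names on the calling goroutine's stack,
// innermost first; skip=2 leaves out runtime.Callers and stackFuncs itself.
func stackFuncs() []string {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// worker runs on a new goroutine and sends back that goroutine's call chain.
func worker(out chan<- []string) {
	out <- stackFuncs()
}

// spawner records its own call chain, then starts worker on a new goroutine.
func spawner() (direct, inGoroutine []string) {
	direct = stackFuncs()
	out := make(chan []string)
	go worker(out)
	return direct, <-out
}

// hasSuffix reports whether any function name ends with suffix.
func hasSuffix(names []string, suffix string) bool {
	for _, name := range names {
		if strings.HasSuffix(name, suffix) {
			return true
		}
	}
	return false
}

func main() {
	direct, inGoroutine := spawner()
	fmt.Println("direct call:  ", direct)
	fmt.Println("new goroutine:", inGoroutine)
	fmt.Println("spawner on the new goroutine's stack:", hasSuffix(inGoroutine, ".spawner"))
}
```

The direct call chain includes spawner and its callers, while the new goroutine's chain starts at worker (plus runtime frames), much as bt on the scheduler stack no longer shows gopark.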

Summary

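  dlv is an excellent tool for debugging Go programs. It not only helps us learn and understand the Go language, but also helps us quickly locate bugs, so it is well worth mastering.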


李烁