Useful SystemTap Scripts

Note: This article is a translation of Chapter 5, Useful SystemTap Scripts.

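Note: This article is not finished yet. It was posted early only to test the new table-of-contents feature, and this note will be removed once the article is complete.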

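This chapter lists several SystemTap scripts you can use to monitor and investigate different subsystems. Once you install the systemtap-testsuite RPM package, all of these scripts are available in the /usr/share/systemtap/testsuite/systemtap.examples/ directory.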

5.1 Network

The following sections showcase scripts that trace network-related functions and build a profile of network activity.

5.1.1 Network Profiling

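This section describes how to profile network activity. nettop.stp provides a glimpse into how much network traffic each process is generating on a machine.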

nettop.stp

#! /usr/bin/env stap

global ifxmit, ifrecv
global ifmerged

probe netdev.transmit
{
  ifxmit[pid(), dev_name, execname(), uid()] <<< length
}

probe netdev.receive
{
  ifrecv[pid(), dev_name, execname(), uid()] <<< length
}

function print_activity()
{
  printf("%5s %5s %-7s %7s %7s %7s %7s %-15s\n",
         "PID", "UID", "DEV", "XMIT_PK", "RECV_PK",
         "XMIT_KB", "RECV_KB", "COMMAND")

  foreach ([pid, dev, exec, uid] in ifrecv) {
      ifmerged[pid, dev, exec, uid] += @count(ifrecv[pid,dev,exec,uid]);
  }
  foreach ([pid, dev, exec, uid] in ifxmit) {
      ifmerged[pid, dev, exec, uid] += @count(ifxmit[pid,dev,exec,uid]);
  }
  foreach ([pid, dev, exec, uid] in ifmerged-) {
    n_xmit = @count(ifxmit[pid, dev, exec, uid])
    n_recv = @count(ifrecv[pid, dev, exec, uid])
    printf("%5d %5d %-7s %7d %7d %7d %7d %-15s\n",
           pid, uid, dev, n_xmit, n_recv,
           n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0,
           n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0,
           exec)
  }

  print("\n")

  delete ifxmit
  delete ifrecv
  delete ifmerged
}

probe timer.ms(5000), end, error
{
  print_activity()
}

Note that function print_activity() uses the following expressions:

n_xmit ? @sum(ifxmit[pid, dev, exec, uid])/1024 : 0
n_recv ? @sum(ifrecv[pid, dev, exec, uid])/1024 : 0

These expressions are if/else conditionals. The second one is simply a more concise way of writing the following pseudo-code:

if n_recv != 0 then
  @sum(ifrecv[pid, dev, exec, uid])/1024
else
  0

nettop.stp tracks which processes are generating network traffic on the system, and provides the following information about each process:

PID — the ID of the listed process.
UID — user ID. A user ID of 0 refers to the root user.
DEV — which ethernet device the process used to send / receive data (for example, eth0, eth1)
XMIT_PK — number of packets transmitted by the process
RECV_PK — number of packets received by the process
XMIT_KB — amount of data sent by the process, in kilobytes
RECV_KB — amount of data received by the process, in kilobytes
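The merge-and-sort step in print_activity(), together with the conditional expressions above, can be mirrored in a few lines of Python. This is a rough sketch for illustration only. It assumes a hypothetical in-memory list of events shaped as (pid, dev, exec, uid, direction, length); SystemTap itself keeps these values in statistical aggregates (<<<, @count, @sum), not Python lists.

```python
from collections import defaultdict


def summarize(events):
    """Group packet events the way nettop.stp's print_activity() does.

    events: iterable of (pid, dev, exec_name, uid, direction, length),
    where direction is "xmit" or "recv" (a hypothetical layout for this sketch).
    Returns rows of (pid, uid, dev, xmit_pk, recv_pk, xmit_kb, recv_kb, command).
    """
    xmit = defaultdict(list)  # key -> transmitted packet lengths
    recv = defaultdict(list)  # key -> received packet lengths
    for pid, dev, exec_name, uid, direction, length in events:
        key = (pid, dev, exec_name, uid)
        (xmit if direction == "xmit" else recv)[key].append(length)

    # Equivalent of ifmerged: total packet count per key.
    merged = {key: len(xmit[key]) + len(recv[key])
              for key in set(xmit) | set(recv)}

    rows = []
    # Busiest key first, like "foreach (... in ifmerged-)".
    for key in sorted(merged, key=merged.get, reverse=True):
        pid, dev, exec_name, uid = key
        n_xmit, n_recv = len(xmit[key]), len(recv[key])
        rows.append((pid, uid, dev, n_xmit, n_recv,
                     sum(xmit[key]) // 1024 if n_xmit else 0,  # n_xmit ? @sum(...)/1024 : 0
                     sum(recv[key]) // 1024 if n_recv else 0,  # n_recv ? @sum(...)/1024 : 0
                     exec_name))
    return rows


if __name__ == "__main__":
    sample = ([(2886, "eth0", "cups-polld", 4, "xmit", 100)] * 3 +
              [(11362, "eth0", "firefox", 0, "recv", 2048)] * 5)
    for row in summarize(sample):
        print("%5d %5d %-7s %7d %7d %7d %7d %-15s" % row)
```

The guards only mirror the original n_xmit ? ... : 0 expressions; in Python, sum([]) would already be 0.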

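nettop.stp samples the network profile every 5 seconds. You can change this interval by editing probe timer.ms(5000) accordingly. Example 5.1, “nettop.stp Sample Output” contains an excerpt of the output from nettop.stp over a 20-second period.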

Example 5.1. nettop.stp Sample Output

[...]
  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          0       5       0       0 swapper
11178     0 eth0          2       0       0       0 synergyc

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
 2886     4 eth0         79       0       5       0 cups-polld
11362     0 eth0          0      61       0       5 firefox
    0     0 eth0          3      32       0       3 swapper
 2886     4 lo            4       4       0       0 cups-polld
11178     0 eth0          3       0       0       0 synergyc

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          0       6       0       0 swapper
 2886     4 lo            2       2       0       0 cups-polld
11178     0 eth0          3       0       0       0 synergyc
 3611     0 eth0          0       1       0       0 Xorg

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0          3      42       0       2 swapper
11178     0 eth0         43       1       3       0 synergyc
11362     0 eth0          0       7       0       0 firefox
 3897     0 eth0          0       1       0       0 multiload-apple
[...]

5.1.2 Tracing Functions Called in Network Socket Code

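This section describes how to trace functions called from the kernel's net/socket.c file. This task helps you identify, in finer detail, how each process interacts with the network at the kernel level.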

socket-trace.stp

#! /usr/bin/env stap

probe kernel.function("*@net/socket.c").call {
  printf ("%s -> %s\n", thread_indent(1), ppfunc())
}
probe kernel.function("*@net/socket.c").return {
  printf ("%s <- %s\n", thread_indent(-1), ppfunc())
}

socket-trace.stp is identical to Example 3.6, “thread_indent.stp”, which was used earlier in SystemTap Functions to illustrate how thread_indent() works.

Example 5.2. socket-trace.stp Sample Output

[...]
0 Xorg(3611): -> sock_poll
3 Xorg(3611): <- sock_poll
0 Xorg(3611): -> sock_poll
3 Xorg(3611): <- sock_poll
0 gnome-terminal(11106): -> sock_poll
5 gnome-terminal(11106): <- sock_poll
0 scim-bridge(3883): -> sock_poll
3 scim-bridge(3883): <- sock_poll
0 scim-bridge(3883): -> sys_socketcall
4 scim-bridge(3883):  -> sys_recv
8 scim-bridge(3883):   -> sys_recvfrom
12 scim-bridge(3883):    -> sock_from_file
16 scim-bridge(3883):    <- sock_from_file
20 scim-bridge(3883):    -> sock_recvmsg
24 scim-bridge(3883):    <- sock_recvmsg
28 scim-bridge(3883):   <- sys_recvfrom
31 scim-bridge(3883):  <- sys_recv
35 scim-bridge(3883): <- sys_socketcall
[...]

Example 5.2, “socket-trace.stp Sample Output” contains a 3-second excerpt of the output of socket-trace.stp. For more information about the output of this script as provided by thread_indent(), refer to Example 3.6, “thread_indent.stp” in SystemTap Functions.
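To see why the nested calls in Example 5.2 shift right and back left, here is a small Python model of the indentation bookkeeping. It is based on our reading of SystemTap's indent tapset (a per-thread depth counter and a per-thread start timestamp, with entries indented after incrementing and returns indented before decrementing); it is not the tapset code itself, and the ThreadIndent class and demo() helper are hypothetical names.

```python
class ThreadIndent:
    """Simplified model of SystemTap's thread_indent() bookkeeping."""

    def __init__(self, clock):
        self.clock = clock  # callable returning a timestamp in microseconds
        self.depth = {}     # tid -> current nesting depth
        self.start = {}     # tid -> timestamp of the outermost call

    def indent(self, tid, desc, delta):
        ts = self.clock()
        depth = self.depth.get(tid, 0)
        if depth == 0:
            self.start[tid] = ts  # a new outermost call restarts the relative clock
        # Entries (delta > 0) use the incremented depth,
        # returns (delta < 0) use the depth before decrementing.
        width = depth + (delta if delta > 0 else 0)
        self.depth[tid] = depth + delta
        return "%6d %s:%s" % (ts - self.start[tid], desc, " " * width)


def demo():
    """Replay the scim-bridge call sequence from Example 5.2."""
    ticks = iter(range(0, 1000, 4))  # fake clock: 0, 4, 8, ... microseconds
    ti = ThreadIndent(lambda: next(ticks))
    calls = [(1, "sys_socketcall"), (1, "sys_recv"), (1, "sys_recvfrom"),
             (1, "sock_from_file"), (-1, "sock_from_file"),
             (1, "sock_recvmsg"), (-1, "sock_recvmsg"),
             (-1, "sys_recvfrom"), (-1, "sys_recv"), (-1, "sys_socketcall")]
    lines = []
    for delta, func in calls:
        arrow = "->" if delta > 0 else "<-"
        prefix = ti.indent(3883, "scim-bridge(3883)", delta)
        lines.append("%s%s %s" % (prefix, arrow, func))
    return lines


if __name__ == "__main__":
    print("\n".join(demo()))
```

Because a return is printed at the depth it had before decrementing, each "<-" line lines up with its matching "->" line.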

5.1.3 Monitoring Incoming TCP Connections

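This section illustrates how to monitor incoming TCP connections. This task is useful for identifying any unauthorized, suspicious, or otherwise unwanted network access requests in real time.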

tcp_connections.stp

#! /usr/bin/env stap

probe begin {
  printf("%6s %16s %6s %6s %16s\n",
         "UID", "CMD", "PID", "PORT", "IP_SOURCE")
}

probe kernel.function("tcp_accept").return?,
      kernel.function("inet_csk_accept").return? {
  sock = $return
  if (sock != 0)
    printf("%6d %16s %6d %6d %16s\n", uid(), execname(), pid(),
           inet_get_local_port(sock), inet_get_ip_source(sock))
}

While tcp_connections.stp is running, it prints the following information about every incoming TCP connection accepted by the system, in real time:

Current UID
CMD - the command accepting the connection
PID of the command
Port used by the connection
IP address from which the TCP connection originated

Example 5.3. tcp_connections.stp Sample Output

UID            CMD    PID   PORT        IP_SOURCE
0             sshd   3165     22      10.64.0.227
0             sshd   3165     22      10.64.0.227
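When tcp_connections.stp runs for a long time, it can help to tally its output. The following Python sketch is an illustration only, not part of the SystemTap examples. It assumes the output was saved to a text file, that the five columns are separated by whitespace, and that command names contain no spaces; count_sources() is a hypothetical name.

```python
from collections import Counter


def count_sources(lines):
    """Count accepted connections per (port, source IP) in tcp_connections.stp output."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows have five columns: UID CMD PID PORT IP_SOURCE; skip the header.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        port, source_ip = int(fields[3]), fields[4]
        counts[(port, source_ip)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "UID            CMD    PID   PORT        IP_SOURCE",
        "0             sshd   3165     22      10.64.0.227",
        "0             sshd   3165     22      10.64.0.227",
    ]
    for (port, ip), n in count_sources(sample).most_common():
        print("%5d %-16s %d" % (port, ip, n))
```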

5.1.4 Monitoring TCP Packets

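This section illustrates how to monitor TCP packets received by the system. This is useful for analyzing network traffic generated by applications running on the system.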

tcpdumplike.stp

#! /usr/bin/env stap

// A TCP dump like example

probe begin, timer.s(1) {
  printf("-----------------------------------------------------------------\n")
  printf("       Source IP         Dest IP  SPort  DPort  U  A  P  R  S  F \n")
  printf("-----------------------------------------------------------------\n")
}

probe udp.recvmsg /* ,udp.sendmsg */ {
  printf(" %15s %15s  %5d  %5d  UDP\n",
         saddr, daddr, sport, dport)
}

probe tcp.receive {
  printf(" %15s %15s  %5d  %5d  %d  %d  %d  %d  %d  %d\n",
         saddr, daddr, sport, dport, urg, ack, psh, rst, syn, fin)
}

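While tcpdumplike.stp is running, it prints the following information about every received TCP packet, in real time (it also prints a shorter line for each UDP message received, through the udp.recvmsg probe):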

Source and destination IP address (saddr, daddr, respectively)
Source and destination ports (sport, dport, respectively)
Packet flags

To determine which flags a packet uses, tcpdumplike.stp uses the following variables, which are provided by the tcp.receive probe:

urg - urgent
ack - acknowledgement
psh - push
rst - reset
syn - synchronize
fin - finished

Each of these is either 1 or 0, indicating whether the packet has the corresponding flag set.

Example 5.4. tcpdumplike.stp Sample Output

-----------------------------------------------------------------
       Source IP         Dest IP  SPort  DPort  U  A  P  R  S  F
-----------------------------------------------------------------
  209.85.229.147       10.0.2.15     80  20373  0  1  1  0  0  0
  92.122.126.240       10.0.2.15     80  53214  0  1  0  0  1  0
  92.122.126.240       10.0.2.15     80  53214  0  1  0  0  0  0
  209.85.229.118       10.0.2.15     80  63433  0  1  0  0  1  0
  209.85.229.118       10.0.2.15     80  63433  0  1  0  0  0  0
  209.85.229.147       10.0.2.15     80  21141  0  1  1  0  0  0
  209.85.229.147       10.0.2.15     80  21141  0  1  1  0  0  0
  209.85.229.147       10.0.2.15     80  21141  0  1  1  0  0  0
  209.85.229.147       10.0.2.15     80  21141  0  1  1  0  0  0
  209.85.229.147       10.0.2.15     80  21141  0  1  1  0  0  0
  209.85.229.118       10.0.2.15     80  63433  0  1  1  0  0  0
[...]
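The six flag columns correspond to the control bits in the TCP header. As a cross-check of the sample output, the Python sketch below decodes a raw TCP flags byte into the same U A P R S F columns. The bit values (FIN 0x01, SYN 0x02, RST 0x04, PSH 0x08, ACK 0x10, URG 0x20) come from the TCP specification; the flag_columns() name is hypothetical.

```python
# TCP control bits, from the TCP header definition (RFC 793 / RFC 9293),
# listed in tcpdumplike.stp's column order.
TCP_FLAGS = [("U", 0x20), ("A", 0x10), ("P", 0x08),
             ("R", 0x04), ("S", 0x02), ("F", 0x01)]


def flag_columns(flags_byte):
    """Return the 0/1 values for the U A P R S F columns."""
    return [1 if flags_byte & bit else 0 for _, bit in TCP_FLAGS]


if __name__ == "__main__":
    print(flag_columns(0x18))  # PSH+ACK -> [0, 1, 1, 0, 0, 0], like the first row above
    print(flag_columns(0x12))  # SYN+ACK -> [0, 1, 0, 0, 1, 0], like the second row
```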

5.1.5 Monitoring Network Packet Drops in the Kernel

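The network stack in Linux can discard packets for various reasons. Some Linux kernels include a tracepoint, kernel.trace("kfree_skb"), which makes it easy to track where packets are discarded. dropwatch.stp uses kernel.trace("kfree_skb") to trace packet discards; every five seconds, the script summarizes which locations discarded packets.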

dropwatch.stp

#! /usr/bin/env stap

############################################################
# Dropwatch.stp
# Author: Neil Horman <nhorman@redhat.com>
# An example script to mimic the behavior of the dropwatch utility
# http://fedorahosted.org/dropwatch
############################################################

# Array to hold the list of drop points we find
global locations

# Note when we turn the monitor on and off
probe begin { printf("Monitoring for dropped packets\n") }
probe end { printf("Stopping dropped packet monitor\n") }

# increment a drop counter for every location we drop at
probe kernel.trace("kfree_skb") { locations[$location] <<< 1 }

# Every 5 seconds report our drop locations
probe timer.sec(5)
{
  printf("\n")
  foreach (l in locations-) {
    printf("%d packets dropped at %s\n",
           @count(locations[l]), symname(l))
  }
  delete locations
}

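kernel.trace("kfree_skb") traces the places in the kernel that drop network packets. It has two arguments: a pointer to the buffer being freed ($skb) and the location in kernel code where the buffer is being freed ($location). Where possible, the dropwatch.stp script reports the function that contains $location. The information needed to map $location back to a function is not included in the instrumentation by default. In SystemTap 1.4, the --all-modules option includes the required mapping information, and the following command can be used to run the script: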

stap --all-modules dropwatch.stp

In older versions of SystemTap, you can use the following command to emulate the --all-modules option:

stap -dkernel \
`cat /proc/modules | awk 'BEGIN { ORS = " " } {print "-d"$1}'` \
dropwatch.stp
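If you prefer to build that command from a script instead of a shell pipeline, the Python sketch below produces the same argument list as the awk one-liner: -dkernel followed by one -d<module> option per line of /proc/modules, whose first field is the module name. The stap_args() name and the sample module lines are made up for illustration, and the sketch only builds the list; you would still have to pass it to something like subprocess.run() to start stap.

```python
def stap_args(proc_modules_text, script="dropwatch.stp"):
    """Build ["stap", "-dkernel", "-d<mod>", ..., script] from /proc/modules text."""
    args = ["stap", "-dkernel"]
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:  # the first column of /proc/modules is the module name
            args.append("-d" + fields[0])
    args.append(script)
    return args


if __name__ == "__main__":
    # Made-up sample lines in the /proc/modules layout.
    sample = ("tun 27141 2 - Live 0xffffffffa04c5000\n"
              "nf_conntrack 79601 1 - Live 0xffffffffa0490000\n")
    print(" ".join(stap_args(sample)))
```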

Running the dropwatch.stp script for 15 seconds produces output similar to Example 5.5, “dropwatch.stp Sample Output”.

Example 5.5. dropwatch.stp Sample Output

Monitoring for dropped packets

1762 packets dropped at unix_stream_recvmsg
4 packets dropped at tun_do_read
2 packets dropped at nf_hook_slow

467 packets dropped at unix_stream_recvmsg
20 packets dropped at nf_hook_slow
6 packets dropped at tun_do_read

446 packets dropped at unix_stream_recvmsg
4 packets dropped at tun_do_read
4 packets dropped at nf_hook_slow
Stopping dropped packet monitor

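When the script is compiled on one machine and run on another, the --all-modules option and the /proc/modules information are not available, so the symname function just prints raw addresses. To make the raw addresses of packet drops more meaningful, refer to the /boot/System.map-`uname -r` file. This file lists the starting address of each function, which lets you map the addresses in output like Example 5.5, “dropwatch.stp Sample Output” to specific function names. Given the following snippet of the /boot/System.map-`uname -r` file, the address 0xffffffff8149a8ed maps to the function unix_stream_recvmsg: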

[...]
ffffffff8149a420 t unix_dgram_poll
ffffffff8149a5e0 t unix_stream_recvmsg
ffffffff8149ad00 t unix_find_other
[...]
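The manual lookup above can be automated. The following Python sketch is an illustration that assumes System.map's "address type name" line format. It sorts the symbol start addresses and uses a binary search (bisect) to find the last symbol that starts at or before a raw address. It does not know where a symbol ends, so an address past the last listed function would still be attributed to that function. load_symbols() and symbolize() are hypothetical names.

```python
import bisect


def load_symbols(system_map_text):
    """Parse 'address type name' lines into sorted (addresses, names) lists."""
    entries = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:  # skip blank or elided ("[...]") lines
            entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def symbolize(addr, addrs, names):
    """Map a raw address to 'function+0xoffset', or to hex if it precedes all symbols."""
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return hex(addr)
    return "%s+0x%x" % (names[i], addr - addrs[i])


if __name__ == "__main__":
    snippet = """ffffffff8149a420 t unix_dgram_poll
ffffffff8149a5e0 t unix_stream_recvmsg
ffffffff8149ad00 t unix_find_other"""
    addrs, names = load_symbols(snippet)
    print(symbolize(0xffffffff8149a8ed, addrs, names))  # unix_stream_recvmsg+0x30d
```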

5.2 Disk

The following sections showcase scripts that monitor disk and I/O activity.

5.2.1 Summarizing Disk Read/Write Traffic

This section describes how to identify which processes are performing the heaviest disk reads/writes on the system.

disktop.stp

#!/usr/bin/env stap 
#
# Copyright (C) 2007 Oracle Corp.
#
# Get the status of reading/writing disk every 5 seconds,
# output top ten entries 
#
# This is free software,GNU General Public License (GPL);
# either version 2, or (at your option) any later version.
#
# Usage:
#  ./disktop.stp
#

global io_stat,device
global read_bytes,write_bytes

probe vfs.read.return {
  if ($return>0) {
    if (devname!="N/A") {/*skip read from cache*/
      io_stat[pid(),execname(),uid(),ppid(),"R"] += $return
      device[pid(),execname(),uid(),ppid(),"R"] = devname
      read_bytes += $return
    }
  }
}

probe vfs.write.return {
  if ($return>0) {
    if (devname!="N/A") { /*skip update cache*/
      io_stat[pid(),execname(),uid(),ppid(),"W"] += $return
      device[pid(),execname(),uid(),ppid(),"W"] = devname
      write_bytes += $return
    }
  }
}

probe timer.ms(5000) {
  /* skip non-read/write disk */
  if (read_bytes+write_bytes) {

    printf("\n%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb\n\n",
           ctime(gettimeofday_s()),
           "Average:", ((read_bytes+write_bytes)/1024)/5,
           "Read:",read_bytes/1024,
           "Write:",write_bytes/1024)

    /* print header */
    printf("%8s %8s %8s %25s %8s %4s %12s\n",
           "UID","PID","PPID","CMD","DEVICE","T","BYTES")
  }
  /* print top ten I/O */
  foreach ([process,cmd,userid,parent,action] in io_stat- limit 10)
    printf("%8d %8d %8d %25s %8s %4s %12d\n",
           userid,process,parent,cmd,
           device[process,cmd,userid,parent,action],
           action,io_stat[process,cmd,userid,parent,action])

  /* clear data */
  delete io_stat
  delete device
  read_bytes = 0
  write_bytes = 0  
}

probe end{
  delete io_stat
  delete device
  delete read_bytes
  delete write_bytes
}

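disktop.stp outputs the top ten processes responsible for the heaviest reads/writes to disk. Example 5.6, “disktop.stp Sample Output” shows sample output from this script, which includes the following data for each listed process: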

UID — user ID. A user ID of 0 refers to the root user.
PID — the ID of the listed process.
PPID — the process ID of the listed process's parent process.
CMD — the name of the listed process.
DEVICE — which storage device the listed process is reading from or writing to.
T — the type of action performed by the listed process; W refers to write, while R refers to read.
BYTES — the amount of data read from or written to disk.

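The time and date in the output of disktop.stp are produced by the functions ctime() and gettimeofday_s(). gettimeofday_s() returns the current time as the number of seconds elapsed since the UNIX epoch (January 1, 1970), and ctime() converts such a count of seconds into calendar time. Together they give a fairly accurate, human-readable timestamp for the output.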

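In this script, $return is a local variable that stores the actual number of bytes each process reads from or writes to the virtual file system. $return can only be used in return probes (for example, vfs.read.return and vfs.write.return).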

Example 5.6. disktop.stp Sample Output

[...]
Mon Sep 29 03:38:28 2008 , Average:  19Kb/sec, Read: 7Kb, Write: 89Kb

UID      PID     PPID                       CMD   DEVICE    T    BYTES
0    26319    26294                   firefox     sda5    W        90229
0     2758     2757           pam_timestamp_c     sda5    R         8064
0     2885        1                     cupsd     sda5    W         1678

Mon Sep 29 03:38:38 2008 , Average:   1Kb/sec, Read: 7Kb, Write: 1Kb

UID      PID     PPID                       CMD   DEVICE    T    BYTES
0     2758     2757           pam_timestamp_c     sda5    R         8064
0     2885        1                     cupsd     sda5    W         1678
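The header line that disktop.stp prints every five seconds can be reproduced in Python to check the arithmetic. This sketch assumes that ctime() renders the time in UTC in the "Mon Sep 29 03:38:28 2008" layout seen above, approximated here with time.asctime(time.gmtime(...)); summary_line() is a hypothetical name.

```python
import time


def summary_line(now_s, read_bytes, write_bytes, interval_s=5):
    """Rebuild disktop.stp's per-interval summary line."""
    # Seconds since the epoch -> calendar time (UTC), like ctime(gettimeofday_s()).
    stamp = time.asctime(time.gmtime(now_s))
    # Same integer arithmetic as the script: total KiB divided by the interval length.
    average_kb_s = ((read_bytes + write_bytes) // 1024) // interval_s
    return "%-25s, %-8s%4dKb/sec, %-7s%6dKb, %-7s%6dKb" % (
        stamp, "Average:", average_kb_s,
        "Read:", read_bytes // 1024,
        "Write:", write_bytes // 1024)


if __name__ == "__main__":
    # 7 KiB read and 89 KiB written, as in the first interval of Example 5.6.
    print(summary_line(time.time(), 7 * 1024, 89 * 1024))
```

With these inputs the average is (7 + 89) / 5 = 19 Kb/sec after integer truncation, which matches the first block of Example 5.6. The second block's 1Kb/sec follows the same way from (7 + 1) / 5.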

5.2.2 Tracking I/O Time for Each File Read or Write

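This section describes how to monitor the amount of time each process spends reading from or writing to any file. This is useful for determining which files are slow to load on a given system.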

iotime.stp

#! /usr/bin/env stap

/*
 * Copyright (C) 2006-2007 Red Hat Inc.
 * 
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 * Print out the amount of time spent in the read and write system calls
 * when each file opened by the process is closed. Note that the systemtap
 * script needs to be running before the open operations occur for
 * the script to record data.
 *
 * This script could be used to find out which files are slow to load
 * on a machine. e.g.
 *
 * stap iotime.stp -c 'firefox'
 *
 * Output format is:
 * timestamp pid (executable) info_type path ...
 *
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69
 *
 */

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global filehandles, fileread, filewrite

probe syscall.open.return {
  filename = user_string($filename)
  if ($return >= 0) {  /* a failed open returns a negative errno, not just -1 */
    filehandles[pid(), $return] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)
  }
}

probe syscall.read.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.write.return {
  p = pid()
  fd = $fd
  bytes = $return
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] += bytes
  time_io[p, fd] <<< time
}

probe syscall.close {
  if ([pid(), $fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), $fd],
           fileread[pid(), $fd], filewrite[pid(), $fd])
    if (@count(time_io[pid(), $fd]))
      printf("%d %s iotime %s time: %d\n",  timestamp(), proc(),
             filehandles[pid(), $fd], @sum(time_io[pid(), $fd]))
  }
  delete fileread[pid(), $fd]
  delete filewrite[pid(), $fd]
  delete filehandles[pid(), $fd]
  delete time_io[pid(),$fd]
}

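iotime.stp tracks each time a system call opens, closes, reads from, or writes to a file. For each file that any system call accesses, iotime.stp counts the number of microseconds it takes for reads and writes to finish, and tracks the amount of data (in bytes) read from or written to the file.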

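The byte counts that iotime.stp records come from $return which, as in disktop.stp from Section 5.2.1, “Summarizing Disk Read/Write Traffic”, stores the amount of data actually read or written. This differs from the local variable $count, which holds the amount of data a system call attempts to read or write; $count can only be used in probes that track data reads or writes (for example, syscall.read and syscall.write).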

Example 5.7. iotime.stp Sample Output

[...]
825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
[...]
117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
[...]
3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 0
3973744 2886 (sendmail) iotime /proc/loadavg time: 11
[...]

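Each line of Example 5.7, “iotime.stp Sample Output” contains the following data:

A timestamp, in microseconds since the script started.
The process ID and process name.
An access or iotime flag.
The file accessed.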

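If a process was able to read or write any data, a pair of access and iotime lines should appear together. Both lines are printed when the process closes the file, so the access line's timestamp is the time the file was closed; at the end of that line, it shows the amount of data read and written (in bytes). The iotime line shows the amount of time (in microseconds) the process spent performing the reads and writes.

If an access line is not followed by an iotime line, the process did not read or write any data.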
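The per-file bookkeeping behind these lines can be sketched in Python. This is an illustrative model only, following the script as listed above: state is keyed by (pid, fd), byte counts accumulate only for positive return values, every read or write call contributes its duration, and both output lines are produced when the descriptor is closed. The IoTimeTracker class and its method names are hypothetical, and the timestamp and process columns are omitted.

```python
from collections import defaultdict


class IoTimeTracker:
    """Model of iotime.stp's per-(pid, fd) state (illustration only)."""

    def __init__(self):
        self.names = {}                  # (pid, fd) -> path recorded at open
        self.read = defaultdict(int)     # (pid, fd) -> bytes read
        self.written = defaultdict(int)  # (pid, fd) -> bytes written
        self.io_us = defaultdict(list)   # (pid, fd) -> durations of read/write calls

    def opened(self, pid, fd, path):
        if fd >= 0:  # negative return values are error codes, not descriptors
            self.names[(pid, fd)] = path

    def read_done(self, pid, fd, nbytes, elapsed_us):
        if nbytes > 0:
            self.read[(pid, fd)] += nbytes
        self.io_us[(pid, fd)].append(elapsed_us)

    def write_done(self, pid, fd, nbytes, elapsed_us):
        if nbytes > 0:
            self.written[(pid, fd)] += nbytes
        self.io_us[(pid, fd)].append(elapsed_us)

    def closed(self, pid, fd):
        """Return the access/iotime lines for this descriptor and forget its state."""
        key = (pid, fd)
        lines = []
        if key in self.names:
            path = self.names[key]
            lines.append("access %s read: %d write: %d"
                         % (path, self.read[key], self.written[key]))
            if self.io_us[key]:
                lines.append("iotime %s time: %d" % (path, sum(self.io_us[key])))
        for table in (self.names, self.read, self.written, self.io_us):
            table.pop(key, None)
        return lines
```

A descriptor that was opened but never read or written yields only an access line, which is the case described in the previous paragraph.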

5.2.3 Tracking Cumulative I/O

This section describes how to track the cumulative amount of I/O on the system.

traceio.stp

#! /usr/bin/env stap
# traceio.stp
# Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
# Copyright (C) 2009 Kai Meyer <kai@unixlords.com>
#   Fixed a bug that allows this to run longer
#   And added the humanreadable function
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#

global reads, writes, total_io

probe vfs.read.return {
  if ($return > 0) {
    reads[pid(),execname()] += $return
    total_io[pid(),execname()] += $return
  }
}

probe vfs.write.return {
  if ($return > 0) {
    writes[pid(),execname()] += $return
    total_io[pid(),execname()] += $return
  }
}

function humanreadable(bytes) {
  if (bytes > 1024*1024*1024) {
    return sprintf("%d GiB", bytes/1024/1024/1024)
  } else if (bytes > 1024*1024) {
    return sprintf("%d MiB", bytes/1024/1024)
  } else if (bytes > 1024) {
    return sprintf("%d KiB", bytes/1024)
  } else {
    return sprintf("%d   B", bytes)
  }
}

probe timer.s(1) {
  foreach([p,e] in total_io- limit 10)
    printf("%8d %15s r: %12s w: %12s\n",
           p, e, humanreadable(reads[p,e]),
           humanreadable(writes[p,e]))
  printf("\n")
  # Note we don't zero out reads, writes and total_io,
  # so the values are cumulative since the script started.
}

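traceio.stp prints the top ten executables generating I/O traffic over time. It also tracks the cumulative amount of I/O reads and writes done by those ten executables. This information is tracked and printed every second, in descending order of total I/O.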

Note that traceio.stp also uses the local variable $return, which is also used by disktop.stp from Section 5.2.1, “Summarizing Disk Read/Write Traffic”.

Example 5.8. traceio.stp Sample Output

[...]
           Xorg r:   583401 KiB w:        0 KiB
       floaters r:       96 KiB w:     7130 KiB
multiload-apple r:      538 KiB w:      537 KiB
           sshd r:       71 KiB w:       72 KiB
pam_timestamp_c r:      138 KiB w:        0 KiB
        staprun r:       51 KiB w:       51 KiB
          snmpd r:       46 KiB w:        0 KiB
          pcscd r:       28 KiB w:        0 KiB
     irqbalance r:       27 KiB w:        4 KiB
          cupsd r:        4 KiB w:       18 KiB

           Xorg r:   588140 KiB w:        0 KiB
       floaters r:       97 KiB w:     7143 KiB
multiload-apple r:      543 KiB w:      542 KiB
           sshd r:       72 KiB w:       72 KiB
pam_timestamp_c r:      138 KiB w:        0 KiB
        staprun r:       51 KiB w:       51 KiB
          snmpd r:       46 KiB w:        0 KiB
          pcscd r:       28 KiB w:        0 KiB
     irqbalance r:       27 KiB w:        4 KiB
          cupsd r:        4 KiB w:       18 KiB
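For experimenting with the formatting, here is a direct Python port of the script's humanreadable() helper, plus the top-ten selection done by foreach ([p,e] in total_io- limit 10). It is a sketch, not part of traceio.stp, and top_ten() is a hypothetical name. Because the comparisons are strict (>), a value of exactly 1024 bytes is still printed in bytes.

```python
def humanreadable(nbytes):
    """Port of traceio.stp's humanreadable(): integer division, strict '>' thresholds."""
    if nbytes > 1024 * 1024 * 1024:
        return "%d GiB" % (nbytes // 1024 // 1024 // 1024)
    elif nbytes > 1024 * 1024:
        return "%d MiB" % (nbytes // 1024 // 1024)
    elif nbytes > 1024:
        return "%d KiB" % (nbytes // 1024)
    else:
        return "%d   B" % nbytes


def top_ten(reads, writes):
    """Format the ten (pid, execname) keys with the most total I/O, largest first."""
    keys = set(reads) | set(writes)
    total = {k: reads.get(k, 0) + writes.get(k, 0) for k in keys}
    lines = []
    for pid, name in sorted(total, key=total.get, reverse=True)[:10]:
        lines.append("%8d %15s r: %12s w: %12s" % (
            pid, name,
            humanreadable(reads.get((pid, name), 0)),
            humanreadable(writes.get((pid, name), 0))))
    return lines
```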

5.2.4 I/O Monitoring (By Device)

This section describes how to monitor I/O activity on a specific device.

traceio2.stp

#! /usr/bin/env stap

global device_of_interest

probe begin {
  /* The following is not the most efficient way to do this.
      One could directly put the result of usrdev2kerndev()
      into device_of_interest.  However, want to test out
      the other device functions */
  dev = usrdev2kerndev($1)
  device_of_interest = MKDEV(MAJOR(dev), MINOR(dev))
}

probe vfs.write, vfs.read
{
  if (dev == device_of_interest)
    printf ("%s(%d) %s 0x%x\n",
            execname(), pid(), ppfunc(), dev)
}

traceio2.stp takes one argument: the whole device number. To get this number, use stat -c "0x%D" directory, where directory is located on the device you wish to monitor.

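The usrdev2kerndev() function converts the whole device number into the format understood by the kernel. The output produced by usrdev2kerndev() is used in conjunction with the MKDEV(), MINOR(), and MAJOR() functions to determine the major and minor numbers of a specific device.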

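The output of traceio2.stp includes the name and ID of any process performing a read or write, the function it is executing (vfs_read or vfs_write), and the kernel device number.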

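The following example is an excerpt from the full output of stap traceio2.stp 0x805, where 0x805 is the whole device number of /home. /home resides on /dev/sda5, which is the device we wish to monitor.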

Example 5.9. traceio2.stp Sample Output

[...]
synergyc(3722) vfs_read 0x800005
synergyc(3722) vfs_read 0x800005
cupsd(2889) vfs_write 0x800005
cupsd(2889) vfs_write 0x800005
cupsd(2889) vfs_write 0x800005
[...]
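To make the conversion concrete, the Python sketch below decodes a user-space device number the way glibc's major()/minor() macros do, then re-encodes it in the kernel's internal layout, with the minor number in the low 20 bits as in the kernel's MKDEV(). This mirrors what we understand usrdev2kerndev() to achieve; the bit layouts are taken from the glibc and Linux kernel headers rather than from the SystemTap tapset source, and all function names are hypothetical. For 0x805 it yields 0x800005, the value printed in Example 5.9.

```python
MINORBITS = 20                      # kernel-internal dev_t: minor in the low 20 bits
MINORMASK = (1 << MINORBITS) - 1


def user_major(dev):
    """Major number of a user-space dev_t (glibc encoding)."""
    return ((dev >> 8) & 0xFFF) | ((dev >> 32) & 0xFFFFF000)


def user_minor(dev):
    """Minor number of a user-space dev_t (glibc encoding)."""
    return (dev & 0xFF) | ((dev >> 12) & 0xFFFFFF00)


def kernel_mkdev(major, minor):
    return (major << MINORBITS) | minor


def kernel_major(kdev):
    return kdev >> MINORBITS


def kernel_minor(kdev):
    return kdev & MINORMASK


def usrdev_to_kerndev(dev):
    return kernel_mkdev(user_major(dev), user_minor(dev))


if __name__ == "__main__":
    kdev = usrdev_to_kerndev(0x805)  # /dev/sda5: major 8, minor 5
    print(hex(kdev), kernel_major(kdev), kernel_minor(kdev))
```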

5.2.5 Monitoring Reads and Writes to a File

This section describes how to monitor reads from and writes to a file in real time.

inodewatch.stp

#! /usr/bin/env stap

probe vfs.write, vfs.read
{
  # dev and ino are defined by vfs.write and vfs.read
  if (dev == MKDEV($1,$2) # major/minor device
      && ino == $3)
    printf ("%s(%d) %s 0x%x/%u\n",
      execname(), pid(), ppfunc(), dev, ino)
}
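inodewatch.stp takes three arguments: the major and minor numbers of the device holding the file ($1 and $2, combined with MKDEV()) and the file's inode number ($3). A minimal Python sketch for obtaining them, assuming a Unix system where os.major() and os.minor() are available; inodewatch_args() is a hypothetical name, and the current directory is used only as a demonstration target.

```python
import os


def inodewatch_args(path):
    """Return (major, minor, inode) for path, in inodewatch.stp's argument order."""
    st = os.stat(path)
    return os.major(st.st_dev), os.minor(st.st_dev), st.st_ino


if __name__ == "__main__":
    major, minor, inode = inodewatch_args(".")  # replace "." with the file to watch
    print("stap inodewatch.stp %d %d %d" % (major, minor, inode))
```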


yexiaobai
