Background
eBPF is a very promising technology, and a dedicated foundation (https://ebpf.io/) has even been established to promote the development and standardization of its ecosystem.
There is plenty of material covering the basics of eBPF, so I won't repeat it here.
This article explores what happens when eBPF and Kubernetes are combined, and how existing tool chains can be combined to solve practical problems.
The related open source projects involved are mainly as follows:
- bcc
- bpftrace
- kubectl-trace
- kubectl-flame
- cilium
Prerequisites
Kernel
The concept of eBPF has been around for a long time, so some features are supported even on older kernels. On x86_64, for example, JIT compilation has been supported since version 3.16.
However, to use most of the features properly, it is recommended to upgrade to the latest LTS kernel. For example, the CentOS 7.9 machine used here runs kernel 5.15.4 after the upgrade.
The following link lists the kernel version in which most eBPF features were introduced.
https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md
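To make the version requirement concrete, here is a minimal sketch that compares the running kernel against a minimum version. The function name version_at_least is a hypothetical helper for this article, and the comparison relies on `sort -V` being available (it is in GNU coreutils). The 3.16 threshold is the x86_64 JIT figure quoted above; substitute whatever minimum the table gives for the feature you need.

```shell
#!/usr/bin/env bash
# version_at_least CURRENT MINIMUM
# Hypothetical helper: succeeds when CURRENT >= MINIMUM.
# Versions are compared with `sort -V` (version sort).
version_at_least() {
  local current="${1%%-*}"   # drop the distro suffix: 5.15.4-1.el7... -> 5.15.4
  local minimum="$2"
  # If MINIMUM sorts first (or the two are equal), CURRENT is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n1)" = "$minimum" ]
}

if version_at_least "$(uname -r)" 3.16; then
  echo "kernel $(uname -r) is at least 3.16"
else
  echo "kernel $(uname -r) is older than 3.16"
fi
```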
Kernel headers
Some features depend on kernel header files, so the kernel headers matching the running kernel version must be installed.
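A quick way to see whether headers are visible for the running kernel is sketched below. It assumes the tools look for them under /lib/modules/<release>/build, which is the usual location but may differ depending on the distribution and on which package supplied the headers. The function name headers_present and its optional ROOT argument are hypothetical; ROOT exists only so the check can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# headers_present RELEASE [ROOT]
# Hypothetical helper: succeeds when ROOT/lib/modules/RELEASE/build is a directory
# (or a symlink to one). ROOT defaults to the real filesystem root.
# Assumption: /lib/modules/<release>/build is where the tools look for headers.
headers_present() {
  local release="$1" root="${2:-}"
  [ -d "${root}/lib/modules/${release}/build" ]
}

release="$(uname -r)"
if headers_present "$release"; then
  echo "headers found for $release"
else
  echo "no headers found under /lib/modules/$release/build"
fi
```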
Introduction to related projects
bcc
BCC makes BPF programs easier to write, with kernel instrumentation in C (and includes a C wrapper around LLVM),
and front-ends in Python and lua. It is suited for many tasks, including performance analysis and network traffic control.
bcc provides a set of easy-to-use programming interfaces, allowing developers to write eBPF-based scripts in Python or Lua without detailed knowledge of the kernel code (nearly ten million lines of code).
bpftrace
bpftrace is a high-level tracing language for Linux enhanced Berkeley Packet Filter (eBPF) available in recent Linux kernels (4.x).
bpftrace uses LLVM as a backend to compile scripts to BPF-bytecode and makes use of BCC for interacting with the Linux BPF system,
as well as existing Linux tracing capabilities: kernel dynamic tracing (kprobes), user-level dynamic tracing (uprobes), and tracepoints.
bpftrace is built on top of bcc and implements a DSL modeled on C and awk. It is better suited to simple one-liners for monitoring or tracing, or to running the official example scripts directly.
kubectl-trace
A kubectl plug-in for eBPF that uses bpftrace to trace Kubernetes resources such as nodes and pods. The latest release at the time of writing, v0.1.2 (July 2021), only supports bpftrace; future releases will support both bpftrace and bcc (the code has been merged into the main branch but has not yet been released).
The overall experience is fairly smooth.
kubectl-flame
A kubectl plug-in open-sourced by Yahoo that generates flame graphs for programs. According to the official description, it can attach to an application container for profiling without modifying the application.
In testing, the community version had many bugs and did not run smoothly. In practice it depends on what is inside the application container, including the JDK. The experience is average, and it looks difficult to roll out on a large scale.
cilium
A Kubernetes network plug-in that uses eBPF to improve network forwarding performance and enhance observability.
Get Your Hands Dirty
The CentOS 7 kernel version used in the following experiments is 4.4.
Update kernel
yum -y update
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
yum --disablerepo="*" --enablerepo="elrepo-kernel" list available
This lists two kinds of kernel, lt and ml. lt stands for long-term, similar to Ubuntu's LTS; ml stands for mainline. Either one works, but the headers you install later must match the kernel you choose. ml is chosen here.
yum --enablerepo=elrepo-kernel install -y kernel-ml
Install headers
The old headers must be removed before installation. If headers were installed previously, installing the new ones on top may produce an incompatibility error.
yum remove kernel-headers
Install the new headers
yum --enablerepo=elrepo-kernel install -y kernel-ml-headers
Use new kernel
List all currently available kernels
$ sudo awk -F\' '$1=="menuentry " {print i++ " : " $2}' /etc/grub2.cfg
0 : CentOS Linux (4.18.7-1.el7.elrepo.x86_64) 7 (Core)
1 : CentOS Linux (3.10.0-862.11.6.el7.x86_64) 7 (Core)
2 : CentOS Linux (3.10.0-514.el7.x86_64) 7 (Core)
3 : CentOS Linux (0-rescue-063ec330caa04d4baae54c6902c62e54) 7 (Core)
Set the default kernel in grub, then restart the machine.
grub2-set-default 0
reboot
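The index passed to grub2-set-default above was read off the listing by eye. As a sketch, the lookup can be scripted with the same awk approach; grub_index_for is a hypothetical helper name, and the default path /etc/grub2.cfg is the one used in the listing command above.

```shell
#!/usr/bin/env bash
# grub_index_for PATTERN [CFG]
# Hypothetical helper: prints the zero-based index of the first menuentry
# whose title contains PATTERN (plain substring match, not a regex).
# Returns non-zero when no entry matches.
grub_index_for() {
  local pattern="$1" cfg="${2:-/etc/grub2.cfg}"
  awk -F"'" -v pat="$pattern" '
    BEGIN { n = 0 }
    $1 == "menuentry " {
      if (index($2, pat)) { print n; found = 1; exit }
      n++
    }
    END { exit !found }
  ' "$cfg"
}

# Only attempt the lookup when the grub config is readable (usually requires root).
if [ -r /etc/grub2.cfg ]; then
  grub_index_for "$(uname -r)" || echo "no menu entry matches $(uname -r)"
fi
```

With the listing shown earlier, `grub2-set-default "$(grub_index_for 4.18.7)"` would select entry 0 without hardcoding the number.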
Install bcc/bpftrace
# bcc
yum install bcc-tools
# bpftrace
curl https://repos.baslab.org/rhel/7/bpftrace-daily/bpftrace-daily.repo --output /etc/yum.repos.d/bpftrace-daily.repo
curl https://repos.baslab.org/rhel/7/bpftools/bpftools.repo --output /etc/yum.repos.d/bpftools.repo
yum install bpftrace bpftrace-tools bpftrace-doc
Install kubectl-trace
Unlike bcc and bpftrace, which must be installed and run on each host, kubectl-trace is a client-side plug-in and only needs to be installed on the machine where kubectl is executed.
kubectl krew install trace
Use bcc
git clone https://github.com/iovisor/bcc.git
cd ./bcc/tools/
# View the GC events of a Java process
./javagc.sh -l java 24682
Use bpftrace
git clone https://github.com/iovisor/bpftrace.git
cd ./bpftrace/tools/
# View the latency of DNS resolution requests
bpftrace gethostlatency.bt
The path referenced in the official tools is hardcoded, so an error may be reported (see https://github.com/iovisor/bpftrace/issues/2075#issuecomment-977648027). The workaround is to change it to the correct path on your system. (This problem should be fixed later.)
kubectl trace node
Whether the target is a node or a pod, trace starts a job on the corresponding node, which then probes the host or attaches to the pod's container.
There is a bug here in the CentOS environment: even with kernel-headers installed on the host, --fetch-headers is still required for the command to succeed.
k trace run node1 -e "tracepoint:syscalls:sys_enter_* { @[probe] = count(); }" --fetch-headers
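The commands in this section are written with `k`, which the article does not define. It is presumably shorthand for kubectl; that is an assumption. A minimal sketch of such a shorthand, written as a function so that it also works inside scripts:

```shell
#!/usr/bin/env bash
# Assumption: `k` in the commands of this section is shorthand for kubectl.
# A function is used rather than `alias` because bash does not expand
# aliases in non-interactive scripts by default.
k() {
  kubectl "$@"   # "$@" forwards every argument unchanged, including quoted bpftrace programs
}
```

With this defined, `k trace run ...` is equivalent to `kubectl trace run ...`.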
Another problem is that --fetch-headers pulls a tar package from the external network, and depending on the network environment the download may fail. The solution is to download it in advance and then, following the official documentation, manually build an initContainer image, as shown in the following example.
k trace run -nkube-system pod/calico-kube-controllers-7d5d95c8c9-mkp54 -e "tracepoint:syscalls:sys_enter_* { @[probe] = count(); }" --fetch-headers --init-imagename=docker.4pd.io/tmp/kubectl-trace-init:5.15.4
kubectl trace pod
k trace run -nkube-system pod/calico-kube-controllers-7d5d95c8c9-mkp54 -e "tracepoint:syscalls:sys_enter_* { @[probe] = count(); }" --fetch-headers
Future Steps
Whether you use native eBPF, bcc, or bpftrace, there is a real learning curve. It is therefore worth packaging scripts tailored to the actual situation (or reusing the official tools and providing help documentation) for developers to use.
For example, scripts could be prepared in advance for the following scenarios.
- MySQL slow queries
- fd leaks
- Memory leaks
- Frequent GC
- TCP packet loss
- DNS lookup failures
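One possible shape for such packaging is a small dispatcher that maps each scenario to an existing bcc tool. This is only a sketch: the scenario labels and the function names tool_for_scenario and run_scenario are hypothetical, the mapping is a suggestion rather than a tested recommendation, and the directory /usr/share/bcc/tools assumes the layout of the bcc-tools package installed earlier.

```shell
#!/usr/bin/env bash
# tool_for_scenario NAME
# Hypothetical mapping from a scenario label to a bcc tool name.
tool_for_scenario() {
  case "$1" in
    mysql-slow-query) echo mysqld_qslower ;;
    fd-leak)          echo opensnoop ;;      # no dedicated fd-leak tool; tracing opens is only a starting point
    memory-leak)      echo memleak ;;
    frequent-gc)      echo javagc ;;
    tcp-packet-loss)  echo tcpdrop ;;
    dns-lookup)       echo gethostlatency ;; # reports lookup latency, which helps narrow down failures
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Assumption: the bcc-tools package places its tools in this directory.
BCC_TOOLS_DIR="${BCC_TOOLS_DIR:-/usr/share/bcc/tools}"

# run_scenario NAME [TOOL ARGS...]
# Looks up the tool for NAME and runs it with the remaining arguments.
run_scenario() {
  local scenario="$1"; shift
  local tool
  tool="$(tool_for_scenario "$scenario")" || return 1
  "$BCC_TOOLS_DIR/$tool" "$@"
}
```

For example, `run_scenario memory-leak -p 1234` would run memleak against process 1234. The same idea could be applied to the scripts under bpftrace/tools, or wrapped around kubectl trace so that developers never touch the host directly.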