The core of eBPF (extended Berkeley Packet Filter) is an efficient virtual machine that resides in the kernel. It started out as an efficient network packet filtering framework, the original BPF, so let's first look at BPF.

BPF

Framework

[Figure: where BPF sits in the system and its overall framework]
The figure above shows where BPF sits and how it is structured. Note that the kernel and user space exchange data through buffers, which avoids frequent context switches. The BPF virtual machine itself is very simple: it consists of an accumulator, an index register, scratch memory, and an implicit program counter.

Example

Next, let's look at an example: a filter that accepts all IP packets. You can view its compiled form with tcpdump -d ip:


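(000) ldh      [12]                             // load the 2-byte EtherType field at link-layer offset 12 into the accumulator
(001) jeq      #0x800           jt 2    jf 3    // is the EtherType 0x800 (IPv4)? true: jump to 2, false: jump to 3
(002) ret      #65535                           // accept: keep up to 65535 bytes of the packet
(003) ret      #0                               // reject: return 0, dropping the packet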

With only four virtual machine instructions, BPF provides a genuinely useful IP packet filter.
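To make the accumulator and the implicit program counter concrete, here is a minimal interpreter sketch in C that runs exactly the four instructions above. It is not the kernel's implementation: the names cbpf_insn, cbpf_run, and ip_filter are hypothetical, and only ldh/ldb with an absolute offset, jeq with a constant, and ret with a constant are supported. The opcode values and instruction layout mirror the classic BPF encoding (struct sock_filter), in which jt and jf are offsets relative to the next instruction rather than the absolute targets tcpdump prints.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode bits, same values as the classic BPF encoding in <linux/filter.h>. */
#define BPF_LD   0x00
#define BPF_JMP  0x05
#define BPF_RET  0x06
#define BPF_H    0x08  /* halfword (2 bytes) */
#define BPF_B    0x10  /* byte */
#define BPF_ABS  0x20  /* absolute offset into the packet */
#define BPF_JEQ  0x10
#define BPF_K    0x00  /* operand is the constant k */

/* Same layout as struct sock_filter; jt/jf are relative to the next insn. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt, jf;
    uint32_t k;
};

/* Hypothetical mini-interpreter. Returns the number of bytes to accept
 * (0 means drop), like a real classic BPF filter. */
static uint32_t cbpf_run(const struct cbpf_insn *prog, size_t n,
                         const uint8_t *pkt, size_t len)
{
    uint32_t A = 0;   /* accumulator */
    size_t pc = 0;    /* implicit program counter */

    while (pc < n) {
        const struct cbpf_insn *in = &prog[pc++];
        switch (in->code) {
        case BPF_LD | BPF_H | BPF_ABS:               /* ldh [k] */
            if ((size_t)in->k + 2 > len) return 0;   /* out of bounds: drop */
            A = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];  /* network byte order */
            break;
        case BPF_LD | BPF_B | BPF_ABS:               /* ldb [k] */
            if ((size_t)in->k + 1 > len) return 0;
            A = pkt[in->k];
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:              /* jeq #k */
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case BPF_RET | BPF_K:                        /* ret #k */
            return in->k;
        default:
            return 0;                                /* unsupported opcode */
        }
    }
    return 0;  /* no ret reached: treat as drop */
}

/* "tcpdump -d ip": jt=0/jf=1 encode the absolute targets 2 and 3. */
static const struct cbpf_insn ip_filter[] = {
    { BPF_LD | BPF_H | BPF_ABS,  0, 0, 12    },  /* (000) ldh [12]             */
    { BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0x800 },  /* (001) jeq #0x800 jt 2 jf 3 */
    { BPF_RET | BPF_K,           0, 0, 65535 },  /* (002) ret #65535           */
    { BPF_RET | BPF_K,           0, 0, 0     },  /* (003) ret #0               */
};
```

For example, cbpf_run(ip_filter, 4, frame, len) returns 65535 for a frame whose EtherType is 0x0800 and 0 otherwise. On Linux, the real counterpart is the struct sock_filter array printed by tcpdump -dd ip, which is attached to a socket with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, ...).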


A slightly more complex example is the TCP filter, shown by tcpdump -d tcp:

(000) ldh      [12]                             // load the 2-byte EtherType field at link-layer offset 12
(001) jeq      #0x86dd          jt 2    jf 7    // IPv6? true: jump to 2, false: jump to 7
(002) ldb      [20]                             // load 1 byte at offset 20: the IPv6 Next Header field
(003) jeq      #0x6             jt 10   jf 4    // TCP (6)? true: jump to 10, false: jump to 4
(004) jeq      #0x2c            jt 5    jf 11   // IPv6 Fragment header (44)? true: jump to 5, false: jump to 11
(005) ldb      [54]                             // load 1 byte at offset 54 (14-byte Ethernet + 40-byte IPv6 header): the Fragment header's Next Header field
(006) jeq      #0x6             jt 10   jf 11   // TCP? true: jump to 10, false: jump to 11
(007) jeq      #0x800           jt 8    jf 11   // IPv4? true: jump to 8, false: jump to 11
(008) ldb      [23]                             // load 1 byte at offset 23: the IPv4 Protocol field
(009) jeq      #0x6             jt 10   jf 11   // TCP? true: jump to 10, false: jump to 11
(010) ret      #65535                           // accept the packet
(011) ret      #0                               // drop the packet

The above is BPF as designed on FreeBSD; on Linux the equivalent is called LSF (Linux Socket Filter). Feel free to explore it yourself.
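The TCP program above is easier to read next to a hand-written equivalent. The following C sketch follows the same branches, with comments keyed to the instruction numbers. is_tcp_frame and the offset constants are hypothetical names, and, like the filter itself, the sketch assumes an untagged Ethernet frame (a 14-byte link-layer header) and looks only one extension header deep on IPv6.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets assume an untagged Ethernet frame with a 14-byte header. */
#define ETH_TYPE_OFF      12
#define IPV6_NEXTHDR_OFF  20  /* 14 + 6: IPv6 Next Header field */
#define IPV6_FRAG_NH_OFF  54  /* 14 + 40: first byte of the next (Fragment) header */
#define IPV4_PROTO_OFF    23  /* 14 + 9: IPv4 Protocol field */

#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_IPV6  0x86dd
#define PROTO_TCP       6
#define NH_FRAGMENT     44    /* 0x2c: IPv6 Fragment header */

/* Hypothetical hand-written equivalent of the "tcpdump -d tcp" program. */
static bool is_tcp_frame(const uint8_t *pkt, size_t len)
{
    if (len < ETH_TYPE_OFF + 2)
        return false;
    uint16_t ethertype = (uint16_t)((pkt[ETH_TYPE_OFF] << 8) | pkt[ETH_TYPE_OFF + 1]); /* (000) */

    if (ethertype == ETHERTYPE_IPV6) {                 /* (001) */
        if (len <= IPV6_NEXTHDR_OFF) return false;
        uint8_t nh = pkt[IPV6_NEXTHDR_OFF];            /* (002) */
        if (nh == PROTO_TCP) return true;              /* (003) -> (010) */
        if (nh != NH_FRAGMENT) return false;           /* (004) -> (011) */
        if (len <= IPV6_FRAG_NH_OFF) return false;
        return pkt[IPV6_FRAG_NH_OFF] == PROTO_TCP;     /* (005)-(006) */
    }
    if (ethertype == ETHERTYPE_IPV4) {                 /* (007) */
        if (len <= IPV4_PROTO_OFF) return false;
        return pkt[IPV4_PROTO_OFF] == PROTO_TCP;       /* (008)-(009) */
    }
    return false;                                      /* (011) */
}
```

The early length checks play the role of the kernel's bounds check on ldh/ldb: a load past the end of the packet makes a classic BPF filter return 0.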

eBPF

Getting to know eBPF

eBPF has been part of the Linux kernel since version 3.18. Compared with BPF, it brings several important improvements. The first is efficiency, thanks to JIT (just-in-time) compilation of eBPF code. The second is scope: it extends from network packets to general event processing. Finally, it no longer relies on sockets alone to pass data; it uses maps for efficient data storage.

Building on these improvements, kernel developers implemented network monitoring, rate limiting, and system monitoring in less than two and a half years.

Currently, working with eBPF breaks down into three steps:

  • Create the eBPF program as bytecode: write it in C and compile it with LLVM (Clang) into eBPF bytecode stored in an ELF file.
  • Load the program into the kernel and create the eBPF maps it needs. Supported program types include socket filters, kprobe handlers, traffic control classifiers, traffic control actions, tracepoint handlers, eXpress Data Path (XDP), perf event monitoring, cgroup restrictions, and lightweight tunnels.
  • Attach the loaded program to the system. Depending on its type, the program is attached to a different kernel subsystem. Once attached, it becomes active and starts filtering, analyzing, or capturing information.
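In the first step the compiler does the encoding for you (typically clang -target bpf, which uses LLVM's BPF backend). To show what ends up in the ELF file, the sketch below hand-encodes the smallest valid eBPF program: r0 = imm followed by exit. The struct mirrors the 8-byte struct bpf_insn from include/uapi/linux/bpf.h and the opcode constants use the same values, but ebpf_insn, the EBPF_* macros, and build_return_prog are hypothetical names chosen so the example compiles without kernel headers.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as struct bpf_insn: every eBPF instruction is 8 bytes. */
struct ebpf_insn {
    uint8_t code;       /* opcode */
    uint8_t dst_reg:4;  /* destination register (r0-r10) */
    uint8_t src_reg:4;  /* source register */
    int16_t off;        /* signed offset (jumps, memory access) */
    int32_t imm;        /* signed immediate constant */
};

/* Opcode pieces, same values as in <linux/bpf.h>. */
#define EBPF_ALU64 0x07
#define EBPF_JMP   0x05
#define EBPF_MOV   0xb0
#define EBPF_EXIT  0x90
#define EBPF_K     0x00  /* source operand is imm */

/* Hypothetical helper: writes "r0 = imm; exit" into out[0..1]. The value
 * left in r0 is the program's return value. Returns the instruction count. */
static size_t build_return_prog(struct ebpf_insn *out, int32_t imm)
{
    memset(out, 0, 2 * sizeof *out);
    out[0].code = EBPF_ALU64 | EBPF_MOV | EBPF_K;  /* 0xb7: mov r0, imm */
    out[0].dst_reg = 0;
    out[0].imm = imm;
    out[1].code = EBPF_JMP | EBPF_EXIT;            /* 0x95: exit, returns r0 */
    return 2;
}
```

An instruction array like this is what BPF_PROG_LOAD receives in the second step, through the insns and insn_cnt fields of union bpf_attr.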

At the NetDev 1.2 conference in October 2016, Jakub Kicinski and Nic Viljoen of Netronome presented a talk titled "eBPF/XDP Hardware Offload to SmartNIC". In it, Nic Viljoen explained that each FPC (flow processing core) on a Netronome SmartNIC processes 3 million packets per second, and that each SmartNIC has 72 to 120 FPCs, which could in theory support up to 4.3 Tbps of eBPF throughput!
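The 4.3 Tbps figure can be reproduced with simple arithmetic, assuming the maximum of 120 FPCs and full-size 1500-byte packets (both are our assumptions; the packet size behind the figure is not stated here):

```latex
120~\text{FPC} \times 3\times10^{6}~\tfrac{\text{packets}}{\text{s}}
  = 3.6\times10^{8}~\tfrac{\text{packets}}{\text{s}},
\qquad
3.6\times10^{8}~\tfrac{\text{packets}}{\text{s}} \times 1500~\text{B} \times 8~\tfrac{\text{bit}}{\text{B}}
  = 4.32\times10^{12}~\tfrac{\text{bit}}{\text{s}} \approx 4.3~\text{Tbps}
```

With smaller packets the same packet rate yields proportionally less bandwidth, which is why the figure is a theoretical upper bound.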

eBPF entry points

The following uses kernel version 4.14 as the reference.

bpf system call

kernel/bpf/syscall.c

The header file of the bpf system call

include/uapi/linux/bpf.h

Entry function

int bpf(int cmd, union bpf_attr *attr, unsigned int size);

In kernel/bpf/syscall.c, this entry point is defined by a macro (SYSCALL_DEFINE3(bpf, ...)); see 0610e52a277839.
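As the bpf(2) man page notes, glibc provides no wrapper for this system call, so user space usually invokes it through syscall(2). A minimal sketch, assuming a Linux system with the kernel UAPI headers installed; sys_bpf and create_array_map are hypothetical helper names:

```c
#define _GNU_SOURCE
#include <linux/bpf.h>    /* union bpf_attr, BPF_* command and map constants */
#include <string.h>
#include <sys/syscall.h>  /* SYS_bpf */
#include <unistd.h>       /* syscall(), close() */

/* Thin wrapper with the same shape as the entry function above. */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(SYS_bpf, cmd, attr, size);
}

/* Example command: BPF_MAP_CREATE for an array map with 4-byte keys/values. */
static int create_array_map(unsigned int max_entries)
{
    union bpf_attr attr;
    memset(&attr, 0, sizeof attr);  /* the kernel requires unused fields to be zero */
    attr.map_type = BPF_MAP_TYPE_ARRAY;
    attr.key_size = sizeof(__u32);
    attr.value_size = sizeof(__u32);
    attr.max_entries = max_entries;
    return sys_bpf(BPF_MAP_CREATE, &attr, sizeof attr);  /* new fd, or -1 with errno */
}
```

create_array_map(4) returns a new map file descriptor on success, or -1 with errno set, for example EPERM when the caller lacks the required privileges.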

eBPF commands

The Linux bpf system call has 10 commands, 6 of which are documented in the man page:

  • BPF_PROG_LOAD verifies and loads an eBPF program and returns a new file descriptor.
  • BPF_MAP_CREATE creates a map and returns a file descriptor referring to it.
  • BPF_MAP_LOOKUP_ELEM looks up an element in the specified map by key and returns its value.
  • BPF_MAP_UPDATE_ELEM creates or updates an element (a key/value pair) in the specified map.
  • BPF_MAP_DELETE_ELEM looks up an element in the specified map by key and deletes it.
  • BPF_MAP_GET_NEXT_KEY looks up an element in the specified map by key and returns the key of the next element.

These commands fall into two categories: loading eBPF programs and operating on eBPF maps. The map operations are largely self-contained: they create eBPF maps, look up, update, and delete elements, and traverse a map (BPF_MAP_GET_NEXT_KEY).
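All of the map commands go through the same union bpf_attr, filled in differently per command. The sketch below creates a hash map and then exercises BPF_MAP_UPDATE_ELEM, BPF_MAP_LOOKUP_ELEM, BPF_MAP_GET_NEXT_KEY, and BPF_MAP_DELETE_ELEM in turn. It assumes Linux with the kernel UAPI headers and enough privileges to create maps; sys_bpf, elem_cmd, and map_demo are hypothetical names. Iteration starts from a key that is not in the map, which makes the kernel return the first key.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>    /* union bpf_attr, BPF_* constants */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_bpf */
#include <unistd.h>       /* syscall(), close() */

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(SYS_bpf, cmd, attr, size);
}

/* One helper for the element commands. Pointers travel as 64-bit integers.
 * For BPF_MAP_GET_NEXT_KEY, "value" receives the next key, because value
 * and next_key share the same union slot in bpf_attr. */
static int elem_cmd(int cmd, int map_fd, const void *key, void *value, __u64 flags)
{
    union bpf_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.map_fd = (__u32)map_fd;
    attr.key = (__u64)(uintptr_t)key;
    attr.value = (__u64)(uintptr_t)value;
    attr.flags = flags;
    return sys_bpf(cmd, &attr, sizeof attr);
}

/* Hypothetical demo: returns the number of keys visited (2 on success),
 * or -1 if a bpf() call failed (e.g. EPERM without privileges). */
static int map_demo(__u32 *value_out)
{
    union bpf_attr attr;
    __u32 keys[2] = { 1, 2 }, vals[2] = { 100, 200 };
    __u32 cur = 0xffffffffu, next;  /* cur starts at a key not in the map */
    int fd, visited = -1;

    memset(&attr, 0, sizeof attr);
    attr.map_type = BPF_MAP_TYPE_HASH;
    attr.key_size = sizeof(__u32);
    attr.value_size = sizeof(__u32);
    attr.max_entries = 16;
    fd = sys_bpf(BPF_MAP_CREATE, &attr, sizeof attr);
    if (fd < 0)
        return -1;

    for (int i = 0; i < 2; i++)     /* BPF_ANY: create or overwrite */
        if (elem_cmd(BPF_MAP_UPDATE_ELEM, fd, &keys[i], &vals[i], BPF_ANY) < 0)
            goto out;
    if (elem_cmd(BPF_MAP_LOOKUP_ELEM, fd, &keys[1], value_out, 0) < 0)
        goto out;

    /* Walk every key; the call fails with ENOENT after the last one. */
    visited = 0;
    while (elem_cmd(BPF_MAP_GET_NEXT_KEY, fd, &cur, &next, 0) == 0) {
        visited++;
        cur = next;
    }

    if (elem_cmd(BPF_MAP_DELETE_ELEM, fd, &keys[0], NULL, 0) < 0)
        visited = -1;
out:
    close(fd);
    return visited;
}
```

On a system where the caller may create maps, map_demo(&v) returns 2 and stores 200 in v.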

The remaining 4 commands can be found in the source code:

  • BPF_OBJ_PIN was added in version 4.4 to make eBPF objects persistent: it lets eBPF maps and programs be pinned under /sys/fs/bpf.
  • BPF_OBJ_GET, added alongside it, retrieves a pinned object. Before these commands, no tool could create an eBPF program and then exit, because exiting destroyed the filter; with the file system, eBPF maps and programs survive after the process that created them exits.
  • BPF_PROG_ATTACH, added in version 4.10, attaches an eBPF program to a cgroup, which is useful for containers.
  • BPF_PROG_DETACH, its counterpart, detaches the program from the cgroup.
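Pinning and retrieving an object needs only a path and a file descriptor. A minimal sketch, assuming Linux with the kernel UAPI headers, the BPF file system mounted at /sys/fs/bpf, and a caller with the needed privileges; obj_pin, obj_get, and the example path /sys/fs/bpf/my_map are hypothetical:

```c
#define _GNU_SOURCE
#include <linux/bpf.h>    /* union bpf_attr, BPF_OBJ_PIN, BPF_OBJ_GET */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>  /* SYS_bpf */
#include <unistd.h>       /* syscall() */

static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
    return (int)syscall(SYS_bpf, cmd, attr, size);
}

/* Pin an existing map or program fd at a path on the BPF file system.
 * Returns 0 on success, -1 with errno set on failure. */
static int obj_pin(int fd, const char *path)
{
    union bpf_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.pathname = (__u64)(uintptr_t)path;
    attr.bpf_fd = (__u32)fd;
    return sys_bpf(BPF_OBJ_PIN, &attr, sizeof attr);
}

/* Reopen a pinned object, possibly from another process.
 * Returns a new file descriptor, or -1 with errno set. */
static int obj_get(const char *path)
{
    union bpf_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.pathname = (__u64)(uintptr_t)path;
    return sys_bpf(BPF_OBJ_GET, &attr, sizeof attr);
}
```

A typical pattern is for a loader to create a map, call obj_pin(map_fd, "/sys/fs/bpf/my_map"), and exit; another process later calls obj_get on the same path to get its own file descriptor for the map. Removing the pinned file with unlink() drops that reference.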

eBPF-map types

  • BPF_MAP_TYPE_UNSPEC unspecified type (enum value 0), not used for real maps
  • BPF_MAP_TYPE_HASH a hash table; one of the two map types that were mainly used at first
  • BPF_MAP_TYPE_ARRAY similar to the above, except that it is indexed like an array
  • BPF_MAP_TYPE_PROG_ARRAY stores the file descriptors of loaded eBPF programs. It is commonly used to identify different eBPF programs by number; an eBPF program can also look up another program in the map by key and jump to it (a tail call).
  • BPF_MAP_TYPE_PERF_EVENT_ARRAY works with perf tools: CPU performance counters, tracepoints, kprobes, and uprobes. See tracex6_kern.c and tracex6_user.c under samples/bpf/
  • BPF_MAP_TYPE_PERCPU_HASH same as BPF_MAP_TYPE_HASH, except that a separate copy is kept for each CPU
  • BPF_MAP_TYPE_PERCPU_ARRAY same as BPF_MAP_TYPE_ARRAY, except that a separate copy is kept for each CPU
  • BPF_MAP_TYPE_STACK_TRACE stores stack traces
  • BPF_MAP_TYPE_CGROUP_ARRAY checks which cgroup an skb belongs to
  • BPF_MAP_TYPE_LRU_HASH a hash table that evicts the least recently used entries when it is full
  • BPF_MAP_TYPE_LRU_PERCPU_HASH the per-CPU variant of BPF_MAP_TYPE_LRU_HASH
  • BPF_MAP_TYPE_LPM_TRIE a specialized type: a trie for longest prefix match (LPM), e.g. IP prefix lookups
  • BPF_MAP_TYPE_ARRAY_OF_MAPS an array whose values are other maps (map-in-map)
  • BPF_MAP_TYPE_HASH_OF_MAPS a hash table whose values are other maps (map-in-map)
  • BPF_MAP_TYPE_DEVMAP stores network devices; XDP programs use it to redirect packets to another device
  • BPF_MAP_TYPE_SOCKMAP stores sockets; used to redirect traffic between sockets
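The per-CPU types (BPF_MAP_TYPE_PERCPU_HASH, BPF_MAP_TYPE_PERCPU_ARRAY, and BPF_MAP_TYPE_LRU_PERCPU_HASH) have a user-space detail worth knowing: a lookup from user space returns one value per possible CPU, each slot rounded up to a multiple of 8 bytes, so the value buffer must be sized for all of them. The sketch below computes that size. percpu_value_buf_size and count_cpulist are hypothetical helpers, and the CPU count is meant to come from a CPU list such as the contents of /sys/devices/system/cpu/possible (for example "0-3").

```c
#include <stddef.h>
#include <stdlib.h>  /* strtoul */

/* Count the CPUs in a cpulist string such as "0-3\n" or "0,2-5". */
static unsigned int count_cpulist(const char *s)
{
    unsigned int count = 0;
    for (;;) {
        char *end;
        unsigned long first = strtoul(s, &end, 10);
        if (end == s)
            break;                  /* no number: stop */
        unsigned long last = first;
        if (*end == '-') {          /* a range "a-b" */
            s = end + 1;
            last = strtoul(s, &end, 10);
            if (end == s || last < first)
                break;              /* malformed range */
        }
        count += (unsigned int)(last - first + 1);
        if (*end != ',')
            break;                  /* end of list ('\n' or '\0') */
        s = end + 1;
    }
    return count;
}

/* Buffer size for one user-space lookup in a per-CPU map:
 * round_up(value_size, 8) bytes for each possible CPU. */
static size_t percpu_value_buf_size(size_t value_size, unsigned int n_possible_cpus)
{
    size_t slot = (value_size + 7) & ~(size_t)7;
    return slot * n_possible_cpus;
}
```

Note that the count must cover possible CPUs, not just online ones; libbpf offers libbpf_num_possible_cpus() for the same purpose.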



代码熬夜敲

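Li Zhikuan: top-100 creator, penetration testing expert, reserved on the outside but passionate on the inside, and has his own rock band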