DPDK Application Scenarios Series 01: Using rte_mbuf (Network Packets)

Background:
The relationship between CPUs, NUMA nodes, network interface cards (NICs), and data transfer.

+---------------------+                         +---------------------+
|                     |                         |                     |
|      NUMA Node 0    |                         |      NUMA Node 1    |
|                     |                         |                     |
| +-----------------+ |                         | +-----------------+ |
| | CPU 0           | |                         | | CPU 2           | |
| | +-------------+ | |                         | | +-------------+ | |
| | | Core 0      | | |                         | | | Core 2      | | |
| | +-------------+ | |                         | | +-------------+ | |
| | +-------------+ | |                         | | +-------------+ | |
| | | Core 1      | | |                         | | | Core 3      | | |
| | +-------------+ | |                         | | +-------------+ | |
| +-----------------+ |                         | +-----------------+ |
| +-----------------+ |                         | +-----------------+ |
| | Memory          | |                         | | Memory          | |
| |                 | |                         | |                 | |
| +-----------------+ |                         | +-----------------+ |
| +-----------------+ |                         | +-----------------+ |
| | NIC 0           | |                         | | NIC 1           | |
| | (Socket ID 0)   | |                         | | (Socket ID 1)   | |
| +-----------------+ |                         | +-----------------+ |
+---------------------+                         +---------------------+

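CPU and cores: each NUMA node contains one or more CPUs, and each CPU contains multiple cores. In the diagram, NUMA Node 0 contains CPU 0, which has two cores (Core 0 and Core 1).
Memory: each NUMA node has its own local memory. A CPU usually accesses memory on its own NUMA node faster than memory on another node.
NIC: each NUMA node can have one or more network interface cards (NICs) attached. In the diagram, NIC 0 is attached to NUMA Node 0 and NIC 1 is attached to NUMA Node 1.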

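Data transfer relationships
Local data transfer: when a CPU accesses memory on its own NUMA node, latency is lower and performance is better. For example, CPU 0 accessing the memory of NUMA Node 0.
Cross-node data transfer: when a CPU accesses memory on another NUMA node, latency is higher and performance is worse. For example, CPU 0 accessing the memory of NUMA Node 1.
NIC data transfer: a NIC is usually attached to one specific NUMA node. For best performance, pin the packet-processing threads to CPUs on the same NUMA node as the NIC. For example, NIC 0 is attached to NUMA Node 0, so threads that process packets from NIC 0 should run on the CPUs of NUMA Node 0.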

Example

// Created by putao on 2024/8/30.
//
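#include <stdint.h>
#include <stdlib.h>

#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define TX_RING_SIZE 1024
#define NUM_MBUFS (4096-1)
#define MBUF_CACHE_SIZE 250
#define BURST_SIZE 32

int main(int argc, char **argv) {
    struct rte_mempool *mbuf_pool;
    uint16_t port_id = 0;

    // Initialize the EAL; a negative return value means initialization failed
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "Cannot initialize EAL\n");

    // Create the mbuf memory pool on the NUMA node of the calling lcore
    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS,
                                        MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
                                        rte_socket_id());
    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");

    // Configure the Ethernet device with one RX queue and one TX queue
    struct rte_eth_conf port_conf = {0};
    if (rte_eth_dev_configure(port_id, 1, 1, &port_conf) < 0)
        rte_exit(EXIT_FAILURE, "Cannot configure port %u\n", port_id);

    // Set up the RX and TX queues on the NUMA node the device is attached to
    if (rte_eth_rx_queue_setup(port_id, 0, RX_RING_SIZE,
                               rte_eth_dev_socket_id(port_id), NULL, mbuf_pool) < 0)
        rte_exit(EXIT_FAILURE, "Cannot set up RX queue on port %u\n", port_id);
    if (rte_eth_tx_queue_setup(port_id, 0, TX_RING_SIZE,
                               rte_eth_dev_socket_id(port_id), NULL) < 0)
        rte_exit(EXIT_FAILURE, "Cannot set up TX queue on port %u\n", port_id);

    // Start the Ethernet device
    if (rte_eth_dev_start(port_id) < 0)
        rte_exit(EXIT_FAILURE, "Cannot start port %u\n", port_id);

    // Receive one burst of packets and send it back out.
    // This runs only once; a real forwarding application would repeat it in a loop.
    struct rte_mbuf *bufs[BURST_SIZE];
    // nb_rx is the number of packets actually received (0 to BURST_SIZE)
    uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
    if (nb_rx > 0) {
        // Send the received packets back out with rte_eth_tx_burst
        uint16_t nb_tx = rte_eth_tx_burst(port_id, 0, bufs, nb_rx);
        // Packets that could not be sent still belong to the application,
        // so release them with rte_pktmbuf_free
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }

    // Stop and close the Ethernet device, then release EAL resources
    rte_eth_dev_stop(port_id);
    rte_eth_dev_close(port_id);
    rte_eal_cleanup();

    return 0;
}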
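
A note on the pool size used in the example: NUM_MBUFS is written as 4096-1 because DPDK's mempool documentation describes a power of two minus one as the optimum element count in terms of memory usage. The same documentation advises choosing a cache size such that n modulo cache_size is 0, otherwise some elements stay in the pool unused. The plain C sketch below only checks these two properties for given numbers. The helper names `is_pow2_minus_one` and `cache_remainder` are hypothetical and are not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n has the form 2^q - 1 (q >= 1), otherwise 0.
 * Hypothetical helper, not a DPDK function. */
static int is_pow2_minus_one(uint32_t n)
{
    /* For n = 2^q - 1 all low bits are set, so n + 1 shares no bit with n. */
    return n != 0 && ((n + 1) & n) == 0;
}

/* Returns n modulo cache_size, the quantity the mempool documentation
 * advises keeping at 0. Hypothetical helper, not a DPDK function. */
static uint32_t cache_remainder(uint32_t n, uint32_t cache_size)
{
    assert(cache_size > 0);
    return n % cache_size;
}
```

With the values from the example, `is_pow2_minus_one(4095)` is 1 and `cache_remainder(4095, 250)` is 95, so the pool size follows the first recommendation but the cache size does not follow the second. A cache size of 273 would divide 4095 evenly; whether that matters in practice depends on the workload.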

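Why are RX and TX queues needed?
RX queue: used to receive packets. The NIC places received packets into the RX queue, and the application reads them from it.
TX queue: used to send packets. The application places the packets to be sent into the TX queue, and the NIC reads them from it and transmits them.
The rte_eth_rx_burst function receives a batch of packets from the specified RX queue. It returns the number of packets actually received, which can be smaller than the number requested.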
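
The burst semantics described above ("up to N packets, return how many were actually handled") can be sketched with a toy ring in plain C. This is only a model for intuition: `toy_ring` and its functions are hypothetical names, not DPDK APIs, and real NIC queues are descriptor rings managed by the driver and the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_SIZE 8 /* hypothetical capacity; must be a power of two */

/* Toy queue of packet handles, standing in for an RX or TX queue. */
struct toy_ring {
    void *slots[TOY_RING_SIZE];
    uint32_t head; /* index of the next slot to dequeue */
    uint32_t tail; /* index of the next slot to enqueue */
};

static void toy_ring_init(struct toy_ring *r)
{
    assert(r != NULL);
    r->head = 0;
    r->tail = 0;
}

/* Number of packets currently queued. */
static uint32_t toy_ring_count(const struct toy_ring *r)
{
    return r->tail - r->head;
}

/* Enqueue up to n packets; returns how many fit into the ring. */
static uint16_t toy_ring_enqueue_burst(struct toy_ring *r, void **pkts, uint16_t n)
{
    uint16_t done = 0;
    while (done < n && toy_ring_count(r) < TOY_RING_SIZE) {
        r->slots[r->tail & (TOY_RING_SIZE - 1)] = pkts[done];
        r->tail++;
        done++;
    }
    return done;
}

/* Dequeue up to n packets; returns how many were available. */
static uint16_t toy_ring_dequeue_burst(struct toy_ring *r, void **pkts, uint16_t n)
{
    uint16_t done = 0;
    while (done < n && toy_ring_count(r) > 0) {
        pkts[done] = r->slots[r->head & (TOY_RING_SIZE - 1)];
        r->head++;
        done++;
    }
    return done;
}
```

In this model, enqueueing 10 packets into an empty ring of 8 slots returns 8, and the caller remains responsible for the 2 packets that did not fit. That mirrors why the example frees the packets that rte_eth_tx_burst did not accept.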
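How are port IDs and sockets related?
A port ID is the unique identifier DPDK uses for an Ethernet device. Port IDs usually start at 0 and are assigned one by one to the network interfaces available to the application.
A socket is a physical CPU socket. In a multi-socket system, different sockets can be attached to different memory regions, so in DPDK a socket ID effectively identifies a NUMA node. DPDK uses the rte_socket_id function to get the socket ID of the current thread, which is then used for memory allocation and device configuration.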
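How do you get the port IDs?
The rte_eth_dev_count_avail function returns the number of Ethernet devices available to the application. Port IDs are not guaranteed to be contiguous, so iterate over the devices with the RTE_ETH_FOREACH_DEV macro instead of counting from 0 to the number of ports.

uint16_t nb_ports = rte_eth_dev_count_avail();
printf("Number of available Ethernet ports: %u\n", nb_ports);

// Visit every available port; port IDs may have gaps
uint16_t port_id;
RTE_ETH_FOREACH_DEV(port_id) {
    struct rte_eth_dev_info dev_info;
    // Skip ports whose device information cannot be read
    if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
        continue;
    printf("Port %u: %s\n", port_id, dev_info.driver_name);
}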

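What is the difference between rte_socket_id and rte_eth_dev_socket_id?
rte_socket_id: returns the ID of the NUMA node that the currently executing thread runs on. It is mainly used for memory allocation and thread-related operations.
rte_eth_dev_socket_id: returns the ID of the NUMA node that the given Ethernet device is attached to, or a negative value if it cannot be determined. It is mainly used for device configuration and performance tuning.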
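
One way to combine the two values is sketched below in plain C: prefer the NIC's node when it is known, fall back to the thread's node otherwise, and flag the case where the polling thread and the NIC sit on different nodes. In a DPDK application the two inputs would come from rte_eth_dev_socket_id(port_id) and rte_socket_id(). The function names `pick_pool_socket` and `is_cross_numa` are hypothetical, and the sketch assumes a negative device socket means "unknown".

```c
#include <assert.h>

/* Choose the NUMA node for packet buffers: the NIC's node if known,
 * otherwise the node of the calling thread.
 * Hypothetical helper; dev_socket < 0 is assumed to mean "unknown". */
static int pick_pool_socket(int dev_socket, int lcore_socket)
{
    assert(lcore_socket >= 0);
    return dev_socket >= 0 ? dev_socket : lcore_socket;
}

/* Returns 1 when the NIC's node is known and differs from the thread's
 * node, which means every packet access crosses NUMA nodes. */
static int is_cross_numa(int dev_socket, int lcore_socket)
{
    return dev_socket >= 0 && dev_socket != lcore_socket;
}
```

For instance, a thread on node 0 polling a NIC on node 1 gives `pick_pool_socket(1, 0) == 1` and `is_cross_numa(1, 0) == 1`; an application might log a warning in that case rather than fail.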

Structure of rte_mbuf

+---------------------+
|    rte_mbuf Header  |
|    (metadata)       |
+---------------------+
|    Private Data     |
+---------------------+
|    Data Buffer      |
|                     |
|  +-----------------+| <--- m->buf_addr
|  | Headroom        || <--- rte_pktmbuf_headroom(m)
|  +-----------------+| <--- m->buf_addr + m->data_off
|  | Data            ||
|  | (m->data_len)   ||
|  +-----------------+|
|  | Tailroom        || <--- rte_pktmbuf_tailroom(m)
|  +-----------------+|
+---------------------+
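
The arithmetic behind this layout can be sketched in plain C. The struct below is a simplified model that keeps only three fields with the same names as in rte_mbuf; it is not the real struct rte_mbuf, and the `toy_` functions are hypothetical stand-ins for the rte_pktmbuf_headroom, rte_pktmbuf_tailroom, rte_pktmbuf_append and rte_pktmbuf_prepend idea on a single-segment buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a single-segment mbuf (not the real struct rte_mbuf). */
struct toy_mbuf {
    uint16_t buf_len;  /* total size of the data buffer */
    uint16_t data_off; /* offset of the packet data from the buffer start */
    uint16_t data_len; /* number of bytes of packet data */
};

/* Free space in front of the data: exactly the data offset. */
static uint16_t toy_headroom(const struct toy_mbuf *m)
{
    assert(m != NULL);
    return m->data_off;
}

/* Free space behind the data: whatever headroom and data do not use. */
static uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    assert(m != NULL);
    return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}

/* Grow the data at the end; returns 0 on success, -1 if tailroom is too small. */
static int toy_append(struct toy_mbuf *m, uint16_t len)
{
    if (len > toy_tailroom(m))
        return -1;
    m->data_len = (uint16_t)(m->data_len + len);
    return 0;
}

/* Grow the data at the front (for example to add a header);
 * returns 0 on success, -1 if headroom is too small. */
static int toy_prepend(struct toy_mbuf *m, uint16_t len)
{
    if (len > toy_headroom(m))
        return -1;
    m->data_off = (uint16_t)(m->data_off - len);
    m->data_len = (uint16_t)(m->data_len + len);
    return 0;
}
```

As a usage example, assume a buffer of 2176 bytes with 128 bytes of headroom and no data yet, which matches the usual default build settings: the tailroom is 2048 bytes. Appending 64 bytes leaves 1984 bytes of tailroom, and prepending a 14-byte header afterwards leaves 114 bytes of headroom and a data length of 78.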

putao
