
7姜凤波.jpg

The Internet has grown explosively over the past few decades, which has challenged network performance and infrastructure and created demand for high-performance network processing and user-mode protocol stacks. In response, Tencent Cloud built F-Stack, a standalone general-purpose network framework that combines DPDK with a user-mode protocol stack to improve network performance. This article describes the design ideas behind F-Stack and how it is used.

Over the past few decades the Internet has grown explosively. Ever richer content and a steady stream of DDoS attacks have put great pressure on network performance and driven the rapid development of network infrastructure. Carrier bandwidth keeps growing, and hardware such as CPUs and network cards keeps getting faster. For a long time, however, software performance has lagged behind hardware and severely limited application performance. Often the only answer was to add more machines, which wastes resources and raises costs.

As software evolved, the C10K problem was solved in the first decade of this century through multi-threading and event-driven I/O (kqueue, epoll, and so on). In the second decade those techniques could no longer keep up, and new solutions were urgently needed to cope with the growth in network traffic.

For example, requests to Tencent Cloud's HttpDNS service double every few months, which creates strong demand for high-performance network processing and a user-mode protocol stack. On the kernel protocol stack used in its early days, HttpDNS could serve fewer than 100,000 QPS of short-lived TCP connections per machine. Later kernel improvements such as REUSEPORT raised this to several hundred thousand QPS, but the kernel stack still hits a serious horizontal-scaling bottleneck. To remove that bottleneck, Tencent Cloud needed a high-performance network service framework, and chose to bypass the kernel with DPDK plus a user-mode protocol stack.

In his 2013 talk on C10M, Robert David Graham argued that the kernel is what stands in the way of tens of millions of concurrent connections. His main point was that we should bypass the kernel and apply a range of other optimizations, such as polling, zero copy, and hugepages.

eBPF and XDP, introduced into the Linux kernel later, can also improve network performance considerably, but in essence they too gain performance by bypassing the kernel. They have not yet had a substantial impact on the Intel DPDK ecosystem. In particular, their dependence on recent kernel versions and on network card driver support severely limits their adoption in enterprises.

Related technologies were already in use to some extent before this talk, such as PF_RING, Netmap, and Intel DPDK, the data-plane drivers mentioned in the talk. Tencent Cloud DNSPod completed its evaluation and selection of software and hardware in 2012 and finally chose DPDK (not yet open source at the time) to build a new generation of authoritative DNS server. It reached 11 million QPS on a single 10GE server, which greatly improved both regular resolution capacity and resistance to attacks. Even so, these technologies were not developed and adopted at scale across the industry until the talk was given. DPDK in particular stood out and has become almost standard for high-performance network programs. In 2016 we extracted the DPDK-based network module from the authoritative DNS server into an independent general-purpose network framework that multiple services can reuse to improve network performance. That framework is now F-Stack.

F-Stack introduction and technical characteristics

F-Stack is a high-performance network development kit that runs entirely in user mode, built on DPDK, the FreeBSD protocol stack, a micro-thread interface, and other components. Users only need to focus on business logic; with simple integration, F-Stack gives them a high-performance network server. Handing packets to the application layer and bypassing the kernel greatly improves network performance, but it also means the kernel's network protocol stack can no longer be used. This matters little for applications at layer 4 and below or for simple UDP-based layer 7 applications, but other layer 7 applications need a mature user-mode protocol stack. F-Stack is the solution offered by Tencent Cloud DNSPod.

F-Stack is a largely complete network programming framework. In effect it glues together the DPDK network I/O module, the FreeBSD user-mode protocol stack, a POSIX-like API, an asynchronous programming interface, and some upper-level applications, and offers them to users as one package.
It is written in pure C (some third-party components use C++, which F-Stack wraps) and is easy to use, although users need some grounding in DPDK. It is released under the BSD 2-Clause license, which is very friendly to commercial use. The technical characteristics of F-Stack are described next.

  • Multi-process architecture, polling mode

[Figure: basic architecture of F-Stack]

The figure above shows the basic architecture of F-Stack. It uses a multi-process model and runs entirely in user mode. Each process is bound to one CPU core and one network card receive/transmit queue, which gives better memory locality and avoids cache invalidation. Inside each process F-Stack runs in polling mode, with no locks, no scheduling, and no context switches.

F-Stack currently uses a multi-process architecture. Each process has its own independent protocol stack, application interface, and application-layer business logic. This avoids several performance bottlenecks in the kernel, and because processes share no data, horizontal scalability is very good.
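The shared-nothing model described above can be illustrated with a minimal Python sketch. It is not F-Stack code: `flow_to_worker`, `Worker`, and `dispatch` are hypothetical names, and a CRC32 over the connection 4-tuple stands in for the hardware RSS hash that a real network card computes.

```python
import zlib

NUM_WORKERS = 4  # hypothetical: one worker process per CPU core and NIC queue


def flow_to_worker(src_ip, src_port, dst_ip, dst_port, num_workers=NUM_WORKERS):
    """Map a flow to a worker index.

    CRC32 is only a stand-in for the NIC's hardware RSS hash.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_workers


class Worker:
    """State owned by one process; nothing here is shared with other workers."""

    def __init__(self, index):
        self.index = index
        self.connections = {}  # private connection table, so no locks are needed

    def handle(self, flow, payload):
        # Track bytes received per flow in this worker's private table.
        self.connections[flow] = self.connections.get(flow, 0) + len(payload)


workers = [Worker(i) for i in range(NUM_WORKERS)]


def dispatch(flow, payload):
    """Deliver a packet to the single worker that owns its flow."""
    worker = workers[flow_to_worker(*flow)]
    worker.handle(flow, payload)
    return worker.index
```

Because every packet of a given flow lands on the same worker, each connection table can stay private to one process, which is why no locking is needed and why adding cores adds capacity.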

  • DPDK Development Kit

DPDK is a widely used data plane development kit, so it is not introduced in detail here.

Apart from the initial open source release, which used DPDK 16.07, F-Stack tracks the DPDK LTS releases (xx.11). Support for a new LTS release is generally added on the dev branch a few months after that release comes out, and ships in an official F-Stack release later (usually after about a year). For example, the current main stable F-Stack versions 1.20 and 1.21 use DPDK 18.11.x and 19.11.x respectively, and 20.11.x is supported on the development branch.
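As a quick reference, the version pairing stated above can be written down as data. This is a small sketch, not part of F-Stack: the dictionary only restates the pairs given in the text, and `is_lts_series` is a hypothetical helper that applies the xx.11 naming rule.

```python
# F-Stack release -> DPDK series, exactly as stated in the text.
FSTACK_TO_DPDK = {
    "1.20": "18.11",
    "1.21": "19.11",
    "dev": "20.11",
}


def is_lts_series(dpdk_version):
    """Return True if a version string such as '19.11.6' is in an xx.11 series."""
    parts = dpdk_version.split(".")
    return len(parts) >= 2 and parts[1] == "11"
```

Under this rule 16.07, used by the first open source release, is not an LTS series, while every series in the table is.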

  • FreeBSD protocol stack

A lot of thought and experimentation went into choosing the FreeBSD protocol stack for the user-mode port. Only a few advantages of the FreeBSD stack are listed here; for more, see the F-Stack background story.

  1. The protocol stack is fully functional, and a large number of tools are available for debugging and analyzing the network, such as sysctl, ifconfig, netstat, netgraph, ipfw, and ndp.
  2. Improvements from the community can be followed, so there is no need to develop and maintain the stack ourselves. Existing user-mode ports (see libplebnet and libuinet) were available for reference, which greatly reduced the workload.
  3. Compared with the complex implementation of the Linux protocol stack, the FreeBSD code is clearer and easier to understand. Linux is also licensed under the GPL, which may restrict some users.

F-Stack's current release is based on FreeBSD releng 11.0, with patches from some later versions ported back. It is feature-complete, even somewhat redundant (some modules, such as SCTP and IPSEC, have not been removed but are simply not compiled into F-Stack), has mature debugging and analysis tools, and runs stably. It will later be upgraded to FreeBSD releng 13.0 and will continue to follow major improvements from the community.

  • POSIX compatible interface

F-Stack provides a POSIX-like interface prefixed with "ff_", such as "ff_socket" and "ff_bind". It also provides the event-driven "ff_kqueue" interface and an "ff_epoll" interface wrapped on top of kqueue. Apart from "ff_epoll", whose usage differs slightly from the Linux system interface, the other interfaces are used in exactly the same way as their system counterparts, so existing programs can be ported with simple changes.
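To show what wrapping epoll on top of kqueue involves, here is a conceptual Python sketch. It is not F-Stack's implementation: `epoll_ctl_to_kevents` is a hypothetical function, and only the numeric constants are real (taken from Linux `<sys/epoll.h>` and FreeBSD `<sys/event.h>`). The key point is that one epoll registration with a combined event mask becomes one kqueue change per filter.

```python
# Linux <sys/epoll.h> values
EPOLLIN, EPOLLOUT = 0x001, 0x004
EPOLL_CTL_ADD, EPOLL_CTL_DEL = 1, 2

# FreeBSD <sys/event.h> values
EVFILT_READ, EVFILT_WRITE = -1, -2
EV_ADD, EV_DELETE = 0x0001, 0x0002


def epoll_ctl_to_kevents(op, fd, events=0):
    """Translate one epoll_ctl-style request into kqueue changes.

    Returns (ident, filter, flags) tuples, mirroring the first three
    fields of struct kevent. Only ADD and DEL are handled in this sketch.
    """
    if op == EPOLL_CTL_ADD:
        flags = EV_ADD
    elif op == EPOLL_CTL_DEL:
        flags = EV_DELETE
        # epoll ignores the mask on delete, so drop both filters here.
        events = EPOLLIN | EPOLLOUT
    else:
        raise ValueError(f"unsupported op in this sketch: {op}")

    changes = []
    if events & EPOLLIN:
        changes.append((fd, EVFILT_READ, flags))
    if events & EPOLLOUT:
        changes.append((fd, EVFILT_WRITE, flags))
    return changes
```

A mismatch of this kind, one mask on the epoll side versus separate filters on the kqueue side, is one plausible reason an epoll wrapper cannot behave identically to the Linux interface in every case.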

Note that although interface usage is compatible, many flags are defined differently on Linux and FreeBSD. The F-Stack interface converts between the two sets of definitions, but 100% coverage is not guaranteed, and newly added flag definitions in particular need continuous updating and maintenance.
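The kind of conversion involved can be sketched with socket options, whose numeric values differ between the two systems. The sketch below is illustrative only: `translate_sockopt` and the table name are hypothetical, the table covers just a handful of options, and passing other levels through unchanged is a simplification. The numeric values are those of the Linux and FreeBSD `<sys/socket.h>` headers.

```python
LINUX_SOL_SOCKET = 1
FREEBSD_SOL_SOCKET = 0xFFFF

# Linux option value -> FreeBSD option value (a small, incomplete sample)
LINUX_TO_FREEBSD_SOCKOPT = {
    2: 0x0004,   # SO_REUSEADDR
    9: 0x0008,   # SO_KEEPALIVE
    15: 0x0200,  # SO_REUSEPORT
    7: 0x1001,   # SO_SNDBUF
    8: 0x1002,   # SO_RCVBUF
}


def translate_sockopt(level, optname):
    """Convert a Linux (level, optname) pair to its FreeBSD equivalent."""
    if level != LINUX_SOL_SOCKET:
        # Simplification: other levels are passed through unchanged.
        return level, optname
    try:
        return FREEBSD_SOL_SOCKET, LINUX_TO_FREEBSD_SOCKOPT[optname]
    except KeyError:
        # An option missing from the table cannot be converted safely.
        raise NotImplementedError(f"no mapping for Linux option {optname}")
```

An option that is missing from the table fails loudly rather than being passed through with the wrong meaning, which mirrors why newly added flags need ongoing maintenance.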

The POSIX-like interface makes porting existing applications easy and is safe to use, but because it involves memory copies its performance is not optimal. F-Stack will also provide a separate set of zero-copy APIs for users who need them.

  • Microthreading framework

F-Stack applications must be programmed against asynchronous interfaces, but F-Stack also provides a microthread (coroutine) framework so that users can write synchronous-style code that executes asynchronously.
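The idea of synchronous-style code that executes asynchronously can be shown with a minimal coroutine scheduler in Python. This is a generic illustration built on generators, not the micro_thread API: `handler` and `run` are hypothetical names, and the I/O completion is simulated by always resuming a task with the string "reply".

```python
from collections import deque


def handler(name, log):
    """Reads like sequential code; each yield hands control to the scheduler."""
    log.append(f"{name}: send request")
    reply = yield "wait_io"  # a threaded design would block here
    log.append(f"{name}: got {reply}")


def run(tasks):
    """Round-robin scheduler that resumes each task when its I/O 'completes'."""
    ready = deque((task, None) for task in tasks)
    while ready:
        task, value = ready.popleft()
        try:
            task.send(value)  # run the task until its next yield
        except StopIteration:
            continue  # task finished, do not reschedule it
        ready.append((task, "reply"))  # simulated I/O completion
```

Running two handlers through `run` interleaves them: both send their requests before either receives a reply, even though each handler is written as straight-line code.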

The microthread framework uses part of micro_thread from MSEC, which is also open sourced by Tencent. Note that the microthread module is licensed under GPL-2.0. Because it is not a core module of F-Stack, this does not affect the license of F-Stack itself, but users who build applications on the micro_thread module need to consider the possible impact of its license.

  • Application porting

F-Stack is currently integrated as a library that must be compiled and packaged together with the business application. It also ships ported versions of Nginx and Redis that users can use directly.

For existing multi-threaded applications, especially those that share resources, we recommend splitting them up and reducing resource sharing as far as possible to get better performance and horizontal scalability. Where splitting is impossible, F-Stack may later provide independent network I/O and protocol stack modules, but some performance loss will be unavoidable.

  • Applicable scenarios

First, here is a performance comparison of Nginx on F-Stack and on the kernel protocol stack, for short-lived and long-lived connections. Note that the kernel protocol stack figures were also measured after tuning, including binding network card queues and worker processes to CPU cores, enabling REUSEPORT, and adjusting other kernel network parameters.

[Figure: Nginx performance on F-Stack versus the kernel protocol stack, short and long connections]

F-Stack significantly outperforms the kernel protocol stack, and the gain for short-lived connections is especially pronounced beyond 12 cores. F-Stack offers a worthwhile performance gain for most high-concurrency network applications. It is best suited to very high-concurrency workloads with short-lived TCP connections, which is also the main workload of our HttpDNS.

For a fuller picture of how F-Stack is used in production, it helps to start with its development history.

F-Stack development history

The currently open-sourced F-Stack is version 3.0. Version 1.0 dates from 2012-2013, when DNSPod's authoritative DNS adopted DPDK to improve performance; it was a simple user-mode TCP protocol stack written to support DNS over TCP. It went online in 2013, has run in production ever since, and has been upgraded to 3.0 in the past two years.

A high-performance user-mode protocol stack is indispensable to the rapid growth of the DNS service, but maintaining a fully functional TCP protocol stack takes a great deal of effort. This was a major reason for developing F-Stack 2.0 and 3.0.

In 2016, following a decision by our team lead at the time, we stopped maintaining the 1.0 protocol stack and decided to adapt an open source protocol stack and release the result as open source. After evaluating the options (and ruling out MTCP, LwIP, and others), we first chose Seastar, built version 2.0 the same year, and adapted several applications to it, such as HttpDNS and Tencent Cloud's dynamic acceleration CDN (DSA, now merged into the site-wide acceleration product ECDN). Reality fell short of the plan. HttpDNS on F-Stack 2.0 performed perfectly in the lab, with excellent performance and an extensible plug-in architecture, but it ran into countless problems during a limited gray release on the production network. This comes down to Seastar's own usage scenario: as a component of ScyllaDB it mainly runs on internal networks and is not well adapted to the complex environment of the public Internet.

After the team had fixed many of these problems and submitted several pull requests to Seastar, we realized we were falling back into the same maintenance cycle as with version 1.0. So after persisting for a while, we abandoned Seastar and turned to the more mature Linux and FreeBSD protocol stacks, and chose FreeBSD to develop F-Stack 3.0, the current open source version. The F-Stack 2.0 framework was not abandoned entirely, however. Although it did not meet the needs of HttpDNS, which mainly serves the public Internet, it ran for many years in the CDN dynamic acceleration service DSA, whose main scenario is accelerating interconnection over internal networks, before that service was upgraded.

In the first half of 2017 we developed F-Stack 3.0 on top of DPDK and the FreeBSD protocol stack and released it as open source. Because HttpDNS request volume was growing fast and the service was under heavy performance pressure, HttpDNS was the first application adapted to it and was gradually brought online to serve external traffic. Some problems came up afterwards, but they were quickly fixed and the service stabilized. So far, it has supported HttpDNS requests with a daily request volume of 1 trillion and maintained 10 times.

F-Stack remains under continuous maintenance. Version 1.22 is expected between the end of 2021 and early 2022 and may include the following new features:

  • DPDK 20.11: already supported on the dev branch. Compared with 19.11, the build and usage differ considerably; only meson/ninja builds are supported.
  • FreeBSD 13.0: already supported on the dev branch but not yet fully stable. Known problems include BBR/RACK not working properly, multi-process performance issues still to be optimized, and some tool functions misbehaving (for example, ff_netstat's view of listening ports). Further debugging and optimization are needed.
  • New zero-copy interface support.
  • One-click migration support for existing applications: independent network I/O and protocol stack modules, used through LD_PRELOAD or a similar mechanism, to lower the barrier to migrating applications, at the cost of some performance.
  • Nginx-1.20 support.
  • Redis 6 support.
  • The default method of distributing received packets on the network card changes from RSS to Flow Director, while the existing default RSS policy is retained.

[Note] The features above may be adjusted depending on the schedule. Some may not make it into the 1.22 release and will be deferred to later versions.

F-Stack practice cases

Since it was open sourced, F-Stack has been adopted by many research institutions, universities, and companies around the world, both for technical research and for commercial projects in production. Only production use cases from F-Stack users are listed here.

  • Tencent Cloud HttpDNS

The HttpDNS service is mainly used by mobile apps to solve problems such as failures of default DNS resolution, resolution results that cross carrier networks, and resolution hijacking. Most top apps now use this kind of technology. Tencent Cloud DNSPod was the first to launch a commercial HttpDNS service, which now serves a large number of users with a daily request volume in the trillions. For earlier versions, see the article "How the 100-billion-level HttpDNS service is made" on the official account "Goose Factory Internet Affairs". HttpDNS has since gone through several iterations. The new professional edition supports more features, such as IPv6, DNSPod authoritative data push, user-defined domain name resolution, dangerous domain name interception (users choose whether to enable it and which categories of dangerous domain names to block), black and white lists, and request statistics, and it is likewise built on F-Stack.

  • DNSPod authoritative DNS

As the parent project of F-Stack, DNSPod authoritative DNS provides authoritative resolution for nearly tens of millions of domain names. Thanks to F-Stack's high-performance networking, the latest version reaches 100 million QPS per server on 100G models; for details, see my earlier article "Performance Optimization Practice for 100 Million QPS of Authoritative DNS Single Server Based on F-Stack". DNSPod's total online capacity has reached billions of QPS. Combined with Tencent Group's globally deployed high-bandwidth nodes and advanced protection equipment and algorithms, DNSPod has repeatedly defended against DDoS attacks above the terabit level without customers noticing.

Introduction to other user mode protocol stacks

VPP

VPP is led by Cisco with participation from many major vendors. Its user-mode protocol stack, Host Stack, grew out of Cisco's switch protocol stack. It was open sourced later than F-Stack, but its community is currently the most active among user-mode protocol stacks.

MTCP

MTCP comes from KAIST in South Korea and is widely used in industry. Its main limitation is that, as the name suggests, it supports only TCP.

Seastar

Seastar is a sub-project of ScyllaDB. Its native stack performs well on internal networks and is mostly used in such scenarios.

LwIP

LwIP comes from the Swedish Institute of Computer Science. It is a lightweight protocol stack mainly used in embedded systems, though many vendors have also modified and ported LwIP to support their own applications.

"Yunjian Big Coffee" is a special column of the Tencent Cloud+ community. Invited industry experts focus on putting cutting-edge technology into practice, interpret the hot technologies of the cloud era, and explore new opportunities for industry development.

Tencent Cloud Developer (腾讯云开发者)