Kubernetes observability practice based on eBPF

Author: Liu Yang (Yan Xun)

Observability exists to solve problems, so before discussing it, you should first understand the general principles of troubleshooting.

Background introduction

Troubleshooting principles

Take troubleshooting a system problem as an example. To understand the system, you must first master the fundamentals: basic computer science and the programming language in use. You also need the big picture of the system, such as its architecture, deployment, and major processes; its operational details, so that the algorithms and data structures of the core functions are clear to you; and its operations tooling, so that you understand release, rollback, and monitoring.

On top of understanding, you also need to be able to reproduce the problem. This mainly means knowing the conditions that trigger it and preserving the data from the scene when it occurs, including metrics, traces, logs, and events.

Only with both the scene data and an understanding of the system can you locate the problem. Correlate the data preserved from the scene; with a good understanding of the system, bisection then narrows down the root cause quickly. While locating the problem, pay particular attention to changes, because a large share of system problems are caused by changes.
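As a rough illustration of locating a root cause by bisection, the sketch below searches an ordered list of changes for the first one after which a check starts failing. The change names and the `is_healthy` check are hypothetical. It assumes the system is healthy before the first change, unhealthy after the last, and stays unhealthy once the bad change is in.

```python
from typing import Callable, Sequence


def first_bad_change(changes: Sequence[str],
                     is_healthy: Callable[[int], bool]) -> str:
    """Return the first change after which the system is unhealthy.

    is_healthy(n) reports whether the system works with the first n
    changes applied. Assumes is_healthy(0) is True and
    is_healthy(len(changes)) is False.
    """
    good, bad = 0, len(changes)  # invariant: healthy at good, unhealthy at bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_healthy(mid):
            good = mid
        else:
            bad = mid
    return changes[bad - 1]


# Hypothetical change log, oldest first.
changes = ["bump base image", "raise pool size", "new DNS config",
           "enable cache", "rotate certs"]
probes = []  # records every state that had to be checked


def is_healthy(n: int) -> bool:
    probes.append(n)
    return n < 3  # pretend the third change ("new DNS config") broke things


culprit = first_bad_change(changes, is_healthy)
print(culprit, "found after", len(probes), "checks")
```

Each check halves the remaining range, so the number of checks grows with the logarithm of the number of changes rather than with the number itself.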

After determining the root cause, the fix should address both the symptoms and the root cause, and it should be fully verified to ensure that it introduces no new problems.

The above are the general principles of troubleshooting, which are not only applicable to the troubleshooting of system problems, but can also be applied to all aspects of life.

Observability makes the troubleshooting process more efficient, more stable, and cheaper. It helps us understand the system, preserves enough scene data when a problem occurs, makes it easy to correlate data and to bisect, and finally helps verify that the fix is correct.

Kubernetes observability challenges

Complexity is conserved: it does not go away, it only moves. The programming languages, frameworks, and container and operating-system platforms we build merely put complexity in the right place. When everything works, all is well; when something goes wrong, it is a disaster. As complexity keeps sinking into lower layers, observability comes under great pressure. The popularity of Kubernetes has made microservice architectures common, and multiple languages and multiple communication protocols have become the norm, which brings challenges of its own.

Challenge 1: End-to-end observation becomes more complex, and the cost of instrumentation stays high. Yet this is only the tip of the iceberg. Many capabilities are sinking into the Kubernetes control plane, the container runtime layer, the network, and the operating system, and observing these infrastructure layers is a major challenge.

Challenge 2: Because of the separation of concerns, application problems and underlying problems cannot be correlated top-down.

Challenge 3: Although there are many tools, context is missing and data is scattered. The application cannot be understood well from such data, the scene data cannot be correlated, and troubleshooting is inefficient.

Observability requires a unified technique to address its own complexity.

Observability practice based on eBPF

Introduction to eBPF

The kernel has always been an excellent vantage point for observability, but efficiency and security concerns long kept it out of reach. After years of development, eBPF has opened new doors for observability.

eBPF is a technology that safely runs sandboxed programs in the kernel, triggered by kernel or user-mode program events, without modifying kernel code. It has the following characteristics:

Non-intrusive: The cost of observation is extremely low, and the application does not need any code changes or a process restart.

Dynamically programmable: Probe-side logic can be changed by dynamically delivering eBPF programs, without restarting the probe.

High performance: Built-in JIT compilation lets probes run with native efficiency in the kernel.

Secure: The verifier limits the kernel functions that eBPF programs can access, which keeps the kernel running stably.

Beyond these features, eBPF is also convenient to use. Taking application performance monitoring as an example, you only need to compile and load an eBPF program that listens to network kernel events, parse the network protocol, aggregate the results into metrics, and output traces.

eBPF is backed by many large companies and has developed rapidly. Over the past year, the Alibaba Cloud observability team has built a unified observability platform based on eBPF. Its architecture is as follows:

Unified observability platform architecture based on eBPF

The bottom layer is the data collection layer. It mainly uses tracepoints, kprobes, and eBPF functions to capture the relevant system calls, associates them with process and container information to form raw events, and supports multiple kernel versions by combining eBPF with sysdig. To cope with event explosion, it also introduces event filtering and a high-performance event transmission mechanism.

Above it is the data processing layer. After user mode receives the raw events, it first parses the protocol and generates metrics, traces, and logs, converging the information along the way. It then fills in metadata, such as K8s information or custom application information, and finally outputs the monitoring data through the OpenTelemetry Collector. The OpenTelemetry Collector is introduced mainly to support multiple data types and multiple transmission channels, and to allow monitoring data to be written to user-specified storage.

Further up is the data storage layer. By default, metrics are stored in Prometheus, which uses InfluxDB, and traces and logs are stored in Trace, which uses SLS.

At the top is the data service layer, which presents a variety of observability services to users through the ARMS front end and Grafana.
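The metadata-filling step of the data processing layer can be pictured as a lookup from the container that produced an event to its Kubernetes metadata. The sketch below is only an illustration: the event fields, the `POD_INDEX` table, and the `enrich` function are hypothetical names, not the platform's actual data model.

```python
from typing import Dict

# Hypothetical index from container ID to Kubernetes metadata. A real probe
# would have to keep something like this in sync with the cluster.
POD_INDEX: Dict[str, Dict[str, str]] = {
    "c-7f3a": {"namespace": "shop", "pod": "orders-5d9c-x2k", "workload": "orders"},
}


def enrich(event: Dict[str, object],
           index: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Return a copy of a raw event with K8s metadata attached.

    Events from unknown containers are kept, with the K8s fields set to
    "unknown", so that no data is dropped silently.
    """
    meta = index.get(str(event.get("container_id")), {})
    enriched = dict(event)
    for key in ("namespace", "pod", "workload"):
        enriched[key] = meta.get(key, "unknown")
    return enriched


known = enrich({"container_id": "c-7f3a", "latency_ms": 12.0}, POD_INDEX)
orphan = enrich({"container_id": "c-0000", "latency_ms": 3.0}, POD_INDEX)
print(known["workload"], orphan["workload"])
```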

How to perform non-intrusive multi-language protocol parsing

The ARMS observability team focuses on applying eBPF at the application layer. By monitoring network-related kernel calls, tracking connections, and analyzing the transmitted network packets to recover application-layer requests and responses, it can non-intrusively monitor the golden metrics in multi-language scenarios: request count, response time, and error count.

Currently, we support HTTP, Redis, DNS, Kafka, MySQL, gRPC, HTTP/2, and other protocols, and the list keeps growing.
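To make the idea concrete, here is a minimal sketch of how golden metrics could be derived from captured payloads for one protocol, HTTP/1.x. It is not the ARMS implementation: the `Event` fields are hypothetical, requests and responses are assumed to alternate on a connection, and counting 4xx and 5xx responses as errors is an assumption of this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """A captured payload; the field names are assumptions for this sketch."""
    conn_id: int      # identifies the traced connection
    ts_ms: float      # capture time in milliseconds
    payload: bytes    # first bytes of the read/write buffer


HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"PATCH ")


def golden_metrics(events):
    """Pair each HTTP/1.x request with the next response on the same
    connection and aggregate request count, error count, and latency."""
    pending = {}  # conn_id -> (path, request timestamp)
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0.0})
    for ev in sorted(events, key=lambda e: e.ts_ms):
        first_line = ev.payload.split(b"\r\n", 1)[0]
        parts = first_line.split(b" ", 2)
        if first_line.startswith(HTTP_METHODS) and len(parts) >= 2:
            pending[ev.conn_id] = (parts[1].decode("ascii", "replace"), ev.ts_ms)
        elif first_line.startswith(b"HTTP/") and ev.conn_id in pending:
            if len(parts) < 2 or not parts[1].isdigit():
                continue  # malformed status line: leave the request pending
            path, started = pending.pop(ev.conn_id)
            entry = stats[path]
            entry["requests"] += 1
            entry["total_ms"] += ev.ts_ms - started
            if int(parts[1]) >= 400:  # assumption: 4xx and 5xx count as errors
                entry["errors"] += 1
    return dict(stats)


events = [
    Event(1, 0.0, b"GET /orders HTTP/1.1\r\nHost: shop\r\n\r\n"),
    Event(2, 5.0, b"GET /orders HTTP/1.1\r\n\r\n"),
    Event(1, 12.0, b"HTTP/1.1 200 OK\r\n\r\n"),
    Event(2, 35.0, b"HTTP/1.1 500 Internal Server Error\r\n\r\n"),
    Event(3, 40.0, b"*1\r\n$4\r\nPING\r\n"),  # not HTTP, ignored by this parser
]
metrics = golden_metrics(events)
print(metrics)
```

Because the pairing works on bytes seen on the wire, it does not depend on the language the application is written in, which is the point of the non-intrusive approach.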

Production problems and solutions

After more than a year of production practice, the four most common problems we have encountered are the following:

First, kernel version adaptation. eBPF has mature support only in kernel 4.14 and above, but many older kernels are still running in production, and these need to be supported through sysdig. On newer kernels, where CO-RE is not yet mature, we support dynamically downloading kernel map files and compiling dynamically.

Second, kernel event explosion. Traditional tracepoint and kprobe monitoring generates a huge number of events, which puts heavy pressure on probe performance. To solve this, we introduced an event filtering mechanism that handles only network call events, and we optimized event serialization to achieve high-performance event transmission.

Third, inefficient protocol parsing on the event consumer side. To address this, we optimized the parsing algorithms, for example by reducing the number of bytes analyzed and improving the matching algorithms. We also use engineering techniques such as multi-threading and memory reuse to improve parsing efficiency.

Fourth, metric time series explosion. All events are eventually aggregated into metrics, traces, and logs. When individual dimensions of a metric diverge, the resulting growth in time series seriously affects storage stability. We therefore support dimension convergence when writing metrics. For example, the cardinality of each dimension must not exceed 100; beyond that, values converge to an asterisk, which serves as a general convergence marker. We also optimize the query side, mainly by degrading precision.
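The dimension convergence described in the fourth item can be sketched as follows. The limit of 100 and the asterisk marker come from the text; the `DimensionLimiter` class and the rule of keeping the first values seen are assumptions, since the text does not say which values are kept.

```python
from collections import defaultdict

MAX_CARDINALITY = 100   # per-dimension limit quoted in the text
CONVERGED = "*"         # general convergence marker


class DimensionLimiter:
    """Replace dimension values with "*" once a dimension has already
    kept `limit` distinct values (first come, first kept)."""

    def __init__(self, limit: int = MAX_CARDINALITY):
        self.limit = limit
        self.seen = defaultdict(set)  # dimension name -> distinct values kept

    def converge(self, labels: dict) -> dict:
        out = {}
        for name, value in labels.items():
            known = self.seen[name]
            if value in known:
                out[name] = value
            elif len(known) < self.limit:
                known.add(value)
                out[name] = value
            else:
                out[name] = CONVERGED
        return out


limiter = DimensionLimiter(limit=3)  # small limit to keep the demo short
counts = defaultdict(int)
for i in range(5):
    labels = limiter.converge({"service": "orders", "url": f"/orders/{i}"})
    counts[(labels["service"], labels["url"])] += 1

print(dict(counts))
```

With the limit set to 3, the five distinct URLs produce four time series instead of five: three kept values plus one converged series that absorbs the rest.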

Unified observability interface

Unified alerting

The non-intrusive nature of eBPF and its multi-language support make it usable out of the box. Building on this, the Alibaba Cloud observability team began to build a unified observability interface.

The first part is unified alerting. For clusters connected to Alibaba Cloud eBPF monitoring, we designed a set of default alert templates covering the application layer, the K8s control plane, the infrastructure layer, and the cloud service layer, providing out-of-the-box capabilities that help users discover problems.

Unified correlation analysis logic

With eBPF preserving the scene data and the alerting system reporting that there is a problem, how do we then perform unified correlation analysis to find the root cause?

We believe there needs to be an interface that carries the correlation analysis logic. It should have a clear goal, such as solving capacity planning, cost, or application performance problems. It should be rich in content, including everything needed to solve the problem: metrics, traces, logs, events, the impact of the problem, relationships, and so on. And it should have a clear usage path that answers whether there is a problem now, whether there will be one in the future, what the impact is, what the root cause is, and what the user can do, guiding the user to solve the problem step by step.

Unified Grafana dashboards

Based on these ideas, we launched unified Grafana dashboards that follow the correlation analysis logic. Whether for the whole cluster or for a specific entity, each dashboard has an overview, lets you discover the details of a problem, and supports troubleshooting. The dashboards combine multiple data sources such as logs, events, and metrics, and are driven by alert and anomaly thresholds. They are interactive: you can click through to drill down to the root cause. They cover the core resource types of a K8s cluster.

Unified topology

We also launched a unified topology map with topology awareness, dependency analysis, traffic monitoring, and context correlation. It can filter nodes and edges by dimension to build a business-semantic view.
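Filtering nodes and edges by dimension can be sketched as below. The `Node` and `Edge` types, their fields, and the sample workloads are hypothetical; the sketch only shows one plausible rule, where an edge survives if it matches the edge filter and both of its endpoints survive the node filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    namespace: str
    kind: str        # e.g. "Deployment", "StatefulSet", "DaemonSet"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    protocol: str
    qps: float


def filter_topology(nodes, edges, *, namespace=None, protocol=None):
    """Keep nodes in `namespace` and edges using `protocol` whose two
    endpoints both survive the node filter. None means "no filter"."""
    kept_nodes = [n for n in nodes
                  if namespace is None or n.namespace == namespace]
    names = {n.name for n in kept_nodes}
    kept_edges = [e for e in edges
                  if e.src in names and e.dst in names
                  and (protocol is None or e.protocol == protocol)]
    return kept_nodes, kept_edges


nodes = [
    Node("web", "shop", "Deployment"),
    Node("orders", "shop", "Deployment"),
    Node("mysql", "shop", "StatefulSet"),
    Node("agent", "kube-system", "DaemonSet"),
]
edges = [
    Edge("web", "orders", "HTTP", 120.0),
    Edge("orders", "mysql", "MySQL", 80.0),
    Edge("agent", "orders", "HTTP", 1.0),
]

shop_nodes, shop_edges = filter_topology(nodes, edges, namespace="shop")
_, db_edges = filter_topology(nodes, edges, namespace="shop", protocol="MySQL")
print([n.name for n in shop_nodes], [(e.src, e.dst) for e in shop_edges])
```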

Demo: unified interactive pages based on eBPF

In Container Service ACK, open a cluster and click O&M to enter the cluster topology page. If the eBPF probe is not installed, you will be prompted to install it. Once the installation is complete, the traffic topology of the entire cluster is available out of the box.

The page shows the traffic relationships among Deployments, DaemonSets, and StatefulSets. Click a node to see the application performance it provides to its callers, and to view its upstream and downstream nodes. Looking at the upstream and downstream lets you quickly check whether the system is running according to the intended architecture.

You can also click an edge to view, for example, the QPS and response time of MySQL.

In addition to metrics, you can view details such as SQL statements and the network time breakdown: how long the request takes to be sent to the peer, how long the peer takes to process it, and how long the response takes to download. This helps locate problems quickly. The page also supports node filtering, to quickly narrow the view to the nodes you are interested in, and searching for specific nodes.
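One way to picture the network time breakdown is as the differences between four timestamps observed on the client side. The sketch below is an assumption about how such a breakdown could be computed, not a description of the product; the field names are hypothetical. Note that the middle interval, as seen from the client, covers the peer's processing time plus network transit.

```python
from dataclasses import dataclass


@dataclass
class RequestTimestamps:
    """Client-side timestamps in milliseconds; field names are hypothetical."""
    request_first_byte: float
    request_last_byte: float
    response_first_byte: float
    response_last_byte: float


def breakdown(t: RequestTimestamps) -> dict:
    """Split the total request time into send, wait, and download phases."""
    return {
        "send_ms": t.request_last_byte - t.request_first_byte,
        "wait_ms": t.response_first_byte - t.request_last_byte,
        "download_ms": t.response_last_byte - t.response_first_byte,
        "total_ms": t.response_last_byte - t.request_first_byte,
    }


parts = breakdown(RequestTimestamps(100.0, 101.5, 141.5, 149.5))
slowest = max(("send_ms", "wait_ms", "download_ms"), key=parts.get)
print(parts, "slowest phase:", slowest)
```

In this made-up example most of the time is spent waiting for the first response byte, which would point toward the peer rather than toward request upload or response download.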

The unified Grafana dashboards follow a 1+N model. The "1" is the cluster's global dashboard, which provides an overview of the core resources of the entire cluster, including events. You can quickly view the number and details of various events, and check whether nodes, stateless applications (Deployments), stateful applications, DaemonSets, and so on are healthy.

The structure of each resource-specific overview is also consistent, consisting of a "total" part and a "breakdown" part. The "total" part summarizes the entire cluster and lets you quickly confirm by thresholds whether there is a problem; values that cross a threshold are highlighted in bright colors. For example, the figure shows that the CPU request rate of one node is too high. The "breakdown" part then lets you find the specific node: sorting by request rate quickly surfaces the problem node.
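The "total" and "breakdown" logic can be mimicked in a few lines. This sketch assumes that the CPU request rate means CPU requested by Pods divided by the node's allocatable CPU; the node data and the 0.9 threshold are made up for illustration.

```python
# Hypothetical per-node data: CPU requested by Pods vs. allocatable CPU, in cores.
nodes = {
    "node-a": {"requested": 3.2, "allocatable": 4.0},
    "node-b": {"requested": 7.6, "allocatable": 8.0},
    "node-c": {"requested": 1.0, "allocatable": 4.0},
}
THRESHOLD = 0.9  # assumed alert threshold for the request rate


def request_rates(nodes):
    """Return (node, rate) pairs sorted from the highest rate to the lowest."""
    rates = {name: n["requested"] / n["allocatable"] for name, n in nodes.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranked = request_rates(nodes)                       # the "breakdown" view
over = [name for name, rate in ranked if rate > THRESHOLD]
print("nodes over threshold:", len(over))           # the "total" view
print("ranking:", [name for name, _ in ranked])
```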

The figure shows that two Pods are not ready at the cluster level. Sorting by phase quickly surfaces the two Pods in the Pending state. It also shows that 15 Pods have restarted in the past 24 hours; sorting quickly surfaces these Pods as well.

You can click a specific node to view its top 10 by CPU request rate. After clicking through to the details, you can check the system resources to judge whether the requested amount is reasonable and correct it if not.

As you can see, the Grafana dashboards are highly interactive and follow a clear logic.

The details page of each front-end application Deployment or resource also follows the troubleshooting logic. The overview shows the control plane, CPU, network, memory, and so on, so you can tell at a glance whether something is wrong with the system and quickly investigate it.

The dashboards also integrate logs and layer-7 application data.

All of the above capabilities are provided out of the box, thanks to the non-intrusive nature of eBPF.

Summary and Outlook

Summary

The Alibaba Cloud observability team has built a unified monitoring system for Kubernetes. It non-intrusively provides golden metrics of application performance across multiple languages and protocols, and combines them with monitoring of the Kubernetes control plane and of the network and system layers to deliver an integrated, full-stack observability experience. Correlating traffic topology, traces, and resources further improves troubleshooting efficiency in Kubernetes environments.

Outlook

In the future, the Alibaba Cloud observability team will further explore the full-coverage, non-intrusive, and programmable nature of eBPF, and continue working in the following three areas:

First, extensible APM, or eAPM for short. We will keep expanding the boundaries of APM, removing the need to instrument each language separately and opening up the underlying black box that cannot be seen from the application level, including the following:

Non-intrusive multi-language performance monitoring.
Non-intrusive distributed tracing.
System and network analysis at the granularity of individual application requests.

Second, provide tools and optimize the development framework, including tracing, profiling, dynamic network packet tracing, and processing of kernel events in user mode.

Third, enhance eBPF's built-in metrics, traces, and logs, mainly by supporting more application protocols and higher-level system and network metrics.
