Introduction: Kubernetes monitoring covers the layers beneath application monitoring, namely the Kubernetes container interface and the underlying operating system. It is an integrated solution for end-to-end observability of the Kubernetes cluster software stack, in which the observation data of all related layers can be seen at the same time. We hope that, through a series of Kubernetes monitoring best practices, everyone can use Kubernetes monitoring to solve hard observability problems in Kubernetes environments.
Hello everyone, I am Yan Xun from Alibaba Cloud's Cloud Native Application Platform team, and I am very happy to share with you in this public course series on Kubernetes monitoring. This open class aims to offer new approaches for quickly discovering and locating problems in Kubernetes containerized environments.
Why do we need Kubernetes monitoring?
Most of you are already familiar with application performance monitoring (APM). This type of monitoring focuses on business application logic, application frameworks, and language runtimes. Typical monitored problems include exhausted thread pools, unavailable database connections, MySQL problems, out-of-memory errors, call chains, and exception stacks. Kubernetes containerization has driven the evolution of cloud native technology, and developing and operating upper-layer applications has become simpler. Total complexity, however, stays constant: reduced complexity in the upper layers is inevitably accompanied by increased complexity in the lower layers. As shown in the figure below, complexity gradually shifts down to the container virtualization layer and to the system call and kernel layers that support the various virtualization technologies. Every layer can have problems, and those problems affect upper-layer applications. For example, if a Kubernetes component in the container virtualization layer fails, such as the scheduler, Pods cannot be scheduled and the application is affected. If file-system-related system calls fail, the application cannot read files. If the kernel misbehaves, the application process may not be scheduled to finish its work.
For applications to run in a healthy and stable manner, the whole software stack must be healthy and stable end to end. Many operations teams have built application monitoring and system monitoring, but these systems are not connected top to bottom, end to end. When interactions between software layers cause thorny problems, teams have no way to deal with them. For example, a network request at the application layer times out. Neither the client nor the server appears to have a problem, but in fact the network layer has an excessively high round-trip time (RTT) or retransmission rate, DNS resolution is slow, or the CNI plugin is slow. Achieving end-to-end observability in Kubernetes containerized environments is the purpose of Kubernetes monitoring.
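As a concrete example of one of these network-layer signals, the TCP retransmission rate can be estimated from the Linux kernel's cumulative counters. The sketch below is an assumption-laden illustration, not how Kubernetes monitoring collects data. It parses the two `Tcp:` lines of `/proc/net/snmp`, where a header line of field names is followed by a line of values. It then computes the retransmitted share of segments sent between two samples. Diffing two samples avoids mixing in history since boot.

```python
from __future__ import annotations


def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the 'Tcp:' header/value line pair of /proc/net/snmp into {field: value}."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp: header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    # Values can be negative (MaxConn is -1 on Linux), so int() must accept a sign.
    return {name: int(value) for name, value in zip(header, values)}


def retransmission_rate(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of TCP segments retransmitted between two counter samples."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0
```

On a Linux host you would read `/proc/net/snmp` twice, a few seconds apart, and pass both parsed samples to `retransmission_rate`.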
Kubernetes monitoring covers the layers beneath application monitoring: the Kubernetes container interface and the underlying operating system. At the container virtualization layer, we obtain observation data from five data sources. Exporters for the Kubernetes control components provide data on those components; cAdvisor provides container resource data; kube-state-metrics provides Kubernetes resource status data; and we also collect events and the status and condition data of Kubernetes resources. At the system call layer, we obtain observation data through Linux tracing technologies such as kprobes and tracepoints. At the kernel layer, we obtain it through a kernel observability module. Kubernetes monitoring then links upward to application performance monitoring through the associations among processes, containers, Kubernetes resources, and business applications, creating end-to-end observability. Kubernetes monitoring is therefore an integrated solution for end-to-end observability of the Kubernetes cluster software stack: it shows the observation data of all associated layers at the same time. We hope that, through this series of best practices, everyone can use Kubernetes monitoring to solve hard observability problems in Kubernetes environments.
The course is organized into two categories. The first is discovering problems, covering five types: application architecture problems, performance problems, resource problems, scheduling problems, and network problems. The second is locating problems: finding the root cause of problems in those five categories and providing remediation suggestions.
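Consider the upward correlation from processes to containers and Kubernetes resources. One common technique reads a process's cgroup path from `/proc/<pid>/cgroup` and extracts the container ID embedded in it. The ID is then matched against the `containerID` field in each Pod's `status.containerStatuses`, which has the form `containerd://<id>` or `docker://<id>`. This is a hypothetical sketch, not the product's implementation. It assumes the 64-hex-character IDs used by Docker and containerd. Cgroup path layouts vary by runtime and cgroup driver, so the regular expression is an assumption, not a complete parser.

```python
from __future__ import annotations

import re

# Assumption: container IDs are 64 lowercase hex characters (Docker, containerd).
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str) -> str | None:
    """Return the first container ID found in /proc/<pid>/cgroup content, if any."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(1)
    return None


def container_id_from_status(container_id_field: str) -> str:
    """Strip the runtime prefix, e.g. 'containerd://<id>' -> '<id>'."""
    return container_id_field.split("://", 1)[-1]


def pod_for_process(cgroup_text: str, pod_containers: dict[str, list[str]]) -> str | None:
    """Map a process to a Pod; pod_containers maps 'namespace/pod' to containerID fields."""
    cid = container_id_from_cgroup(cgroup_text)
    if cid is None:
        return None
    for pod, fields in pod_containers.items():
        if any(container_id_from_status(f) == cid for f in fields):
            return pod
    return None
```

From the Pod, the owning workload can be found through `metadata.ownerReferences`. That completes the chain from process to container, Pod, and application.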
Explore application architecture and discover unexpected traffic
The first lesson of the Kubernetes monitoring open course series is titled "How to use Kubernetes monitoring to explore the application architecture and discover unexpected traffic". It covers the following three points:
- Background: the challenges of exploring application architecture;
- Typical scenarios: when we need to explore the application architecture;
- Best practice: a pattern for exploring application architecture that efficiently discovers and locates problems.
1. Challenges of application architecture exploration
(1) Chaotic microservice architecture
In the Kubernetes containerized environment, the microservice architecture is the most common architectural pattern. As the business grows under this architecture, the number of microservices increases and the relationships between them become more and more complicated. With this complexity, common architectural questions become hard to answer. What is the application's current runtime architecture? Are the downstream services the application depends on healthy? Is the upstream client traffic normal? Is the application's DNS resolution normal? Is there a connectivity problem between two applications? As a result, exploring the application architecture often becomes very difficult.
(2) Multilingual
In the microservice architecture, each microservice can be written in a different programming language, as long as it exposes a standard service interface. So how do we monitor different languages? Is there a common instrumentation pattern, and are there easy-to-use, efficient instrumentation tools for each language? What performance impact does intrusive instrumentation have? Does instrumentation code affect business operations? These are the observability problems of multilingual scenarios.
(3) Multiple communication protocols
In the microservice architecture, microservices can communicate over different protocols, such as HTTP, gRPC, Kafka, and Dubbo. We often need to identify these protocols to quickly find problems with service dependencies. Identifying a protocol, however, means understanding it and instrumenting it at the right places. How can we instrument different communication protocols in a unified way, and will doing so affect business performance? These are the observability problems of multi-protocol scenarios.
2. Typical scenarios
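One way to identify protocols without per-language instrumentation is to inspect the first bytes of a connection's payload. The classifier below is a simplified, hypothetical sketch of that idea. It recognizes HTTP/1.x requests and responses, the HTTP/2 client connection preface (gRPC runs over HTTP/2), and the Dubbo header magic number `0xdabb`. Real protocol inference must also handle Kafka's length-prefixed binary framing, TLS, and partial reads, which this sketch ignores.

```python
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
HTTP1_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"OPTIONS ", b"PATCH ")
DUBBO_MAGIC = b"\xda\xbb"


def classify_payload(payload: bytes) -> str:
    """Guess the application protocol from the first bytes of a payload."""
    if payload.startswith(HTTP2_PREFACE):
        return "http2"  # gRPC connections also begin with this preface
    if payload.startswith(HTTP1_METHODS):
        return "http1-request"
    if payload.startswith(b"HTTP/1."):
        return "http1-response"
    if payload.startswith(DUBBO_MAGIC):
        return "dubbo"
    return "unknown"
```

Because the check looks only at bytes on the wire, it works the same for any language or framework. That is the property the multi-protocol problem asks for.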
(1) Architecture perception
Architecture awareness draws a topology map from real network calls, with microservices as nodes and the calls between them as edges. Comparing this map with the expected architecture from the static design reveals problems: for example, extra or missing microservices, or incorrect relationships between microservices. It is typically used in scenarios that need a view of the overall architecture, such as launching new applications, opening new regions, and reviewing end-to-end call paths.
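Architecture awareness can be sketched as two steps: build a directed graph from observed calls, then diff it against the designed architecture. In this hypothetical sketch, each observed call is a `(caller, callee)` pair, for example aggregated from network flow records. A set of expected edges stands in for the static design. Edges that appear only in the observed set are exactly the "unexpected traffic" this lesson is about.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyDiff:
    unexpected_edges: set[tuple[str, str]]  # observed but not in the design
    missing_edges: set[tuple[str, str]]     # in the design but never observed
    unexpected_nodes: set[str]
    missing_nodes: set[str]


def build_topology(calls: Iterable[tuple[str, str]]) -> set[tuple[str, str]]:
    """Collapse observed (caller, callee) calls into a set of unique edges."""
    return {(src, dst) for src, dst in calls}


def nodes_of(edges: set[tuple[str, str]]) -> set[str]:
    return {node for edge in edges for node in edge}


def diff_topology(observed: set[tuple[str, str]],
                  expected: set[tuple[str, str]]) -> TopologyDiff:
    """Compare the observed topology with the designed one."""
    return TopologyDiff(
        unexpected_edges=observed - expected,
        missing_edges=expected - observed,
        unexpected_nodes=nodes_of(observed) - nodes_of(expected),
        missing_nodes=nodes_of(expected) - nodes_of(observed),
    )
```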
(2) Architecture anomaly discovery
Architecture anomaly discovery applies custom anomaly rules to the nodes and edges of the architecture topology and colors the abnormal ones, so they can be found quickly. It is typically used in scenarios that focus on the state of nodes and edges, such as end-to-end call path reviews and health inspections.
(3) Association analysis
After anomaly discovery locates an abnormal node or edge, we usually need to switch to its associations. This lets us quickly view the upstream and downstream of the node or edge, as well as its own service instances, and narrow the scope of the problem step by step.
3. Best practices
The three typical scenarios above form a complete practical workflow. First, use architecture awareness to check whether the application's actual runtime architecture matches expectations. If there are structural problems, investigate the abnormal structure of the services further; if not, proceed to the next step. Second, use anomaly discovery to check for abnormally colored nodes and edges. If there are none, all is well; otherwise, proceed to the next step. Third, after locating specific nodes and edges, start association analysis: first check whether our own instances have problems, then check whether the upstream and downstream do.
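Switching to a node's associations amounts to a neighborhood query on the call graph. This minimal sketch represents the topology as a set of hypothetical `(caller, callee)` edges. It returns a node's direct upstream callers and downstream callees. It also offers a breadth-first expansion over several downstream hops for narrowing the problem scope step by step.

```python
from __future__ import annotations

from collections import deque


def upstream(edges: set[tuple[str, str]], node: str) -> set[str]:
    """Services that call `node` directly."""
    return {src for src, dst in edges if dst == node}


def downstream(edges: set[tuple[str, str]], node: str) -> set[str]:
    """Services that `node` calls directly."""
    return {dst for src, dst in edges if src == node}


def downstream_within(edges: set[tuple[str, str]], node: str, max_hops: int) -> set[str]:
    """Breadth-first search: all services reachable from `node` in at most max_hops calls."""
    seen = {node}
    reached: set[str] = set()
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in downstream(edges, current):
            if nxt not in seen:
                seen.add(nxt)
                reached.add(nxt)
                frontier.append((nxt, depth + 1))
    return reached
```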
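The three-step workflow can be expressed as a small decision function. This is a hypothetical sketch. Its inputs stand in for what you would read off the topology, anomaly coloring, and association views: whether the topology matches the design, which nodes are abnormal, and per-node health flags for the node's own instances and its neighbors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Observation:
    topology_matches_design: bool
    abnormal_nodes: list[str] = field(default_factory=list)
    # Hypothetical health flags keyed by node name; True means healthy.
    self_healthy: dict[str, bool] = field(default_factory=dict)
    neighbors_healthy: dict[str, bool] = field(default_factory=dict)


def triage(obs: Observation) -> list[str]:
    """Walk the three steps in order and return the recommended actions."""
    # Step 1: architecture awareness.
    if not obs.topology_matches_design:
        return ["investigate unexpected or missing services and calls"]
    # Step 2: anomaly discovery.
    if not obs.abnormal_nodes:
        return ["no abnormal nodes or edges: nothing to do"]
    # Step 3: association analysis, own instances first, then upstream/downstream.
    actions = []
    for node in obs.abnormal_nodes:
        if not obs.self_healthy.get(node, True):
            actions.append(f"{node}: inspect its own instances")
        elif not obs.neighbors_healthy.get(node, True):
            actions.append(f"{node}: inspect upstream and downstream")
        else:
            actions.append(f"{node}: no problem found in own instances or neighbors")
    return actions
```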
How does Kubernetes monitoring support this best practice? First, the cluster topology provides architecture awareness. Kubernetes monitoring maps the application architecture topology by correlating real network requests. It currently offers two views: Service and Workload. The Service view shows calls between Services; the Workload view shows calls between Deployments, DaemonSets, and StatefulSets.
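The Workload view can be thought of as rolling Pod-to-Pod calls up to the Deployment, DaemonSet, or StatefulSet that owns each Pod. The sketch below assumes a hypothetical `owner` mapping from Pod name to workload. In a real cluster this mapping would come from each Pod's `metadata.ownerReferences`, following a ReplicaSet up to its Deployment. The sketch sums request counts per workload edge.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable


def workload_edges(pod_calls: Iterable[tuple[str, str, int]],
                   owner: dict[str, str]) -> dict[tuple[str, str], int]:
    """Aggregate (src_pod, dst_pod, count) records into counts per workload edge."""
    totals: Counter = Counter()
    for src_pod, dst_pod, count in pod_calls:
        # Pods with no known owner (for example, bare Pods) keep their own name.
        src = owner.get(src_pod, src_pod)
        dst = owner.get(dst_pod, dst_pod)
        totals[(src, dst)] += count
    return dict(totals)
```

Aggregating this way keeps the graph stable as Pods are rescheduled and renamed. That stability is why a workload-level view is easier to read than a Pod-level one.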
When you open the topology diagram, nodes are grouped and collapsed by default: nodes inside the cluster are grouped by namespace, and nodes outside the cluster are grouped by service type. After expanding a group, you can see its nodes and their relationships. Clicking a node shows the aggregated values and time series of its performance metrics over the selected time range, broken down by network protocol. Clicking an edge shows the same information for that edge. You can also filter nodes, for example to view the architectural relationship between two specific namespaces, and search for a node to view it quickly. Together these features support effective architecture exploration.
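The default grouping and the per-protocol breakdown can be sketched as two aggregations over call records. In this hypothetical sketch, each node is named `"namespace/name"`, and each call record carries a protocol, a latency, and an error flag. For one edge, the sketch reports per-protocol request count, error rate, and average latency over the selected window.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    src: str           # caller, "namespace/name" (hypothetical naming)
    dst: str           # callee, "namespace/name"
    protocol: str      # e.g. "http", "grpc", "mysql"
    latency_ms: float
    error: bool


def group_by_namespace(nodes: list[str]) -> dict[str, set[str]]:
    """Group "namespace/name" nodes by their namespace prefix."""
    groups: dict[str, set[str]] = defaultdict(set)
    for node in nodes:
        namespace, _, _ = node.partition("/")
        groups[namespace].add(node)
    return dict(groups)


def edge_metrics_by_protocol(records: list[CallRecord], src: str, dst: str) -> dict[str, dict]:
    """Aggregate one edge's calls per protocol: count, error rate, average latency."""
    buckets: dict[str, list[CallRecord]] = defaultdict(list)
    for r in records:
        if r.src == src and r.dst == dst:
            buckets[r.protocol].append(r)
    return {
        proto: {
            "requests": len(calls),
            "error_rate": sum(c.error for c in calls) / len(calls),
            "avg_latency_ms": sum(c.latency_ms for c in calls) / len(calls),
        }
        for proto, calls in buckets.items()
    }
```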
Next, consider the anomaly detection capability. Kubernetes monitoring colors nodes and edges yellow or red based on abnormal conditions in three dimensions:
- Abnormal performance metrics, such as an error rate greater than 10% or an average response time greater than 500 ms.
- Abnormal resource metrics, such as CPU usage or memory usage greater than 70%.
- Abnormal Kubernetes control status, such as a Pod that never reaches the Ready state.

When a group is collapsed, the proportion of abnormal nodes in it is displayed; expanding the group shows which specific nodes are abnormal. With this capability, we can quickly discover anomalies in a specific microservice or in the relationship between microservices.
Kubernetes monitoring also supports association analysis. It can show the upstream and downstream of a specific node, and its 3D view shows the node's upstream and downstream relationships together with the status of its own instances. All associated data can be explored in one graph, greatly improving the efficiency of locating problems.
4. The product value of Kubernetes monitoring
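The coloring rules can be sketched with the example thresholds given in the text: error rate above 10%, average response time above 500 ms, CPU or memory usage above 70%, and a Pod not Ready. The text does not say how conditions map to yellow versus red. The severity rule below is therefore an assumption: one abnormal dimension is yellow, two or more are red. A second function computes the abnormal proportion shown on a collapsed group.

```python
from __future__ import annotations

from dataclasses import dataclass

# Example thresholds taken from the text above.
MAX_ERROR_RATE = 0.10
MAX_AVG_RT_MS = 500.0
MAX_CPU_USAGE = 0.70
MAX_MEMORY_USAGE = 0.70


@dataclass
class NodeStats:
    error_rate: float     # fraction of failed requests, 0.0-1.0
    avg_rt_ms: float      # average response time in milliseconds
    cpu_usage: float      # 0.0-1.0
    memory_usage: float   # 0.0-1.0
    pods_ready: bool      # False if any Pod has not reached Ready


def abnormal_dimensions(s: NodeStats) -> list[str]:
    """List which of the three dimensions breach their thresholds."""
    dims = []
    if s.error_rate > MAX_ERROR_RATE or s.avg_rt_ms > MAX_AVG_RT_MS:
        dims.append("performance")
    if s.cpu_usage > MAX_CPU_USAGE or s.memory_usage > MAX_MEMORY_USAGE:
        dims.append("resource")
    if not s.pods_ready:
        dims.append("k8s-status")
    return dims


def color(s: NodeStats) -> str:
    # Assumption: one abnormal dimension -> yellow, two or more -> red.
    n = len(abnormal_dimensions(s))
    return "green" if n == 0 else "yellow" if n == 1 else "red"


def abnormal_ratio(group: list[NodeStats]) -> float:
    """Share of nodes in a collapsed group that are not green."""
    if not group:
        return 0.0
    return sum(color(s) != "green" for s in group) / len(group)
```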
Alibaba Cloud Kubernetes Monitoring is a one-stop observability product built for Kubernetes clusters. It correlates metrics, traces, logs, and events around Kubernetes resources. It has six main characteristics:
- No code intrusion: Alibaba Cloud Kubernetes monitoring uses bypass (out-of-band) techniques to obtain rich network performance data without instrumenting code.
- Language independence: Alibaba Cloud Kubernetes monitoring parses network protocols at the kernel layer, so it supports any language and any framework.
- High performance: Alibaba Cloud Kubernetes monitoring is built on eBPF and obtains rich network performance data with very low overhead.
- Resource association: Alibaba Cloud Kubernetes monitoring correlates related resources through the network topology and the resource topology.
- Data diversity: Alibaba Cloud Kubernetes monitoring supports many types of observability data (metrics, traces, logs, and events), covering the software stack end to end.
- Integration: the Alibaba Cloud Kubernetes monitoring console is designed around scenarios and correlates architecture-aware topology, application monitoring, Prometheus monitoring, cloud dial testing, health inspection, the event center, Log Service, and cloud services.
So how do Kubernetes monitoring, application performance monitoring, and Prometheus monitoring compare? The following figure shows the relationships and differences among the three. Application performance monitoring focuses on application logic, frameworks, and programming languages. Kubernetes monitoring focuses on the system network and the container interface, and also links upward to application performance monitoring. Prometheus monitoring is the underlying infrastructure: the metric data of both Kubernetes monitoring and application performance monitoring is stored in Prometheus monitoring.
If you want to quickly solve problems in your Kubernetes environment, try Kubernetes monitoring now! It is currently in a free public beta; click the link ( https://www.aliyun.com/activity/middleware/container-monitoring?spm=5176.20960838.0.0.42b6305eAqJy2n ) to start a trial. You are also welcome to join our Q&A group to exchange ideas. See you in the next class.