

Zhihu is a high-quality question-and-answer community on the Chinese Internet. Every day, tens of millions of users share knowledge, experience, and insights on Zhihu and find their own answers. To meet the needs of the business at different stages of its growth, Zhihu's container platform has kept evolving and improving; today, almost all of its services run on containers.

Over the past two years, Zhihu has used Rancher to manage its Kubernetes clusters, which have gradually grown to nearly 10,000 nodes. This article describes how we tuned Rancher for large-scale clusters, ultimately improving access speed by about 75% and bringing the page experience back to a usable state.

We chose Rancher as our container management platform for the following reasons:

  • Our workloads run on several domestic public cloud Kubernetes services, and we needed one platform to manage all of these clusters; Rancher is highly compatible with the domestic public cloud Kubernetes offerings.

  • Rancher lowers the barrier to deploying and using Kubernetes. With the Rancher UI we can easily manage and use each cluster, and developers without deep Kubernetes knowledge can still create resources such as Deployments, Pods, and PVs with ease.

  • We use Rancher Pipeline to implement CI/CD internally, letting developers focus on building business applications. Although the Rancher team has announced that Pipeline is no longer maintained, its simplicity still matches our internal positioning for CI/CD.

  • Rancher keeps innovating and extending its ecosystem around Kubernetes: the lightweight Kubernetes distribution k3s, the lightweight distributed block storage system for Kubernetes Longhorn, the Kubernetes-based open-source hyperconverged infrastructure (HCI) product Harvester, and Rancher's next-generation Kubernetes distribution RKE2. Following a leading, innovative vendor also helps the team keep improving.

  • As an international container vendor, Rancher has a very professional R&D team in China, which makes communication easy; many questions can be answered in the Rancher Chinese community. For a free, open-source platform, that is remarkably generous.

At first, when we used Rancher to manage small and medium-sized clusters, the Rancher UI provided almost all the functionality we needed and felt very smooth.

However, as business volume grew, so did cluster scale. Once we were using one Rancher instance to manage nearly ten clusters with close to 10,000 nodes and hundreds of thousands of Pods in total (the largest single cluster approaching a thousand nodes and tens of thousands of Pods), the Rancher UI began to freeze frequently and load very slowly; some pages took more than 20 seconds. In severe cases the downstream cluster could not be reached at all, and the UI reported: "The current cluster is Unavailable. Functionality that interacts directly with the API is unavailable until the API is ready."

Yes, we had run into Rancher performance problems. For super-administrator users in particular, a large amount of data had to be loaded at login; the UI was essentially unusable, and downstream clusters disconnected frequently.

Dawn

The symptoms above suggested that managing ultra-large-scale clusters with Rancher had hit performance limits and was degrading the experience. We therefore turned to Rancher's local technical team for help, and after several efficient online meetings we identified the root causes:

  • Although we have only a handful of clusters, the total number of nodes across all of them is large. Page loads depend on the node list API for data (these nodes are a special CRD created by Rancher, corresponding to the actual nodes in the downstream clusters), and this API takes a long time to respond, so pages load slowly.
  • Our downstream clusters are mostly public cloud Kubernetes clusters, registered into Rancher by import. In this registered mode, Rancher Server reaches the downstream cluster's Kube API through the tunnel established with the cluster-agent. The access is not direct: traffic flows through the tunnel to the Kubernetes Service IP (for example 10.43.0.1:443) and is then load-balanced across the kube-apiserver replicas. This mode is normally fine, but with our request volume and data size the Service IP forwarding could not keep up, which severely hurt communication efficiency.

We learned that solving this with the community edition would be quite involved, whereas the Rancher enterprise edition already ships a mature set of performance optimization strategies. Rancher engineers described the differences between the enterprise and community editions as follows:

For resource queries, the main difference is that for some slow query APIs the community edition reads data through the Kubernetes API, while the enterprise edition serves them from a cache. The enterprise edition also supports multiple connection strategies for downstream clusters, improving how efficiently they can be managed. In addition, it enhances infrastructure capabilities such as monitoring, logging, GPU, and networking, and responds faster to bugs reported by local business customers.

Owing to local market conditions, the enterprise edition is a special offering: overseas customers essentially subscribe only to the open-source version, while local customers can additionally enjoy enterprise features built by a fully local R&D team. In keeping with SUSE Rancher's open philosophy, users are free to come and go, switching between the enterprise and open-source editions at will.

Based on the analysis above, and after weighing the options, we decided to adopt the enterprise edition for our cluster tuning work.

The Choice

Switch to the Enterprise Edition

First, we switched from the Rancher community edition to the enterprise edition. The enterprise edition iterates more steadily, and its release cadence does not strictly track the open-source version. Fortunately, the community version we were running has a corresponding enterprise release, and a smooth, essentially lossless switch between the two is supported: replacing the container image is all it takes.
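
As a rough sketch of what that looks like for a standard HA installation (Rancher running as a Deployment in the cattle-system namespace; the enterprise image name and tag below are placeholders, not real coordinates):

```bash
# Point the Rancher Deployment at the enterprise image.
# Image name/tag are placeholders; use the ones provided by Rancher.
kubectl -n cattle-system set image deployment/rancher \
  rancher=registry.example.com/rancher/rancher:v2.x-ent

# Watch the rolling update complete; existing cluster data is untouched.
kubectl -n cattle-system rollout status deployment/rancher
```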

Optimize downstream cluster parameters

Rancher engineers recommended parameter optimizations for downstream Kubernetes clusters. We run few custom RKE clusters, however, and mainly use public cloud Kubernetes, where tuning of downstream component parameters depends on the actual environment, so only some of the more commonly used kube-apiserver parameters are listed here for reference:
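
As a hedged sketch (the values below are illustrative, not recommendations): --max-requests-inflight and --max-mutating-requests-inflight cap concurrent requests, --default-watch-cache-size and --watch-cache-sizes size the watch cache for hot resource types, and --event-ttl controls how long Events are retained.

```bash
# Illustrative values only; tune against your own environment.
kube-apiserver \
  --max-requests-inflight=800 \
  --max-mutating-requests-inflight=400 \
  --default-watch-cache-size=200 \
  --watch-cache-sizes=nodes#1000,pods#5000 \
  --event-ttl=30m
```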

Kernel tuning

The Rancher team also pointed us to a set of kernel tuning parameters proven in the open source community: kops/sysctls.go in kubernetes/kops on GitHub (https://github.com/kubernetes/kops/blob/master/nodeup/pkg/model/sysctls.go).
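
For flavor, a few of the settings in that file (values quoted from kops from memory and may have changed; verify against the linked source before applying, e.g. via a file in /etc/sysctl.d/ followed by `sysctl --system`):

```
# Neighbor (ARP) table sizing for clusters with many nodes
net.ipv4.neigh.default.gc_thresh1 = 80000
net.ipv4.neigh.default.gc_thresh2 = 90000
net.ipv4.neigh.default.gc_thresh3 = 100000

# Larger accept/receive backlogs for busy API endpoints
net.core.somaxconn = 32768
net.core.netdev_max_backlog = 16384

# inotify headroom for kubelet and log tailing
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
```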

Enable resource cache

After the resource cache is enabled, the APIs that read local cluster data are served from the cache, which greatly improves list-all performance; the effect is significant in an environment like ours with a very large number of nodes. The cache currently covers a fixed set of resource types.
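
Rancher's enterprise cache implementation is not shown in the article, but the underlying technique is familiar from client-go: keep a local store in sync via a single watch, and serve list requests from memory instead of hitting kube-apiserver each time. A minimal sketch using the standard client-go informer API (nothing Rancher-specific):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// One watch keeps the local store in sync; subsequent
	// "list all nodes" calls never touch kube-apiserver.
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	nodeLister := factory.Core().V1().Nodes().Lister()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	nodes, err := nodeLister.List(labels.Everything()) // served from memory
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached node count: %d\n", len(nodes))
}
```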

Cluster connection mode

The enterprise edition optimizes how Rancher connects to downstream clusters and supports several connection modes. Common connection strategies for enterprise users include:

  • Strategy 1: Default Configuration

    The default strategy connects to downstream clusters the same way the community edition does, with no change to the connection method.

    In the enterprise edition, the default timeout of the Kubernetes REST client used over the tunnel is 60s, which effectively reduces the failure rate of downstream cluster queries under heavy load. The performance gain is limited, but the request success rate improves substantially.

  • Strategy 2: Optimize Tunnel Links

    By default, communication goes through the tunnel to the downstream cluster's Kubernetes API Service (e.g. 10.43.0.1). If a better-performing Kubernetes API LB sits in front of a self-built cluster, you can change the apiEndpoint used to connect to the downstream cluster to the LB address, letting the LB share the load and speeding up the connection (see the reachability check sketched after this list).

  • Strategy 3: Direct and Tunnel Dual Links

    Before enabling the direct-plus-tunnel dual-link mode, make sure the downstream cluster exposes an apiEndpoint that is directly reachable from the Rancher Server network (see Strategy 2). Once enabled, requests to the downstream cluster's v3 API in the Rancher API go over the direct connection while everything else still goes through the tunnel, which effectively spreads the traffic and avoids pushing large volumes of data through the tunnel. Performance improves further compared with Strategy 2, but this mode depends more heavily on the underlying network planning.
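
Before applying Strategy 2 or 3, it is worth confirming from the Rancher Server side that the LB address really serves the downstream Kube API. A minimal check, with a placeholder LB address:

```bash
# Placeholder address; substitute your Kubernetes API LB.
LB=https://kube-api-lb.example.com:6443

# A healthy kube-apiserver answers /healthz (or /livez on newer versions) with "ok".
curl -k ${LB}/healthz

# With valid credentials for the downstream cluster in the current kubeconfig
# context, confirm real API calls succeed against the LB endpoint.
kubectl --server=${LB} --insecure-skip-tls-verify=true get --raw /version
```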

In the end, we chose Strategy 2 for connecting downstream clusters, because we already have a powerful Kubernetes API LB. While the connection link is being switched, a downstream cluster may become unreachable for a short time, but the workloads running in it are unaffected.

First of all, the Rancher enterprise edition UI is very convenient for our team: the left navigation bar makes every function easy to find, which suits the habits of Chinese users. Perhaps because we work in infrastructure management ourselves, we very much like this minimalist UI style.

Secondly, after applying the optimizations above, the timeout behavior when dialing the Kubernetes API improved, and in the end we can hardly see any request failures caused by timeouts. Moreover, with the downstream cluster's kube-apiserver LB endpoint as the request target, downstream cluster disconnections have disappeared entirely; it is as if a village road had been upgraded to a highway. In addition, some APIs are now served quickly from the cache: for Node resources in particular, response time dropped from more than 20 seconds to under 5 seconds, and across the other important APIs we compared, average speed improved by about 75%.

In the Rancher enterprise edition, by tuning downstream cluster parameters, choosing an appropriate strategy for connecting downstream clusters, and enabling the cache, users can greatly improve cluster performance and manage large-scale clusters with ease. As the practice above shows, with reasonable tuning and planning, even an ultra-large deployment like Zhihu's can deliver the same experience as a small cluster.

As of this writing, the local Rancher team is working on a second round of performance tuning for the enterprise edition; reportedly the UI will be able to handle workloads at the 10,000 scale within a single Project/NS, which fully covers our usage ceiling. We look forward to working with Rancher again and sharing new performance practices with you.

If you also have scenarios or requirements for managing large-scale clusters, you can contact SUSE Rancher through the Chinese official website (https://www.rancher.cn) to help safeguard your clusters!

