Author | Zhao Mingshan (
Preface
OpenKruise is Alibaba Cloud's open source suite for automated management of cloud native applications, and it is currently a Sandbox project hosted by the Cloud Native Computing Foundation (CNCF). It distills Alibaba's years of experience with containerization and cloud native technology into a set of standard Kubernetes extension components that run at scale in Alibaba's internal production environment, together with the best practices that come with them.
OpenKruise released v0.10.0 on September 6, 2021, with new capabilities such as elastic topology management and application security protection. This article explains how OpenKruise implements application availability protection.
Background
Before going further, it is worth spelling out what "application availability protection" means. Take an etcd service deployed on Kubernetes: the number of available instances must never drop below N (a constraint imposed by the election mechanism of the Raft protocol). Many readers will ask: isn't it enough to set maxUnavailable in the Deployment? And doesn't the ReplicaSet (RS) controller already take care of replica counts? Look more closely, though. A Deployment's maxUnavailable only guarantees a minimum number of Pods during a rolling release, and the RS controller only drives the actual replica count toward the desired count as quickly as possible. Neither guarantees that the application keeps a minimum number of available replicas at all times.
For this scenario, Kubernetes natively provides PodDisruptionBudget (PDB), which keeps applications highly available by limiting the number of Pods that are disrupted at the same time. However, PDB's coverage is incomplete: it only protects against Pod eviction (for example, kubectl drain evicting the Pods on a node). In the following scenarios, where Pods are updated, evicted, and deleted concurrently, business interruption and service degradation can occur even with a PDB in place:
- The application owner is rolling out a new version through a Deployment while the cluster administrator is draining nodes to scale the cluster in because machine utilization is low.
- The middleware team is using SidecarSet to upgrade a sidecar in place across the cluster (for example, the ServiceMesh envoy), while HPA is scaling in the same set of applications.
- The application owner and the middleware team are using the in-place upgrade capabilities of CloneSet and SidecarSet on the same batch of Pods at the same time.
PodUnavailableBudget improves the high availability of applications
Why can the native Kubernetes PDB only protect against Pod eviction? Start with how it is implemented. A PDB selects the list of protected Pods through its selector, and minAvailable gives the minimum number of Pods that must be available. From these two values and the ready status of the running Pods, the pdb-controller calculates PodDisruptionsAllowed, the maximum number of Pods that may be disrupted at the current moment. The Kubernetes Pod Eviction REST API then uses the PDB's PodDisruptionsAllowed to decide whether a request succeeds or is rejected with a 429 (Too Many Requests) error. That is the whole story: PDB enforces its protection inside the Eviction REST API, so it only applies to Pod eviction.
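The calculation described above can be sketched in a few lines of Python. This is an illustration only, not the pdb-controller's code: the function names are hypothetical, and the sketch assumes that a percentage minAvailable is rounded up to a whole number of Pods.

```python
def resolve_min_available(min_available, expected_pods):
    """Turn minAvailable (an int, or a string such as "60%") into a Pod count."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling: a percentage is rounded up to whole Pods (assumption).
        return (expected_pods * percent + 99) // 100
    return int(min_available)


def disruptions_allowed(min_available, expected_pods, ready_pods):
    """How many more Pods may be disrupted right now without breaking the budget."""
    desired_healthy = resolve_min_available(min_available, expected_pods)
    return max(0, ready_pods - desired_healthy)


def eviction_permitted(min_available, expected_pods, ready_pods):
    """An eviction request passes only while the budget still has room."""
    return disruptions_allowed(min_available, expected_pods, ready_pods) > 0
```

For a 3-replica application with minAvailable set to 2, `eviction_permitted(2, 3, 3)` is true, but once one Pod is not ready `eviction_permitted(2, 3, 2)` is false. Note that only eviction requests ever reach this check, which is exactly the gap the article describes.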
The overall approach of OpenKruise PodUnavailableBudget (PUB) is much the same as PDB's, with adjustments to the key protection path. Voluntary disruptions, where someone deliberately makes Pods unavailable (for example, a cluster administrator draining a node, or Pods being upgraded concurrently), fall into three categories:
- Modifying the Pod spec (in-place container upgrades by CloneSet or SidecarSet)
- Deleting the Pod (rolling upgrades by Deployment and other controllers, or direct Pod deletion)
- Calling the Eviction API (kubectl drain evicting Pods from a node)
Building on the Kubernetes Admission Webhook mechanism, OpenKruise adds webhook logic for Pod update, delete, and eviction requests, and uses the PUB's PodUnavailableAllowed to implement a more comprehensive protection mechanism for application Pods. The logical architecture is as follows:
[Figure: logical architecture of PUB]
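As a rough illustration of the webhook check described above, the sketch below routes all three disruption paths through one shared budget. It is not OpenKruise's implementation: the `Budget` class, the `admit` function, and the operation names are hypothetical, and the rule that an already-unavailable Pod consumes no budget is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    # Stands in for the PodUnavailableAllowed value described in the text.
    unavailable_allowed: int


# The three voluntary disruption paths listed above (names are illustrative).
PROTECTED_OPERATIONS = {"UPDATE", "DELETE", "EVICT"}


def admit(operation: str, pod_is_ready: bool, budget: Budget) -> bool:
    """Return True if the request may proceed, False if the webhook rejects it."""
    if operation not in PROTECTED_OPERATIONS:
        return True  # not a disruption, nothing to protect
    if not pod_is_ready:
        # Assumption: disrupting a Pod that is already unavailable costs nothing.
        return True
    if budget.unavailable_allowed <= 0:
        return False  # budget exhausted, reject the request
    budget.unavailable_allowed -= 1  # reserve one slot for this disruption
    return True
```

With `Budget(unavailable_allowed=1)`, a Deployment's `DELETE` on a ready Pod is admitted, and a concurrent `EVICT` from a node drain is then rejected, because both requests draw from the same budget rather than only the eviction being checked.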
Application scenarios
- Stateless applications: for example, you want at least 60% of the replicas to be available.
- Solution: create a PUB object and set minAvailable to 60% or maxUnavailable to 40%.
apiVersion: apps.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: web-server-pub
  namespace: web
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: web-server
  minAvailable: 60%
- Stateful applications: the number of available instances must not fall below a certain number N (for example, applications constrained by the election mechanism of the Raft protocol).
- Solution: set maxUnavailable=1 to allow only one instance to be deleted at a time, or set minAvailable=N to allow at most workload.replicas-minAvailable instances to be deleted at a time.
apiVersion: apps.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: etcd-pub
  namespace: etcd
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: StatefulSet
    name: etcd
  maxUnavailable: 1
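To see why maxUnavailable: 1 is a safe choice for a Raft-based service such as etcd, the quorum arithmetic can be written out. This is a minimal sketch with hypothetical helper names; it relies only on the standard majority rule that a cluster of n members needs n // 2 + 1 of them to make progress.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerable_failures(members: int) -> int:
    """How many members can be unavailable while a majority remains."""
    return members - quorum(members)


def budget_is_safe(members: int, max_unavailable: int) -> bool:
    """True if the budget can never take the cluster below quorum."""
    return max_unavailable <= tolerable_failures(members)
```

A 3-member cluster has a quorum of 2 and tolerates one unavailable member, so `budget_is_safe(3, 1)` holds while `budget_is_safe(3, 2)` does not. The equivalent minAvailable for the same cluster would be N = `quorum(3)`, that is, 2.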
- Single-instance applications: the customer must be notified and give consent before the instance is terminated.
- Solution: create a PUB object and set maxUnavailable to 0 so that OpenKruise blocks deletion of the instance. Then notify the user and ask for consent; once consent is given, delete the PUB to lift the block and recreate the instance.
apiVersion: apps.kruise.io/v1alpha1
kind: PodUnavailableBudget
metadata:
  name: gameserver-pub
  namespace: game
spec:
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: StatefulSet
    name: gameserver
  maxUnavailable: 0
Summary
Kubernetes gives users extremely flexible scheduling, but that flexibility also puts the high availability of applications to the test. PUB is one attempt to address the voluntary disruption scenario and, combined with the cascading-deletion protection that OpenKruise provides, it improves the stability of online applications. For the trickier involuntary disruptions (for example, a network split-brain among cluster nodes, or vk node anomalies) that make Pods unavailable and reduce application availability, OpenKruise will continue to explore solutions. We also welcome more people to join the OpenKruise community and help build richer and more complete Kubernetes application management, delivery, and extension capabilities for larger-scale, more complex, and more performance-critical scenarios.
GitHub: https://github.com/openkruise/kruise
Official website: https://openkruise.io/
Slack: channel in Kubernetes Slack
DingTalk group: search for group number 23330762 to join