After seven years of Double 11 in production, how does Alibaba define scheduling priority and quality of service for cloud-native hybrid deployment?

Author: Nanyi

Introduction

Alibaba's hybrid deployment technology, which colocates online and offline workloads on the same machines, started in 2014 and has been tested by seven Double 11 events. It has been rolled out at large scale internally, saves Alibaba Group billions in resource costs every year, and has raised overall resource utilization to about 70%, an industry-leading level. Over the past two years, we have begun to productize this internal technology for the wider industry. It installs seamlessly on a standard native K8s cluster as plug-ins and, together with management and operations capabilities for hybrid deployment, improves both the cluster's resource utilization and the overall user experience of the product.

Hybrid deployment is a complex technical and operational system that spans K8s scheduling, OS isolation, observability, and more. This article focuses on the container priority and quality of service model at the K8s layer, in the hope of offering the industry some ideas for reference.

K8s native model

In production practice, even engineers who are familiar with cloud native technology and K8s often confuse scheduling priority (Priority) with quality of service (QoS).

Therefore, before discussing the hybrid deployment model, we first introduce the native K8s concepts in detail, as shown in the following table:

For a detailed description at the API level, see the following table:
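To make the distinction concrete, here is a minimal sketch in native K8s. Scheduling priority is declared explicitly through a PriorityClass that the Pod references, while the QoS class is never declared: K8s derives it from the containers' requests and limits. The names `online-high` and `web` and the image are placeholders chosen for illustration, not taken from Alibaba's setup.

```yaml
# Scheduling priority: an explicit, cluster-scoped object.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: online-high          # hypothetical name
value: 100000                # higher value = higher scheduling priority
globalDefault: false
description: "Illustrative priority for latency-sensitive online services."
---
apiVersion: v1
kind: Pod
metadata:
  name: web                  # hypothetical name
spec:
  priorityClassName: online-high
  containers:
  - name: web
    image: nginx             # placeholder image
    resources:
      # requests == limits for both cpu and memory in every container,
      # so K8s assigns the Guaranteed QoS class.
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
```

After the Pod is created, the derived class appears in `status.qosClass`. Setting requests lower than limits would yield Burstable, and omitting requests and limits entirely would yield BestEffort, all without changing the Pod's scheduling priority.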

Problems that hybrid deployment needs to solve

The main problem that hybrid deployment solves is making full use of idle resources in the cluster to raise overall utilization, while still guaranteeing the service level objectives (SLOs) of the deployed applications.

After online services are deployed to a cluster, their containers are given resource specifications sized for peak load, because online applications need strong guarantees. As a result, actual utilization can be very low.

We want to oversell these allocated but idle resources to offline jobs with low SLOs, so as to raise the overall utilization level of the machines. This requires SLO-based scheduling, and the scheduler must take the real resource usage of each machine into account to avoid hot spots.
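For oversold resources to be schedulable, the scheduler has to see them as capacity on the node. One way this can look, assuming the reclaimed resources are advertised as K8s extended resources under the names used in the Pod spec later in this article, is the Node status fragment below. All numbers are invented for illustration, and how they are computed and reported is not described here.

```yaml
# Fragment of a Node object; all values are illustrative assumptions.
status:
  allocatable:
    cpu: "32"                                          # native CPU, in cores
    memory: 128Gi                                      # native memory
    # Extended resources are integer quantities under a domain-prefixed name.
    alibabacloud.com/reclaimed-cpu: "12000"            # millicores (assumed value)
    alibabacloud.com/reclaimed-memory: "34359738368"   # bytes, 32 GiB (assumed value)
```

Under this assumption, offline Pods would be placed against the reclaimed amounts rather than against native cpu and memory, so their requests would not reduce the capacity accounted to online Pods.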

In addition, because online SLOs are usually higher than offline SLOs, offline jobs can be preempted when the overall utilization of a machine rises too high, which protects the SLOs of online applications. The isolation features of kernel-level cgroups are also needed to keep high-SLO and low-SLO workloads isolated from each other at runtime.

Online and offline Pods therefore need different scheduling priorities and quality of service levels to meet their respective runtime needs.

Application level model defined by cloud-native hybrid deployment

First, take a look at how the YAML of a Pod is defined in hybrid deployment:

apiVersion: v1
kind: Pod
metadata:
  annotations: 
    alibabacloud.com/qosClass: BE # {LSR,LS,BE}
  labels:
    alibabacloud.com/qos: BE  # {LSR,LS,BE} 
spec:
  containers:
  - resources:
      limits:
        alibabacloud.com/reclaimed-cpu: 1000  # unit: millicores; 1000 means 1 core
        alibabacloud.com/reclaimed-memory: 2048  # unit: bytes, same as regular memory; suffixes such as Gi, Mi, Ki, GB, MB, KB may be used
      requests:
        alibabacloud.com/reclaimed-cpu: 1000
        alibabacloud.com/reclaimed-memory: 2048

This is the Pod level that we introduce for hybrid deployment. Unlike the community model, we explicitly declare one of three levels in the annotation and label: LSR, LS, or BE. Each level is tied to both scheduling priority (Priority) and quality of service (QoS).

For the resource usage of each container, LSR and LS still use the native cpu/memory fields. BE tasks are a special case: their resources are declared through the community-standard extended resource mechanism.
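In contrast to the BE Pod above, an LS Pod would declare its resources in the native fields. The sketch below assumes that LS Pods carry the same annotation and label keys as the BE example. The Pod name, image, and resource values are placeholders, and whether LS requires requests and limits to be equal is not specified here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: online-app                     # hypothetical name
  annotations:
    alibabacloud.com/qosClass: LS      # assumed: same key as the BE example
  labels:
    alibabacloud.com/qos: LS           # assumed: same key as the BE example
spec:
  containers:
  - name: app
    image: nginx                       # placeholder image
    resources:
      # LS uses the native cpu/memory resources, not the reclaimed-* ones.
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 8Gi
```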

So what do these three levels mean at runtime? The following figure shows how the three types of applications behave on the CPU:

The detailed impact on the use of other resources is as follows:

As shown, the level affects not only the CPU and memory of a Pod running on a single machine, but also the priority of network QoS along the full link, which prevents low-priority offline tasks from consuming all of the network bandwidth. Alibaba's work on the kernel has effectively ensured the stability of applications at runtime.

During Double 11 in 2021, Alibaba became the first large technology company in the world to run all of its business on its own public cloud. This shows that Alibaba Cloud is able to handle technical challenges in difficult and complex environments, and it has also brought substantial technical benefits: business R&D efficiency rose by 20%, CPU resource utilization rose by 30%, applications became 100% cloud native, online business containers can scale to one million, and computing efficiency improved greatly. The overall computing cost of Double 11 fell by 30% over three years.

Hybrid deployment technology played an important role in this process. Engineers on the kernel and cloud native teams worked through countless pitfalls and built up many advanced features, including flexible CPU bandwidth, Group Identity, SMT expeller, memcg asynchronous reclaim, memory watermark classification, and memcg OOM, all at an industry-leading level. These features will be introduced one by one in this series of articles.

How tasks at these three priority levels behave during scheduling and at runtime is shown in the following table:

In other words, the hybrid deployment priority applies to both scheduling and runtime, ensuring that high-level and medium-level tasks with high SLOs can use the resources in the cluster to the greatest extent.

Quota, watermark, and multi-tenant isolation

This article focuses only on the scheduling priority of a single K8s Pod. In practice, guaranteeing an application's SLO also requires single-machine watermarks, tenant quotas, OS isolation capabilities, and so on, which we will explore in detail in follow-up articles.