Introduction: Unified scheduling debuted at large scale for the first time on Double 11 this year. With one set of scheduling protocols and one system architecture, it manages the underlying computing, storage, and network resources in a unified way and delivers ultra-large-scale, efficient, automated resource elasticity, a new breakthrough in the industry. Through the co-location of online and offline workloads and the new fast scale-up and scale-down technology, it avoided the purchase of tens of thousands of servers, saved hundreds of millions in resource costs, and improved efficiency.
01 Background
Unified Scheduling 1.0 successfully supported the 2021 Double 11 promotion, delivering a comprehensive upgrade and optimization of the entire pipeline, from container scheduling to fast scale-up and scale-down. More than 100 core project members carried the work through project approval, proof of concept (POC), design review, closed development and testing, and the final pre-promotion sprint, and the system went live successfully after testing.
As a core Alibaba project, it brought together Alibaba Cloud (the container team and the big data team), the Alibaba resource efficiency team, and the Ant container orchestration team. After more than a year of research, development, and technical breakthroughs, they completed the comprehensive upgrade from "co-location technology" to "unified scheduling technology".
Today, unified scheduling covers Alibaba e-commerce, search and promotion, MaxCompute big data, and Ant businesses under one scheduler. It unifies pod scheduling with high-performance task scheduling, and it unifies resource views and scheduling coordination. It co-locates multiple complex business types, raises utilization, and fully supports large-scale resource scheduling across dozens of data centers, millions of containers, and tens of millions of cores worldwide.
02 Comprehensive upgrade of unified scheduling technology
The essence of cloud computing is to gather small computing fragments into a larger resource pool, shave peaks and fill valleys, and deliver the best possible energy efficiency. Driven by the pursuit of low-carbon, energy-saving, environmentally friendly technology and more efficient data center operation, Alibaba will never stop exploring. Alibaba's engineers share an ideal: to make data center computing power an infrastructure like water, electricity, and gas, ready to use on demand.
To make the most of the complementary peaks and valleys between businesses, we previously built co-location technology to break down the fragmentation of multiple resource pools, with multiple scheduling brains from different computing domains coordinating and sharing resources. This older generation of co-location technology unified resources and greatly improved utilization, but its multi-scheduler nature limited how far we could go.
Alibaba continues to build a new generation of scheduling technology that supports more kinds of complex tasks without distinction, offers extreme elasticity and complementarity, achieves globally optimal scheduling, and provides higher-quality computing power. This year we reached a new technical turning point: Container Service ACK took the lead and worked with many teams to launch a new generation of unified scheduling based on ACK.
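The benefit of complementary peaks and valleys can be illustrated with a little arithmetic. The sketch below is only an illustration under assumed numbers: the function name `pooled_vs_separate` and both demand curves are hypothetical and do not come from Alibaba's production data. It compares the capacity needed when two workloads each get their own pool with the capacity needed when they share one pool.

```python
def pooled_vs_separate(online, offline):
    """Return (separate_capacity, pooled_capacity) in cores.

    Each argument is a list of per-time-slot demand for one workload.
    Separate pools must each be sized for their own peak; a shared pool
    only needs to cover the peak of the combined demand.
    """
    if len(online) != len(offline):
        raise ValueError("demand curves must cover the same time slots")
    separate = max(online) + max(offline)
    pooled = max(a + b for a, b in zip(online, offline))
    return separate, pooled


# Hypothetical demand curves (cores per time slot): the online service peaks
# in the middle of the window, the offline batch jobs peak at the edges.
online_demand = [20, 30, 80, 100, 60, 30]
offline_demand = [90, 80, 30, 10, 40, 85]

separate, pooled = pooled_vs_separate(online_demand, offline_demand)
print(f"separate pools: {separate} cores, shared pool: {pooled} cores")
```

With these made-up curves the shared pool needs noticeably fewer cores, because the two peaks never coincide. The more the peaks overlap, the smaller the saving.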
The unified scheduling that debuted on Double 11 this year uses one set of scheduling protocols and one system architecture to manage the underlying computing, storage, and network resources in a unified way, delivering ultra-large-scale, efficient, automated resource elasticity and a new breakthrough in the industry. Through the co-location of online and offline workloads and the new fast scale-up and scale-down technology, it avoided the purchase of tens of thousands of servers, saved hundreds of millions in resource costs, and greatly improved efficiency.
Large-scale data intelligence was also introduced this year for the first time to enrich the scheduling capabilities: real-time load awareness, automatic specification recommendation (VPA), differentiated SLO-based workload scheduling, CPU normalization, HPA with periodic prediction, and time-shared resource recovery. Together they provide cost optimization along more dimensions and highly reliable container runtime guarantees.
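To make "HPA with periodic prediction" concrete, here is a minimal sketch of one possible approach, not the algorithm ACK actually uses. It assumes the load repeats with a known period and sizes the next time slot from the samples observed at the same phase in earlier periods. The function `predict_replicas` and all of its parameters are hypothetical names chosen for illustration.

```python
import math


def predict_replicas(history, period, per_replica_capacity,
                     headroom_percent=20, min_replicas=1):
    """Predict the replica count for the next time slot.

    history: load samples (for example QPS), oldest first, one per slot.
    period: number of slots after which the load pattern repeats.
    The forecast is the maximum load seen at the same phase in earlier
    periods, plus a safety headroom.
    """
    next_index = len(history)
    # Walk backwards one period at a time: same phase, earlier periods.
    same_phase = [history[i] for i in range(next_index - period, -1, -period)]
    if not same_phase:
        raise ValueError("need at least one full period of history")
    expected = max(same_phase)
    needed = expected * (100 + headroom_percent) / (100 * per_replica_capacity)
    return max(min_replicas, math.ceil(needed))


# Hypothetical QPS samples with a period of 4 slots (two full periods).
qps = [100, 400, 900, 300, 120, 420, 1000, 310]
print(predict_replicas(qps, period=4, per_replica_capacity=50))
```

Scaling ahead of a predictable peak in this way avoids waiting for a reactive autoscaler to notice the load after it has already arrived.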
Around the new generation of unified scheduling, Alibaba e-commerce, search, big data, and many other platforms, along with different types of complex computing workloads, all apply for resources in a consistent way, with coordinated quota management and resource planning. Borrowing hundreds of thousands of cores completes within seconds. Building on unified scheduling, Alibaba Cloud and Ant have also merged their scheduling technologies, and the Ant ecosystem has been fully upgraded to unified scheduling. The scheduling platform opens up more possibilities for the future. For example, economic mechanisms such as price levers could drive Alibaba's internal businesses to use the resources of each data center more sensibly and keep resource utilization across data centers as balanced as possible, improving the energy efficiency of the data centers.
Alibaba Cloud Container Service ACK further enhances standard Kubernetes with higher throughput and lower response latency, building stable and reliable ultra-large-scale single-cluster capability. It steadily supports production clusters of 12,000 nodes and more than 1 million cores, which provides a solid foundation for resource pooling. Alibaba's many types of complex resources have likewise been comprehensively consolidated and upgraded on top of the ACK container service base.
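The idea of coordinated quotas with borrowing can be sketched as follows. This is a simplified, hypothetical model rather than the actual quota protocol: each tenant has a guaranteed quota, and a tenant may exceed its guarantee as long as the shared pool still has idle cores. The names `Quota`, `QuotaPool`, and the tenant labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    guaranteed: int  # cores reserved for this tenant
    used: int = 0    # cores currently allocated to this tenant


class QuotaPool:
    """Tenants share one pool and may borrow each other's idle quota."""

    def __init__(self, quotas):
        self.quotas = quotas

    def idle(self):
        # Idle cores across the whole pool, regardless of owner.
        return sum(q.guaranteed - q.used for q in self.quotas.values())

    def request(self, tenant, cores):
        """Grant the request if the pool as a whole has enough idle cores."""
        if cores <= 0 or cores > self.idle():
            return False
        self.quotas[tenant].used += cores
        return True

    def borrowed(self, tenant):
        """Cores a tenant is using beyond its own guarantee."""
        q = self.quotas[tenant]
        return max(0, q.used - q.guaranteed)


pool = QuotaPool({
    "ecommerce": Quota(guaranteed=600, used=500),
    "bigdata": Quota(guaranteed=400, used=100),
})
granted = pool.request("ecommerce", 300)   # borrows idle big data quota
denied = pool.request("bigdata", 200)      # pool no longer has 200 idle cores
print(granted, denied, pool.borrowed("ecommerce"))
```

The sketch deliberately leaves out reclamation: a production scheduler would need a way to take borrowed cores back (for example by preempting lower-priority work) when the lender needs its guaranteed quota again.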
Beyond classic Alibaba scenarios such as e-commerce, search, and big data, unified scheduling has also enabled new technical innovation. Take live-streaming e-commerce, where decisions depend on highly real-time computation: for example, second-level analysis of the browsing and transaction data generated in real time by the more than 90 million online viewers of Wei Ya's Double 11 live room. This year, Alibaba upgraded its real-time computing engine Blink to a new-generation engine based on unified scheduling, with large gains in cost, performance, stability, and user experience. Large-scale job startup performance improved by 40% and error recovery efficiency improved by 100%. Unified scheduling saved hundreds of thousands of CPU cores during the Double 11 promotion and kept the cluster free of hot spots even when cluster CPU utilization exceeded 65%, ensuring the timeliness of every live stream.
On the serverless side, function services were deployed at large scale within the group for the first time and were used on Double 11 to support more than 10 business scenarios, including Taobao search and recommendation, data processing, and front-end SSR. With unified scheduling, Function Compute can run co-located at scale with Alibaba's resource pools, make full use of fragmented cluster resources, and eliminate the cost of idle resources in serverless scenarios during traffic troughs. Thanks to ACK on-demand image loading and network stack optimization, function instances start in less than 150 ms, and pooling keeps the cold start rate of Function Compute containers below 5%, which was key to the success of the Double 11 promotion.
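One common way to avoid hot spots at high cluster utilization is load-aware placement. The sketch below is a greedy illustration under assumed numbers, not the scheduler described in this article: each pod goes to the node whose utilization after placement would be lowest, and a placement is refused if it would push a node past a hot-spot threshold. The function `place`, the node names, and the 85% threshold are all hypothetical.

```python
def place(pods, nodes, hot_threshold=0.85):
    """Greedy load-aware placement.

    pods: list of core demands.
    nodes: dict of node name -> (used_cores, capacity_cores).
    Returns (assignments, final_used); assignments is a list of
    (demand, node_name) in placement order (largest pods first).
    """
    used = {name: u for name, (u, _) in nodes.items()}
    capacity = {name: c for name, (_, c) in nodes.items()}
    assignments = []
    for demand in sorted(pods, reverse=True):
        # Pick the node that would be least utilized after placement.
        best = min(used, key=lambda n: (used[n] + demand) / capacity[n])
        if (used[best] + demand) / capacity[best] > hot_threshold:
            raise RuntimeError(f"placing {demand} cores would create a hot spot")
        used[best] += demand
        assignments.append((demand, best))
    return assignments, used


# Hypothetical cluster: 165 of 300 cores in use before placement.
cluster = {"a": (40, 100), "b": (70, 100), "c": (55, 100)}
assignments, final_used = place([20, 10, 15], cluster)
print(assignments, final_used)
```

In this example the cluster ends at 70% overall utilization with every node at the same level, so no single node becomes a hot spot even though the cluster as a whole is busy.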
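How a pool of pre-warmed instances affects the cold start rate can be shown with a small simulation. This is a hedged sketch with made-up traffic, not a model of the real Function Compute pool: requests are served from warm instances when any are available, anything beyond that counts as a cold start, and the pool is refilled by a fixed amount each tick. The function `cold_start_rate` and its parameters are hypothetical.

```python
def cold_start_rate(arrivals, pool_size, refill_per_tick):
    """Fraction of requests that had to cold start.

    arrivals: number of new instance requests in each tick.
    pool_size: maximum number of pre-warmed instances kept ready.
    refill_per_tick: warm instances added back to the pool after each tick.
    """
    warm = pool_size
    cold = total = 0
    for n in arrivals:
        served_warm = min(n, warm)
        cold += n - served_warm
        total += n
        # Refill the pool, never exceeding its configured size.
        warm = min(pool_size, warm - served_warm + refill_per_tick)
    return cold / total if total else 0.0


# Hypothetical traffic with one burst in the third tick.
traffic = [5, 8, 30, 6, 4, 7]
print(cold_start_rate(traffic, pool_size=20, refill_per_tick=10))
print(cold_start_rate(traffic, pool_size=30, refill_per_tick=10))
```

A larger pool absorbs the burst and drives the cold start rate down, at the price of keeping more idle instances warm. Running those warm instances on otherwise fragmented cluster resources is what makes that trade-off cheaper.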
03 Future Outlook
In the future, Container Service ACK will bring Alibaba's unified scheduling experience to the entire industry, support more new computing workloads and the architectural evolution of new technology forms, make cloud computing ubiquitous, empower more enterprises, and unlock greater low-carbon value.
Copyright Statement: The content of this article was contributed voluntarily by real-name registered users of Alibaba Cloud. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and assumes no corresponding legal responsibility. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.