The ATC 2021 results have been announced, with the acceptance rate hitting a new low of 18%. Three best papers were named, and the paper on the Feitian operating system submitted by Alibaba Cloud was one of them, the best result achieved by a Chinese company to date.

Recently, USENIX ATC, a top international academic conference on computer systems, was held online. Founded in 1992 and organized by USENIX, ATC is a top-tier conference in the computer systems field and has now been held 31 times. A series of influential results in computer systems, such as the Oak language (the predecessor of Java), QEMU, and ZooKeeper, were first published or announced at USENIX ATC. ATC sets an extremely high bar for papers, which must make a fundamental contribution, have forward-looking impact, and include a solid system implementation. This year's acceptance rate was only 18%, and only three papers worldwide were selected as best papers.

The paper submitted by Alibaba Cloud is titled "Scaling Large Production Clusters with Partitioned Synchronization" and discusses how Feitian solves the problem of scheduling computing resources at very large scale. It was accepted and won a Best Paper Award. This is the first time a Chinese company has appeared among the ATC best paper winners.

Feitian is a super-large-scale cloud computing operating system developed by Alibaba Cloud. It can connect millions of servers around the world into a single supercomputer and provide computing power to society in the form of online public services. Feitian's core services include distributed computing, storage, databases, and networking. The award-winning paper covers its resource scheduling service.

It is reported that the paper, on the Feitian distributed scheduling system Fuxi 2.0, is the result of a joint project between the Alibaba Academic Cooperative Innovation Research Project (AIR) and Mr. James Cheng of the Chinese University of Hong Kong. The paper addresses the severe resource conflicts and poor scheduling performance found in the industry's distributed scheduling architectures. It proposes a set of resource conflict resolution mechanisms that let the scheduler scale with cluster size while preserving scheduling performance and scheduling quality. The system supports a single cluster of 100,000 nodes on MaxCompute, Feitian's big data platform, with a concurrency capacity of 40,000 jobs/sec.
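To give a rough feel for the idea named in the paper's title, here is a minimal sketch of partitioned synchronization. It is not Fuxi 2.0's implementation. It assumes the cluster state is split into as many partitions as there are schedulers, and that in each round every scheduler refreshes its local view of a different partition in round-robin order, then commits optimistically against a master state. All names (`Scheduler`, `run_round`, `simulate`) and the slot counts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """A scheduler holding a possibly stale local view of free slots per partition."""
    sid: int
    view: dict = field(default_factory=dict)  # partition id -> free slots as last synced

    def sync(self, partition, master):
        # Refresh only one partition of the local view from the master state.
        self.view[partition] = master[partition]

    def pick_partition(self):
        # Prefer the partition believed to have the most free slots.
        candidates = [p for p, free in self.view.items() if free > 0]
        return max(candidates, key=lambda p: self.view[p]) if candidates else None


def run_round(round_no, schedulers, master):
    """One sync-then-schedule round; returns (commits, conflicts)."""
    n = len(schedulers)
    # Round-robin (assumed): scheduler i refreshes partition (i + round_no) % n,
    # so each scheduler has the freshest view of a different partition.
    for s in schedulers:
        s.sync((s.sid + round_no) % n, master)

    commits = conflicts = 0
    for s in schedulers:
        p = s.pick_partition()
        if p is None:
            continue
        if master[p] > 0:
            # Optimistic commit succeeds against the master state.
            master[p] -= 1
            s.view[p] -= 1
            commits += 1
        else:
            # The local view was stale: another scheduler already took the slot.
            s.view[p] = 0
            conflicts += 1
    return commits, conflicts


def simulate(num_schedulers=4, slots_per_partition=3, rounds=6):
    """Run several rounds; returns (total commits, total conflicts, remaining slots)."""
    master = [slots_per_partition] * num_schedulers
    schedulers = [Scheduler(i) for i in range(num_schedulers)]
    total_commits = total_conflicts = 0
    for r in range(rounds):
        c, k = run_round(r, schedulers, master)
        total_commits += c
        total_conflicts += k
    return total_commits, total_conflicts, master
```

Calling `simulate()` returns the total number of commits, the number of conflicts, and the free slots left in each partition. The conflict branch models a commit that is rejected because the scheduler's local view had gone stale.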

The core problem of cloud computing is how to efficiently organize thousands of machines or more and schedule and manage tasks on them flexibly, so that users can use the cloud as if it were a single machine. As data volumes and computation keep growing, cloud computing scenarios have become ultra-large-scale. Traditional schedulers based on a centralized architecture are limited by the processing capacity of a single point and cannot scale out.

Guan Tao, a researcher in the Alibaba Cloud Computing Platform Division, said: "There is a saying in the field of distributed systems that every time the scale grows by an order of magnitude, you face a completely new problem. Scale, utilization, and fairness are the three core concerns of a scheduling system. This paper is based on part of the work on the Alibaba Cloud Feitian system, and explores how a scheduling system can scale to super-large clusters without sacrificing utilization or fairness."

In recent years, many research results from the Feitian operating system have been accepted by top international conferences: in 2019, the data scheduling paper Yugong was accepted by the top database conference VLDB; in 2020, the machine learning and single-machine scheduling paper AntMan was accepted by the top operating systems conference OSDI; and in 2021, the computing scheduling paper Fangorn was accepted by VLDB.

Copyright Notice: The content of this article was contributed voluntarily by real-name registered users of Alibaba Cloud. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibility. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.

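Alibaba Cloud Developer

Alibaba's official technology account. Technical innovation, hands-on experience, and engineers' reflections on professional growth from across the Alibaba economy are all presented here.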