Abstract: Superior Scheduler is a scheduling engine designed specifically for the Hadoop YARN distributed resource management system. It is a high-performance, enterprise-grade scheduler built for enterprise customers that need converged resource pools and multi-tenancy.
This article is shared from the Huawei Cloud Community post "FusionInsight MRS Self-developed Super Scheduler Principle Introduction", author: a walnut.
Superior Scheduler is a scheduling engine designed specifically for the Hadoop YARN distributed resource management system. It is a high-performance, enterprise-grade scheduler built for enterprise customers that need converged resource pools and multi-tenancy.
Superior Scheduler implements all the functions of the open source schedulers, Fair Scheduler and Capacity Scheduler. Compared with them, it adds targeted enhancements in enterprise-level multi-tenant scheduling strategies, multi-user resource isolation and sharing within tenants, scheduling performance, system resource utilization, and support for large clusters. The design goal is for Superior Scheduler to be a direct replacement for the open source schedulers.
Similar to the open source Fair Scheduler and Capacity Scheduler, Superior Scheduler interacts with the YARN Resource Manager component through the YARN scheduler plug-in interface to provide resource scheduling functions. The following figure shows the overall system architecture:
The main modules of Superior Scheduler are as follows:
• Superior Scheduler Engine: A high-performance scheduler engine with rich scheduling strategies.
• Superior YARN Scheduler Plugin: The bridge between YARN Resource Manager and Superior Scheduler Engine, responsible for interacting with YARN Resource Manager.
In terms of scheduling principles, the open source schedulers all rely on a heartbeat-driven mechanism that matches resources to jobs in the reverse direction. Specifically, each computing node periodically sends a heartbeat to the YARN Resource Manager to report its status, and the same heartbeat triggers the scheduler to allocate jobs to that node. This mechanism ties the scheduling cycle to the heartbeat, so as the cluster grows it runs into scalability and scheduling performance bottlenecks. In addition, because resources are matched to jobs in the reverse direction, the open source schedulers are limited in scheduling accuracy: data affinity is effectively random, and load-based scheduling strategies cannot be supported. The main reason is that the scheduler lacks a global view of resources when it selects a job, so it is difficult to make the best choice.
Superior Scheduler uses a different mechanism internally. It introduces a dedicated scheduling thread that decouples scheduling from heartbeats, which avoids the heartbeat storm problem. In addition, its scheduling process matches in the forward direction, from job to resource, so that each job is scheduled with a global view of resources, which greatly improves scheduling accuracy. Compared with the open source schedulers, Superior Scheduler greatly improves system throughput, utilization, and data affinity.
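The difference between the two matching directions can be sketched as follows. This is a minimal illustration, not Superior Scheduler's implementation: the `Node` and `Job` types, the memory-only resource model, and the "most free memory" load rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mem_gb: int


@dataclass
class Job:
    name: str
    mem_gb: int


def reverse_match(node, pending_jobs):
    """Node-to-job: triggered by one node's heartbeat, sees only that node."""
    for job in pending_jobs:
        if job.mem_gb <= node.free_mem_gb:
            return job
    return None


def forward_match(job, nodes):
    """Job-to-node: a scheduling thread picks a node from the whole cluster."""
    candidates = [n for n in nodes if n.free_mem_gb >= job.mem_gb]
    if not candidates:
        return None
    # Hypothetical load-based rule: prefer the node with the most free memory.
    return max(candidates, key=lambda n: n.free_mem_gb)
```

In `reverse_match` the result depends on which node happens to report first, whereas `forward_match` can compare every node before deciding, which is what makes load-based or affinity-based rules possible.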
Superior Scheduler performance comparison
In addition to improving system throughput and utilization, Superior Scheduler also provides the following main scheduling functions:
• Multiple resource pools
Multiple resource pools help to logically divide cluster resources and share them among multiple tenants/queues. The division of resource pools can be based on heterogeneous resources or completely according to the demands of application resource isolation. For a resource pool, different queues can be configured with further strategies.
• Multi-tenant scheduling for each resource pool (reserve, min, share, max)
Superior Scheduler provides a flexible hierarchical multi-tenant scheduling strategy. It also allows configuring different policies for tenants/queues that can be accessed by different resource pools, as shown below.
The diagram of tenant resource allocation strategy is shown in the figure:
Compared with the open source schedulers, Superior Scheduler also lets tenant-level policies mix percentages and absolute values, which adapts well to flexible enterprise-level tenant resource scheduling requirements. For example, first-level tenants can be given resource guarantees as absolute values, so that their resources are not affected by changes in cluster size. The sub-tenants below them can then be allocated by percentage, so that resource utilization within the first-level tenant is as high as possible.
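A rough sketch of how a mixed policy could be resolved is shown below. The string notation (`"400"` for an absolute amount, `"75%"` for a percentage) and the function name are hypothetical and chosen only for illustration; they are not the product's configuration format.

```python
def resolve_capacity(spec, parent_capacity):
    """Resolve a tenant capacity spec against its parent's capacity.

    spec is a hypothetical notation: "400" means an absolute amount,
    "75%" means a percentage of the parent's capacity.
    """
    if spec.endswith("%"):
        return parent_capacity * float(spec[:-1]) / 100
    return float(spec)
```

With this notation, a first-level tenant configured as `"400"` resolves to the same amount whether the cluster offers 1000 or 2000 units, while a sub-tenant configured as `"75%"` always receives three quarters of whatever its parent holds.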
• Heterogeneous and multi-dimensional resource scheduling
Superior Scheduler supports the scheduling of CPU and memory resources, and can be extended to support the following:
o Node tags can be used to identify multi-dimensional attributes of nodes, such as GPU_ENABLED or SSD_ENABLED, and jobs can be scheduled based on these tags.
o Resource pools can be used to group resources of the same category and assign them to specific tenants/queues.
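Tag-based node selection can be pictured as a subset test. The sketch below assumes nodes are described by a plain dictionary from node name to a set of tags; the function name and data layout are illustrative, not part of Superior Scheduler.

```python
def nodes_matching(nodes, required_tags):
    """Return the names of nodes that carry every required tag.

    nodes maps a node name to the set of tags attached to that node.
    """
    required = set(required_tags)
    return [name for name, tags in nodes.items() if required <= set(tags)]
```

A job that asks for both `GPU_ENABLED` and `SSD_ENABLED` would only be offered nodes carrying both tags, while a job with no tag requirement matches every node.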
• Multi-user fair scheduling within tenants
In a leaf tenant, multiple users can use the same queue to submit jobs. Compared with an open source scheduler, Superior Scheduler can support flexible configuration of resource sharing strategies for different users in the same tenant. For example, you can configure more resource access weights for VIP users.
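One simple way to read "resource access weights" is proportional sharing, sketched below. The proportional rule, the function name, and the user names in the usage note are assumptions for illustration; the article does not specify the exact formula.

```python
def weighted_shares(total, weights):
    """Split a tenant's resources among users in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one user must have a positive weight")
    return {user: total * w / weight_sum for user, w in weights.items()}
```

For instance, giving a VIP user weight 2 and two other users weight 1 each would hand the VIP user half of the tenant's resources and each of the others a quarter.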
• Data location-aware scheduling
Superior Scheduler adopts a "job-to-node scheduling strategy": for a given job, it searches the available nodes for the one that suits the job best. This gives the scheduler a holistic view of the cluster and the data, so whenever there is an opportunity to place a task closer to its data, the scheduler can take it and preserve data locality. The open source scheduler uses a "node-to-job scheduling strategy", which tries to find an appropriate job for a given node.
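The sketch below shows one way a job-to-node scheduler could rank candidate nodes by closeness to the data. The three tiers (same node, same rack, elsewhere) follow the usual Hadoop notion of locality; the function names and the `rack_of` mapping are assumptions for this example, not Superior Scheduler's API.

```python
def locality_rank(node, data_nodes, rack_of):
    """0 = node holds the data, 1 = same rack as the data, 2 = elsewhere."""
    if node in data_nodes:
        return 0
    data_racks = {rack_of[n] for n in data_nodes if n in rack_of}
    if rack_of.get(node) in data_racks:
        return 1
    return 2


def pick_node(candidates, data_nodes, rack_of):
    """Among nodes with enough free resources, choose the closest to the data."""
    if not candidates:
        return None
    return min(candidates, key=lambda n: locality_rank(n, data_nodes, rack_of))
```

Because the whole candidate list is available at once, the choice does not depend on which node happens to send a heartbeat first.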
• Dynamic resource reservation during Container scheduling
In a heterogeneous and diversified computing environment, some containers require more resources or multiple kinds of resources. For example, Spark jobs may require more memory. When these containers compete with containers that need fewer resources, they may not obtain the resources they need within a reasonable time and starve. Because the open source scheduler matches resources to jobs in the reverse direction, it reserves resources for these jobs blindly to prevent starvation, which wastes system resources overall. Superior Scheduler differs from the open source schedulers in the following ways:
o Demand-based matching: Because Superior Scheduler uses "job-to-node scheduling", it can select appropriate nodes on which to reserve resources, which shortens the startup time of these special containers and avoids waste.
o Tenant rebalancing: When reservation logic is enabled, the open source scheduler does not follow the configured sharing strategy. Superior Scheduler takes a different approach. In each scheduling cycle, it traverses the tenants and tries to rebalance resources according to the multi-tenant strategy, attempting to satisfy every policy (reserve, min, share, and so on), so that reserved resources can be released and available resources can flow to containers under other tenants that should have received them.
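Demand-based matching can be illustrated with a small decision function. The heuristic used here (reserve on the node with the smallest shortfall, on the assumption that it will free enough resources soonest) is a guess made for the example; the article does not describe how Superior Scheduler actually chooses the node.

```python
def choose_reservation_node(request_gb, free_gb_by_node):
    """Decide where a large container should run or reserve resources.

    Returns ("allocate", node) if some node already fits the request,
    ("reserve", node) if a reservation is needed, or ("wait", None)
    if there are no nodes at all.
    """
    fits = [n for n, free in free_gb_by_node.items() if free >= request_gb]
    if fits:
        # Tightest fit keeps larger nodes available for later requests.
        return ("allocate", min(fits, key=lambda n: free_gb_by_node[n]))
    if not free_gb_by_node:
        return ("wait", None)
    # Hypothetical heuristic: reserve where the shortfall is smallest.
    return ("reserve", max(free_gb_by_node, key=free_gb_by_node.get))
```

A heartbeat-driven scheduler would instead have to reserve on whichever node reported in, even if that node is the furthest from being able to satisfy the request.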
• Dynamic queue status control (Open/Closed/Active/Inactive)
Superior Scheduler supports multiple queue states, which helps administrators operate and maintain multiple tenants.
o Open state (Open/Closed): If it is in the Open (default) state, applications submitted to this queue will be accepted; if it is in the Closed state, no applications will be accepted.
o Active state (Active/Inactive): If it is in Active (default) state, applications in the tenant can be scheduled and allocated resources. If it is in the Inactive state, no scheduling will be performed.
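The two states act as independent switches: one gates submission, the other gates scheduling. A minimal sketch, assuming a hypothetical `TenantQueue` class that is not part of the product:

```python
from dataclasses import dataclass, field


@dataclass
class TenantQueue:
    is_open: bool = True     # Open (default) or Closed
    is_active: bool = True   # Active (default) or Inactive
    apps: list = field(default_factory=list)

    def submit(self, app):
        """Accept the application only while the queue is Open."""
        if not self.is_open:
            return False
        self.apps.append(app)
        return True

    def schedulable_apps(self):
        """Applications are eligible for resources only while Active."""
        return list(self.apps) if self.is_active else []
```

Under this reading, a queue that is Closed but still Active accepts no new applications while the ones already submitted continue to be scheduled, which is one way an administrator might drain a tenant before maintenance.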
• Application waiting reason
If the application has not been started, information about the reason for the job waiting is provided.
The comparative analysis of Superior Scheduler and YARN open source scheduler is as follows: