Authors | Qiu Jian, Feng Yong
Foreword
More than a month has passed since our last article, which announced that OCM had been open sourced and had applied to the CNCF TOC for project incubation. Since then, the OCM community has drawn growing attention and participation from partners, and both OCM's own development and its adoption in related products and solutions have progressed quickly. If you are still undecided about which multi-cluster management technology to choose for a cloud native environment, we hope this article answers your questions. Now is a good time to join the OCM community.
OCM community research and development progress
Open Cluster Management released version 0.4.0 in July, and the main newly added features include:
1. Plug-in registration and life cycle management
Through the ManagedClusterAddOn API, users can declare that a management plug-in should be installed on a managed cluster, and then extend management capabilities through that plug-in. For example, a user can create a ManagedClusterAddOn resource named submariner in the namespace that corresponds to the ManagedCluster resource on the central cluster. This triggers the installation and configuration of Submariner on the managed cluster, where it serves as the multi-cluster network connectivity solution.
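As a minimal sketch of the submariner example, the following Python builds the manifest for such a resource as a plain dictionary. The API group and version, the cluster name `cluster1`, and the `installNamespace` value are assumptions that this article does not state, so check them against the OCM release you use.

```python
import json


def managed_cluster_addon(addon_name: str, cluster_name: str, install_namespace: str) -> dict:
    """Build an add-on manifest as a plain dict.

    On the central cluster, the resource lives in the namespace that is
    named after the managed cluster.
    """
    return {
        # Assumed API group/version; verify against your OCM release.
        "apiVersion": "addon.open-cluster-management.io/v1alpha1",
        "kind": "ManagedClusterAddOn",
        "metadata": {"name": addon_name, "namespace": cluster_name},
        # Assumed field: namespace on the managed cluster where the agent runs.
        "spec": {"installNamespace": install_namespace},
    }


# "cluster1" and "submariner-operator" are illustrative values.
manifest = managed_cluster_addon("submariner", "cluster1", "submariner-operator")
print(json.dumps(manifest, indent=2))
```

The printed JSON could then be applied to the central cluster with whatever tooling you normally use for Kubernetes resources.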
The registration and management of the cluster management plug-in is shown in the following figure:
Each plug-in requires a controller running on the central cluster, written by the plug-in developer. Its main purpose is to tell OCM how the plug-in should be installed on the managed cluster, how it communicates with the central cluster, and which permissions that communication requires. With this information, OCM deploys the plug-in on the managed cluster and sets up its communication with the central cluster. To make it easier to develop OCM-based management components, OCM 0.4.0 adds addon-framework, a Go framework. Developers only need to implement the interface exposed by the framework to supply the plug-in's deployment and configuration information. The framework also provides several helper functions that make it easier to add resource-handling logic on Kubernetes, and developers can implement an Operator for Kubernetes custom resources inside the plug-in.
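The division of labor described above can be illustrated with a small Python sketch. It is not the addon-framework interface (which is written in Go). The class names, method names, and the `HelloAddon` example are all hypothetical and only model the idea that a plug-in answers two questions: what to install, and how its agent registers with the central cluster.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Registration:
    """How a plug-in agent communicates with the central cluster."""
    agent_identity: str                                   # identity used on the hub
    hub_permissions: list = field(default_factory=list)   # access the agent needs


class AddonController(ABC):
    """Hypothetical contract between a plug-in developer and the hub."""

    @abstractmethod
    def manifests(self, cluster_name: str) -> list:
        """Resources to install on the given managed cluster."""

    @abstractmethod
    def registration(self, cluster_name: str) -> Registration:
        """Identity and permissions for the agent on that cluster."""


class HelloAddon(AddonController):
    """Illustrative plug-in that deploys a single agent per cluster."""

    def manifests(self, cluster_name: str) -> list:
        return [{"kind": "Deployment", "name": "hello-agent", "cluster": cluster_name}]

    def registration(self, cluster_name: str) -> Registration:
        return Registration(
            agent_identity=f"addon:hello:{cluster_name}",
            hub_permissions=["get configmaps", "update leases"],
        )


def plan_install(addon: AddonController, clusters: list) -> dict:
    """Play the hub's role: ask the plug-in what to deploy and how to register."""
    return {
        name: {"manifests": addon.manifests(name), "registration": addon.registration(name)}
        for name in clusters
    }


plan = plan_install(HelloAddon(), ["cluster1", "cluster2"])
for name, entry in plan.items():
    print(name, entry["registration"].agent_identity, len(entry["manifests"]))
```

The point of the design is that the plug-in developer only supplies the two answers, and the hub side decides when and where to act on them.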
2. New Placement API for cluster load and arbitrary resource scheduling
OCM 0.4.0 introduces a new Placement API that describes how to select one or more clusters as deployment targets for multi-cluster workloads or arbitrary Kubernetes custom resources. A Placement selects clusters from one or more specific ManagedClusterSets, and these sets must be bound to the Placement's namespace through a ManagedClusterSetBinding. In OCM 0.4.0, users can select clusters by the labels on ManagedCluster or by ClusterClaims. For example, the following Placement selects all clusters with the label "env=prod" as targets for distributing configuration:
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: placement1
  namespace: ns1
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            env: prod
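To make the matching rule concrete, here is a minimal Python sketch of how a `matchLabels` selector can be evaluated against cluster labels. The cluster names and labels are made up for illustration, and the function is a simplified model rather than the Placement controller's implementation.

```python
def matches_labels(cluster_labels: dict, match_labels: dict) -> bool:
    """A cluster matches when every required key/value pair is present."""
    return all(cluster_labels.get(key) == value for key, value in match_labels.items())


# Hypothetical clusters and their labels.
clusters = {
    "cluster1": {"env": "prod", "region": "east"},
    "cluster2": {"env": "dev"},
    "cluster3": {"env": "prod"},
}

selected = sorted(
    name for name, labels in clusters.items() if matches_labels(labels, {"env": "prod"})
)
print(selected)  # ['cluster1', 'cluster3']
```

Extra labels on a cluster, such as `region` here, do not prevent a match; only the listed pairs are checked.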
You can also use the following Placement to select a fixed number of clusters that run specific Kubernetes versions as targets for deploying applications.
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: Placement
metadata:
  name: clusters-from-specific-version
  namespace: ns1
spec:
  numberOfClusters: 3
  predicates:
    - requiredClusterSelector:
        claimSelector:
          matchExpressions:
            - key: kubeversion.open-cluster-management.io
              operator: In
              values:
                - 1.18.0
                - 1.18.1
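The combination of a claim selector and `numberOfClusters` can be sketched in Python as follows. The sample clusters are invented, the set of supported operators is assumed to follow the usual Kubernetes selector semantics, and sorting by name before truncating is an assumption made only to keep the example deterministic; the real scheduler may order candidates differently.

```python
def matches_expression(claims: dict, expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a cluster's claims."""
    key, operator = expr["key"], expr["operator"]
    if operator == "In":
        return claims.get(key) in expr["values"]
    if operator == "NotIn":
        return claims.get(key) not in expr["values"]
    if operator == "Exists":
        return key in claims
    if operator == "DoesNotExist":
        return key not in claims
    raise ValueError(f"unsupported operator: {operator}")


def select_clusters(clusters: dict, expressions: list, number_of_clusters: int) -> list:
    """Keep clusters that satisfy every expression, then cap the result size."""
    candidates = sorted(
        name
        for name, claims in clusters.items()
        if all(matches_expression(claims, expr) for expr in expressions)
    )
    return candidates[:number_of_clusters]


KUBE_VERSION = "kubeversion.open-cluster-management.io"
clusters = {
    "cluster1": {KUBE_VERSION: "1.18.0"},
    "cluster2": {KUBE_VERSION: "1.18.1"},
    "cluster3": {KUBE_VERSION: "1.19.2"},
    "cluster4": {KUBE_VERSION: "1.18.0"},
    "cluster5": {KUBE_VERSION: "1.18.1"},
}
expressions = [{"key": KUBE_VERSION, "operator": "In", "values": ["1.18.0", "1.18.1"]}]

chosen = select_clusters(clusters, expressions, number_of_clusters=3)
print(chosen)  # ['cluster1', 'cluster2', 'cluster4']
```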
The Placement component generates a PlacementDecision based on the current cluster information and the Placement definition, as shown below:
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: PlacementDecision
metadata:
  labels:
    cluster.open-cluster-management.io/placement: placement1
  name: placement1-decision-1
  namespace: ns1
status:
  decisions:
    - clusterName: cluster1
    - clusterName: cluster2
    - clusterName: cluster3
Note that Placement only selects clusters that meet the requirements defined in the API. It is not concerned with how configuration is distributed or how applications are deployed. To complete application scheduling and deployment, Placement must work with other application or configuration distribution components, including application management controllers such as ArgoCD or KubeVela. This reflects OCM's "microkernel" design philosophy: the scheduling service is completely decoupled from application and resource management services.
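To illustrate this decoupling, the sketch below shows how a separate distribution component might consume PlacementDecision objects. The decisions are represented as plain dictionaries shaped like the YAML above, and the `distribute` function with its `apply` callback is hypothetical; a real consumer would read the objects from the central cluster and use its own delivery mechanism.

```python
PLACEMENT_LABEL = "cluster.open-cluster-management.io/placement"


def clusters_for_placement(decisions: list, placement_name: str) -> list:
    """Collect cluster names from every decision that belongs to one Placement."""
    names = []
    for decision in decisions:
        labels = decision.get("metadata", {}).get("labels", {})
        if labels.get(PLACEMENT_LABEL) != placement_name:
            continue
        for entry in decision.get("status", {}).get("decisions", []):
            names.append(entry["clusterName"])
    return names


def distribute(config: dict, decisions: list, placement_name: str, apply) -> None:
    """Hypothetical consumer: hand the config to `apply` for each selected cluster."""
    for cluster in clusters_for_placement(decisions, placement_name):
        apply(cluster, config)


decisions = [
    {
        "metadata": {"name": "placement1-decision-1", "labels": {PLACEMENT_LABEL: "placement1"}},
        "status": {"decisions": [{"clusterName": "cluster1"}, {"clusterName": "cluster2"}]},
    },
    {
        "metadata": {"name": "other-decision-1", "labels": {PLACEMENT_LABEL: "other"}},
        "status": {"decisions": [{"clusterName": "cluster9"}]},
    },
]

delivered = []
distribute({"logLevel": "info"}, decisions, "placement1",
           apply=lambda cluster, config: delivered.append(cluster))
print(delivered)  # ['cluster1', 'cluster2']
```

Because the consumer only depends on the decision objects, the scheduling logic can change without touching the delivery side.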
In later releases, we plan to introduce more scheduling strategies, such as scheduling by cluster resource usage, a taint/toleration mechanism, and anti-affinity. If you need a new scheduling strategy, you are welcome to raise the requirement in the community sub-project.
3. Clusteradm command line tool (alpha version)
clusteradm is inspired by kubeadm. Its main purpose is to simplify installing OCM components and registering clusters. Users run the "clusteradm init" command to deploy the OCM management components on the central cluster, then run the "clusteradm join" command to deploy the OCM local components on a managed cluster and register that cluster with the central cluster. clusteradm is still evolving, and more OCM management and monitoring sub-commands will be added in the future.
OCM community development plan
OCM plans to release version 0.5.0 in October. Development for 0.5.0 focuses on further improving the Placement API, improving the user experience, and a new network communication sub-project.
1. More Placement API scheduling strategies
- Scheduling based on cluster allocatable resources:
Select clusters according to their allocatable resources so that workloads are spread evenly across clusters.
- Workload anti-affinity and affinity:
Through the Placement API, workloads can be deployed to clusters selected across regions.
- Cluster taint/toleration:
Introduce a mechanism similar to node taints and tolerations.
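Since these strategies are still planned, the following Python sketch only illustrates the general idea behind two of them: filter out clusters whose taints are not tolerated, then prefer clusters with more allocatable resources. The `Cluster` fields, the taint names, and the ranking rule are assumptions for illustration and do not describe the OCM design.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    allocatable_cpu: float                     # hypothetical capacity metric
    taints: set = field(default_factory=set)   # hypothetical taint keys


def tolerates(cluster: Cluster, tolerations: set) -> bool:
    """A cluster is eligible only if every one of its taints is tolerated."""
    return cluster.taints <= tolerations


def pick_clusters(clusters: list, tolerations: set, count: int) -> list:
    """Filter by taints, then rank by allocatable CPU (name breaks ties)."""
    candidates = [c for c in clusters if tolerates(c, tolerations)]
    candidates.sort(key=lambda c: (-c.allocatable_cpu, c.name))
    return [c.name for c in candidates[:count]]


clusters = [
    Cluster("a", 16),
    Cluster("b", 32, {"maintenance"}),
    Cluster("c", 8),
    Cluster("d", 24),
]

print(pick_clusters(clusters, tolerations=set(), count=2))            # ['d', 'a']
print(pick_clusters(clusters, tolerations={"maintenance"}, count=2))  # ['b', 'd']
```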
2. Improve the usability of clusteradm
Further extend clusteradm, mainly by adding the ability to deploy and manage OCM plug-ins through clusteradm.
3. New proxy plug-in project
Provide direct access, through a reverse tunnel, to the Kubernetes API of a managed cluster, to the services running in it, and even to sshd on its nodes. This allows an administrator or controller to reach a managed cluster that sits behind a firewall.
OCM collaboration with other projects
1. KubeVela
OCM is currently the core component of KubeVela's multi-cluster management. Users can complete cluster creation, cluster registration, and finally multi-cluster application deployment through KubeVela's application delivery workflow. This process keeps an application consistent across the development, testing, and production stages. Users can create a cluster environment with one click through KubeVela, so that local development is verified in the same mode as the production deployment, which improves the success rate and correctness of software delivery. In addition, KubeVela and OCM together provide rich multi-cluster, multi-environment application delivery capabilities. On top of the cluster scheduling and resource distribution provided by OCM, users can use KubeVela to give one application differentiated configurations per environment, so that the same application can have different configuration attributes and operation and maintenance strategies when deployed to different clusters.
2. ArgoCD
OCM also continues to cooperate with other open source projects to enhance their multi-cluster management capabilities. The OCM community and ArgoCD are working together so that ArgoCD can use third-party APIs for cluster scheduling and then deploy applications through ArgoCD. Users can combine ArgoCD's ApplicationSet with OCM's Placement API to deploy multi-cluster applications.
3. Clusternet
The OCM community has also explored cooperation with Clusternet. Through OCM's plug-in management capabilities, Clusternet can be loaded into OCM as a plug-in, which makes it convenient to use Clusternet's functions together with other OCM plug-ins.
OCM in products and solutions
1. Alibaba Cloud Proprietary Cloud Agile Edition Enterprise Container Platform
The enterprise container platform of Alibaba Cloud Proprietary Cloud Agile Edition must manage different user clusters across different types of user scenarios. It is responsible for deploying these clusters, installing management services, and handling resource planning, monitoring, compliance, policy management, workload distribution, and other tasks. As functionality kept growing, the initial self-developed architecture ran into problems with scalability, ease of use, and R&D efficiency. Our evaluation found that OCM's "microkernel" design not only solves these architectural problems well but is also friendly to secondary development. R&D personnel can freely choose sub-projects from the OCM community and add business logic for different scenarios on top of the selected atomic capabilities.
The OCM sub-projects and capabilities currently in use or planned include:
- Cluster deployment and registration
The container platform supports managing clusters that it creates itself, users' existing clusters, edge clusters, and cloud service provider clusters. The management services deployed on a managed cluster, the way they are deployed, and the cluster configuration all differ by cluster type.
Using the registration and work sub-projects, the container platform can customize an automated cluster registration process. Based on the cluster attribute information collected and reported by the registration-agent, the process determines which management services to deploy and how to configure the cluster, and then deploys them on the managed cluster through the work-agent. In particular, when a cluster runs behind a firewall, the container platform deploys a proxy service that acts as a reverse proxy for the Kubernetes and management service APIs, and can even proxy the ssh protocol so that technical support personnel can perform remote operation and maintenance when needed.
The proxy service is being contributed to the OCM community as an OCM add-on built on the addon-framework sub-project, and is planned for release in version 0.5.0.
- Cluster service management
As mentioned above, different cluster types have different management services deployed. Managing these services in an orderly way, and clearly showing which management capabilities a managed cluster can provide, is a problem that must be solved. Using the addon-framework sub-project, the container platform can implement an operator for each management service on the central cluster. According to the strategy that the administrator creates with the placement sub-project, the operator arranges the type and configuration of each service and sends them to each managed cluster through the work sub-project.
- Cluster policy management
Cluster management also covers security policies, network policies, and access policies. These policies usually take the form of Kubernetes extension resources. The container platform can use the governance-policy-propagator sub-project directly, or use it as a reference to implement its own mechanism for distributing policies and reporting policy execution status.
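The automated registration flow described under cluster deployment and registration above can be pictured with a short Python sketch: reported cluster attributes decide which management services get deployed. The attribute keys, cluster types, and service names are all hypothetical placeholders and are not taken from the container platform.

```python
# Hypothetical service catalog; the names are placeholders.
BASE_SERVICES = ["monitoring", "policy"]
SERVICES_BY_TYPE = {
    "edge": ["edge-sync"],
    "cloud-provider": ["cloud-billing"],
}


def plan_management_services(attributes: dict) -> list:
    """Choose services from the attributes a cluster reported at registration."""
    services = list(BASE_SERVICES)
    services += SERVICES_BY_TYPE.get(attributes.get("type"), [])
    # Clusters behind a firewall additionally get the reverse proxy service.
    if attributes.get("behind_firewall"):
        services.append("proxy")
    return services


edge_plan = plan_management_services({"type": "edge", "behind_firewall": True})
existing_plan = plan_management_services({"type": "existing"})
print(edge_plan)      # ['monitoring', 'policy', 'edge-sync', 'proxy']
print(existing_plan)  # ['monitoring', 'policy']
```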
2. Big data and AI job management of Alibaba Cloud Container Service for Kubernetes
Alibaba Cloud Container Service for Kubernetes (ACK for short) provides high-performance container application management services. As customers deepen their use of Kubernetes containers and their businesses expand across regions worldwide, users have created and managed multiple ACK clusters and registered multi-cloud clusters to keep up with growing business needs. At the same time, ACK has made a number of optimizations to the scheduling, queuing, and distribution of big data and AI jobs, and more and more enterprise users choose ACK to run these jobs.
For ACK multi-cluster and multi-cloud users who run big data and AI jobs, the question is how to choose, among many clusters, the ones that match in resource capacity, taints, geographical affinity, and job priority. Relying on the manual judgment of the operation and maintenance team is time-consuming and error-prone, and requires collaboration across multiple teams. Unified distribution and management of multi-cluster big data and AI jobs has become a thorny problem for data scientists and development engineers.
Thanks to OCM's open architecture, the ACK job queue controller and multi-cluster scheduler can easily be adapted to OCM. Combined with OCM's cluster registration management and job distribution mechanisms, they implement multi-cluster scheduling and distribution management of big data and AI jobs.
ACK uses OCM's cluster registration management mechanism to build a multi-cluster operating environment centered on the OCM Hub main cluster, which serves as the unified management and control entry point. After a user submits a job through the native Kubernetes interface, the ACK queue controller and the multi-cluster scheduler use the OCM ManagedCluster API to obtain sub-cluster health, resource capacity, affinity, and other information. Based on job priority and resource requirements, they enqueue, dequeue, and schedule jobs, and automatically select a suitable sub-cluster for each job. ACK extends the OCM ManifestWork API to distribute the job to the selected sub-cluster, and uses the OCM proxy plug-in mechanism to obtain the running status of the job on that sub-cluster.
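The queue-then-schedule flow can be sketched in Python with a priority queue. This is not ACK's implementation: the `SubCluster` fields, the use of free GPUs as the only capacity metric, and the rule of picking the cluster with the most free capacity are assumptions chosen to keep the example small.

```python
import heapq
from dataclasses import dataclass


@dataclass
class SubCluster:
    name: str
    healthy: bool
    free_gpus: int   # hypothetical capacity metric


class JobQueue:
    """Priority queue: higher priority first, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def submit(self, job: str, priority: int, gpus: int) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, self._seq, job, gpus))
        self._seq += 1

    def pop(self):
        _, _, job, gpus = heapq.heappop(self._heap)
        return job, gpus

    def __len__(self):
        return len(self._heap)


def schedule(queue: JobQueue, clusters: list) -> dict:
    """Dequeue jobs by priority and place each on a healthy cluster that fits."""
    assignments = {}
    while len(queue):
        job, gpus = queue.pop()
        candidates = [c for c in clusters if c.healthy and c.free_gpus >= gpus]
        if not candidates:
            assignments[job] = None   # no capacity; a real system might requeue
            continue
        target = max(candidates, key=lambda c: c.free_gpus)
        target.free_gpus -= gpus
        assignments[job] = target.name
    return assignments


clusters = [SubCluster("east", True, 8), SubCluster("west", True, 4), SubCluster("north", False, 16)]
queue = JobQueue()
queue.submit("etl", priority=1, gpus=4)
queue.submit("train", priority=10, gpus=8)
queue.submit("infer", priority=5, gpus=4)

assignments = schedule(queue, clusters)
print(assignments)  # {'train': 'east', 'infer': 'west', 'etl': None}
```

In this run the unhealthy cluster is never used even though it has the most capacity, and the lowest-priority job is left unplaced once the healthy clusters are full.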
By combining OCM with the ACK job queue controller and multi-cluster job scheduler, ACK solves automatic scheduling, distribution, and status tracking of big data and AI jobs in a multi-cluster environment.
Related links:
Plug-in registration and life cycle management
New Placement API for cluster load and arbitrary resource scheduling
https://github.com/open-cluster-management-io/enhancements/tree/main/enhancements/sig-architecture/6-placements
clusteradm command line tool (alpha version)
https://github.com/open-cluster-management-io/clusteradm
Clusternet plug-in
https://github.com/skeeey/clusternet-addon
Scheduling based on cluster allocatable resources
https://github.com/open-cluster-management-io/enhancements/pull/16
Workload anti-affinity and affinity
https://github.com/open-cluster-management-io/community/issues/49
Cluster taint/toleration
https://github.com/open-cluster-management-io/community/issues/48
New proxy plug-in project
https://github.com/open-cluster-management-io/enhancements/pull/19
Multi-cluster application deployment
https://github.com/oam-dev/kubevela/tree/master/docs/examples/workflow-with-ocm
ArgoCD deployment application
https://github.com/argoproj-labs/applicationset/pull/231
Clusternet cooperation
https://github.com/clusternet/clusternet