Authors: Feng Yong, Sun

Hello everyone, I'm very happy to share with you at the KubeCon China Summit. Today's topic is "Build and manage multi-cluster applications with a consistent experience". The protagonists are KubeVela and OCM, both of which are CNCF open source projects. The talk is roughly divided into three parts: first the challenges of cloud-native application delivery and management, then the technical principles of KubeVela and OCM behind the solution, and finally overall best practices together with a complete demo.

Background

As the cloud-native ecosystem flourishes, Kubernetes is becoming the standard interface for integrating infrastructure, and more and more infrastructure capabilities are available as out-of-the-box declarative APIs. The popularity of the CRD Operator [1] pattern has also made operation and maintenance capabilities increasingly declarative and automated. As shown in Figure 1, from the underlying infrastructure to upper-layer application development, the CNCF ecosystem [2] now contains thousands of projects.


Figure 1. The CNCF landscape (December 2021)

However, this abundance of choices is both a gift and a source of trouble for R&D engineers. Different application architectures involve different development languages, technical frameworks, packaging methods, artifact forms, cloud resources, and operation and maintenance methods. Across the software life cycle, the environments for development, testing, pre-release canary, and production deployment also differ greatly, as do the application delivery and deployment experiences. Switching to the cloud-native technology stack therefore requires developers to learn a large amount of complex new knowledge. It is like writing programs directly against a large number of low-level operating system APIs: without programming frameworks and standard libraries, application development takes twice the effort for half the result.

How to make good use of this flourishing cloud-native technology ecosystem, so that business R&D personnel get a consistent, low-threshold experience while applications are delivered safely and reliably, is a huge challenge facing the industry.

Typical practices and challenges in the industry

To address this last mile problem of application delivery, typical industry practices fall into two broad categories.

The first is to build an internal PaaS platform on top of Kubernetes for the company's own business scenarios, hiding the Kubernetes interfaces and providing the platform's own API. This model is usually adopted by larger companies and needs to be supported by a team that is proficient in both Kubernetes and the business. However, as time passes, business scenarios become more complex and demands keep growing; an internally built PaaS tends to suffer from insufficient extensibility and high maintenance cost, and eventually falls into the dilemma of having to be torn down and rebuilt. This is especially evident in cloud-native transformation practice. On the other hand, larger companies usually have multiple business teams. Without sufficient top-level design, the PaaS platforms built by different teams easily become isolated silos: a large number of similar capabilities are implemented repeatedly, cannot be reused, and cannot be managed in a unified way. A different platform for each scenario also brings new concepts and learning burdens to the upper-level business teams.

In response to the problems of the first type of scenario, the industry has gradually tended to expose the native Kubernetes API at the container platform layer, which is responsible for providing stable cloud-native ecosystem capabilities without sacrificing flexibility and extensibility. Application delivery is then handled by a CI pipeline such as Jenkins or GitLab: application artifacts (such as Helm Charts) are packaged and deployed directly to the container platform. This leads to the second type of practice, in which traditional CI tools connect directly to the container platform, as shown in Figure 2. The core of this approach is to build an abstraction layer out of scripts, configuration, and so on to lower the threshold of use. It is currently a popular solution in the industry, and its division of responsibilities is clearer: the container platform layer is maintained by the Infra/SRE team as infrastructure capability, while application delivery makes full use of the existing CI system in the enterprise without building a separate platform.


Figure 2. Typical solutions in the industry

This method solves the problem of application management to a certain extent. For small and medium-sized enterprises, even if they lack the ability to maintain a container platform, they can fill the gap by purchasing cloud services, and it can also provide a one-stop, consistent experience for business developers. But the main problems lie in the application delivery pipeline; regardless of scale and scenario, enterprises will soon encounter the following challenges:

  • Lack of automation, requiring a lot of manual maintenance. For example, an unexpected network failure or a problem in the pipeline system itself will interrupt all releases and require manual intervention; the system lacks self-healing ability.
  • Lack of a unified application model. Whether as the complete deliverable of an application or as its core entry point (the single source of truth), a unified application model is extremely important. Without a unified model description as the entry for application delivery, different people will change the system from multiple places, for example through the CI pipeline or by modifying Kubernetes directly. Over time, the configuration of the system becomes inconsistent and eventually causes failures.
  • Security risks. In pipeline-based application delivery, the security domains of CI and CD are usually not isolated: CI and CD are completed in one system, and the keys of all infrastructure clusters are stored in that same system. A single breach can easily lead to very large security risks. For example, Codecov, a code coverage tool, suffered a security incident in April 2021 [3], and the CI keys configured by all projects using the product were leaked. On the other hand, more and more open source software is adopted into enterprise production systems, and integrating this software also increases the security risk.
  • High maintenance cost. Simplifying developers' mental load through scripts and templates is the core of this model, but if the maintenance of these scripts and templates lacks open standards and an ecosystem, they quickly lose vitality and over time become mysterious code that no one dares to touch anymore. Such a system depends heavily on the experience of its original builders and is difficult to sustain and iterate.

Therefore, how to build a stable and reliable application delivery and management system has become a key question that we urgently need to explore.

How to build a stable and reliable application delivery and management system?

To ensure the stability, reliability, and security of application delivery, let's look at each aspect of the problem separately. First, to solve the stability and reliability problems of large-scale application delivery, we take inspiration from the design of Kubernetes.

Inspirations from Kubernetes

Kubernetes has two core concepts, one is a declarative API, and the other is an end-state-oriented control loop.

Declarative APIs are the best way to express user-facing abstractions. They greatly simplify the user's mental model by shifting from describing the process to describing the result. Why does being declarative simplify the mental model? Take eating as an example. The traditional, procedural description is like cooking at home: you have to buy ingredients, study recipes, cook the meal, eat it, and finally wash the dishes; the whole process is complicated. The declarative description is like going to a restaurant: you tell the waiter what you want when ordering, and the restaurant eventually brings it to you.
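To make this concrete, here is a minimal sketch of a declarative Kubernetes manifest (the name and image are placeholders): the user only states the desired result, and the platform keeps reconciling toward it.

```yaml
# Declarative: describe the result you want, not the steps to get there.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-web               # placeholder name
spec:
  replicas: 3                  # "three instances should be running" -- the desired end state
  selector:
    matchLabels:
      app: demo-web
  template:
    metadata:
      labels:
        app: demo-web
    spec:
      containers:
        - name: web
          image: nginx:1.21    # placeholder image
          ports:
            - containerPort: 80
# Kubernetes keeps comparing the actual state with this desired state and
# creates, restarts, or removes Pods until the two match.
```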


Figure 3. Control loop in Kubernetes

On the other hand, a final-state-oriented control loop is the best way to guarantee reliability. This design principle requires our system to have the following characteristics:

  • Automated: the system always runs toward the final state and keeps running until that state is reached. This also ensures that the system will not be interrupted by small instabilities and has strong self-healing ability.
  • Convergent: during operation, the system keeps approaching the final state, and each execution brings the result closer to it.
  • Idempotent: executing the same process multiple times produces consistent results, with no unexpected problems caused by repeated execution.
  • Deterministic: as long as the system runs normally, it keeps executing until a definite result is reached.

These four characteristics are also the core elements of large-scale application delivery.

Principles of Application Delivery Security

Then let's look at the security issues. The core idea is actually very simple: treat the application delivery system with the same security standards as the production environment itself. In April 2021, the CNCF Foundation also released best practices [4] and a white paper [5] on software supply chain security, which cover several principles of secure application delivery:

  • Isolate the delivery environments. On the one hand, different delivery destinations should be isolated: the delivery system should not have direct access to all Kubernetes clusters, and different environments should be isolated from each other. On the other hand, the security domains of the different stages of the CI/CD application life cycle should also be isolated and use independent credentials.
  • Check what you integrate. During the delivery process, necessary security checks must be performed on the integrated open source tools, the dependency packages used, and the application's own images.
  • Least privilege. This principle is relatively easy to understand. In practice, permissions need to be split: for example, use tokens instead of permanent keys, add necessary approval processes, and use key management tools. Especially when using cloud resources, credentials should be granted only the necessary permissions, for example through sub-accounts, to avoid a single leaked key causing a failure that cannot be recovered in time.
  • Auditing and security monitoring. The application delivery and management system itself must have good observability, auditing of delivery behavior, and overall security monitoring.

The core elements of the final-state application delivery system

So to sum up, a stable, reliable, and secure application delivery system needs the core elements shown in Figure 4 below.


Figure 4. Core elements of an application delivery system

The entrance of this system is a complete application model that covers the different deliverables, cloud resources, and configuration of the various environments. It is the unified entrance and the single source of truth for application delivery.

At the same time, this application model is declarative and faces business users: it provides the business layer with abstractions for their usage scenarios, which lowers the threshold of use and hides the complexity of the underlying APIs, while remaining flexible and extensible. The delivery system interacts with the declarative API by pulling; the user only needs to describe the final state at the model layer, and the delivery system keeps converging toward that final state.

Inside the delivery system there are two core functions, orchestration and deployment. Behind these two functions there must always be a continuously running, final-state-oriented control loop. This control loop is the cornerstone of automation and determinism, and the system must also be able to monitor and audit the deliverables and the security of the delivery process.

Among them, orchestration solves the problems of dependencies between different deliverables, deployment order, data passing, and multi-cluster, multi-environment configuration. However, the complexity of orchestration should not be exposed to users: the orchestration and execution engine of the delivery system should be able to schedule toward the final state automatically based on the user's declarative description, rather than requiring hand-written procedural orchestration. Deployment solves how different deliverables, such as cloud resources, Kubernetes resources, and multi-cluster workloads, actually get deployed. It needs to provide enough extensibility to support different kinds of deliverables, and to complete deployment securely and with isolation in multi-cluster environments.
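As a hedged illustration of this kind of declarative orchestration (using KubeVela, introduced in the next part, as one concrete realization; names and images are placeholders), the user only declares that one component depends on another, and the engine works out the deployment order itself:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: two-tier-app               # hypothetical application
spec:
  components:
    - name: database
      type: webservice
      properties:
        image: example/postgres:13 # placeholder image
    - name: backend
      type: webservice
      dependsOn:
        - database                 # declared dependency: "database" is deployed first
      properties:
        image: example/backend:v1  # placeholder image
# No procedural pipeline is written: the engine derives the ordering from the
# declared dependencies, and data passing between components can likewise be
# declared rather than scripted.
```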

Next-generation cloud-native application delivery and management

KubeVela is a modern application delivery platform built entirely according to this philosophy, making application delivery and management simpler, easier, and more reliable in today's popular hybrid and multi-cloud environments. Over its evolution, hundreds of contributors have participated in the code, absorbing engineering practices from companies in different fields such as Alibaba, Ant Group, Tencent, ByteDance, 4Paradigm, and GitLab, and making full use of the technology dividend of the cloud-native ecosystem to deliver and manage applications.


Figure 5. KubeVela architecture

Standard, Open Application Model (OAM)

Behind KubeVela is a complete application model: OAM (Open Application Model). OAM is an application model jointly released by Alibaba and Microsoft in 2019. It has been put into practice in cloud products by many vendors such as Alibaba, Microsoft, Oracle, Salesforce, and Tencent, and has also been released as an industry standard.


Figure 6. OAM application model

As shown in Figure 6, the OAM model abstracts complex application delivery and management into four core concepts: application, component, operation and maintenance capability (trait), and workflow, so that users can describe every aspect of application delivery and management in a single configuration file. OAM is extensible enough to meet the demands of delivering applications to different clouds and different environments. At the same time, OAM is not bound to the Kubernetes ecosystem: if out-of-the-box KubeVela does not meet your requirements, the OAM community also welcomes users to participate in building the model layer and to implement it independently according to the model.
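Below is a minimal, illustrative KubeVela Application that shows the four concepts together (names, images, and cluster names are placeholders, and exact field names may vary slightly between versions):

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application                  # the application: the single entry point for delivery
metadata:
  name: demo-app                   # placeholder name
spec:
  components:                      # components: what is being delivered
    - name: frontend
      type: webservice
      properties:
        image: example/frontend:v1 # placeholder image
      traits:                      # traits: O&M capabilities bound to the component
        - type: scaler
          properties:
            replicas: 2
  policies:
    - name: local-only
      type: topology
      properties:
        clusters: ["local"]        # "local" refers to the control-plane cluster itself
  workflow:                        # workflow: how the delivery process is executed
    steps:
      - name: deploy
        type: deploy
        properties:
          policies: ["local-only"]
```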

Kubernetes-based application delivery control plane

KubeVela has a microkernel architecture. Its core is a Kubernetes-based application delivery and management control plane with the four characteristics of self-healing, idempotency, convergence, and determinism. A minimal KubeVela deployment needs only 0.5 CPU cores and 200 MB of memory to support the delivery of hundreds of applications. At the same time, KubeVela itself ships with a series of out-of-the-box plug-ins that support multi-cluster delivery, cloud resource management, GitOps, Helm charts, observability, and more; even KubeVela's own UI console is integrated as a plug-in.

KubeVela is not limited to the Kubernetes ecosystem. The official built-in cloud resource plug-in can integrate any Terraform module to deliver different cloud resources and even virtual machines. At the same time, thanks to the OAM model and the extensibility principle KubeVela has strictly followed since the beginning of its design, users can easily add and extend ecosystem capabilities.
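For example, with the Terraform addon enabled and cloud credentials configured, a cloud database can be declared as just another component. The sketch below follows the style of KubeVela's Alibaba Cloud RDS integration; the property names come from the underlying Terraform module and will differ for other providers and versions:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-with-cloud-db          # placeholder name
spec:
  components:
    - name: sample-db
      type: alibaba-rds            # component type provided by the Terraform addon (assumption)
      properties:
        instance_name: sample-db
        account_name: oamtest      # placeholder account
        password: example-password # placeholder; use a secret store in practice
        writeConnectionSecretToRef:
          name: db-conn            # connection info is written to this Secret for other components
```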

Safe and reliable multi-cluster management technology (OCM)

The multi-cluster technology behind KubeVela is also the core of ensuring stable, secure, large-scale application delivery. We know that, for reasons such as region, scale, and disaster recovery, multiple clusters have long been a common form of enterprise infrastructure. However, Kubernetes' native management capabilities remain at the single-cluster level: each cluster can run stably and autonomously, but the ability to manage across clusters is missing. To form a stable and unified application delivery and management platform, the following capabilities are required:

  • Operations managers can learn about changes in resource capacity (water levels), node health, and other information across the clusters
  • Business owners can decide how application services are distributed and deployed across the clusters
  • Application operators can see the status of services and issue policies such as workload migration

Thanks to OCM (Open Cluster Management), the multi-cluster technology jointly initiated and open sourced by Red Hat, Ant Group, and Alibaba Cloud, KubeVela can easily solve the life-cycle management of resources, applications, configurations, policies, and other objects in multi-cluster and hybrid environments. OCM enables users to deliver and manage applications in multi-cluster and hybrid environments using familiar open source projects and products, just as they would on a single Kubernetes cluster.
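As a brief sketch of OCM's two core primitives (assuming a cluster registered under the name cluster1): the hub accepts a cluster's registration through a ManagedCluster object, and distributes workloads through ManifestWork objects that the agent in the managed cluster pulls and applies locally.

```yaml
# Hub side: represents a registered cluster; the hub approves the agent's registration.
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: cluster1                   # assumed cluster name
spec:
  hubAcceptsClient: true
---
# Hub side: resources that the agent in "cluster1" should pull and apply locally.
# The hub never pushes with the managed cluster's credentials.
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  namespace: cluster1              # one namespace per managed cluster
  name: example-work               # placeholder name
spec:
  workload:
    manifests:
      - apiVersion: v1
        kind: ConfigMap
        metadata:
          name: example-config
          namespace: default
        data:
          greeting: hello
```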


Figure 7. Technical features of OCM

Compared with existing multi-cluster technologies, OCM's architecture is ahead in the following aspects, meeting users' core demands for security, stability, and scale in cloud-native application delivery:

  • Modularity: the OCM management platform consists of management primitives, basic components, and numerous extensible components, which can be freely combined like Lego bricks to build a feature set that meets user needs. This concept is also naturally compatible with KubeVela, and together they provide users with sufficient ecosystem extensibility.
  • Large scale: OCM's infrastructure inherits the open and extensible characteristics of Kubernetes. Just as a single Kubernetes cluster can support thousands of nodes, OCM is designed to support thousands of managed clusters.
  • Security: OCM is designed on the basis of zero trust and least privilege, and registration between the management components and the managed agents uses a two-way handshake. In addition, certificate rotation, access management, and the distribution of permission tokens follow a design similar to Kubernetes itself and are implemented by the corresponding extensible components through the native Kubernetes API.
  • Extensibility: OCM provides a programming framework and management APIs for extensible components. Together with the basic components and management primitives, this makes it easy to migrate projects from the Kubernetes ecosystem onto the OCM multi-cluster management platform. Through this programming framework, the OCM community has implemented many Kubernetes management capabilities in multi-cluster environments, which is also what allows KubeVela and OCM to integrate deeply with little effort.

Safe and consistent application delivery experience with KubeVela and OCM

KubeVela and OCM are inherently complementary. KubeVela is responsible for application layer delivery and lifecycle management, while OCM manages the lifecycle of cluster infrastructure resources. They work closely together to provide end-to-end capabilities for application delivery and lifecycle management in a multi-cluster environment.


Figure 8. Consistent experience across environments, multi-cluster delivery

As shown in Figure 8, using KubeVela and OCM, the user's delivery process in different environments will be fully automated, saving a lot of manual operations.

  • For the same application, business R&D personnel only need to describe the components once and can then bind different operation and maintenance capabilities so that it runs independently in different delivery environments. For example, the development environment can use device-cloud joint debugging, the pre-production environment can bind observability policies, and so on, without repeating component-level parameters such as images, ports, and environment variables, which greatly reduces the mental burden (see the sketch after this list).
  • On the other hand, on the control plane, the cross-environment deployment of different applications can be completed automatically in a single delivery process, with approval steps that can be configured or executed automatically.
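The sketch below illustrates the two points above in KubeVela v1.2-style syntax: the component is described once, per-environment placement and differences are expressed as policies, and a suspend step serves as a manual approval gate. Cluster names, images, and replica counts are placeholders, and field names may vary between versions.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: demo-app                   # placeholder name
spec:
  components:                      # described only once
    - name: frontend
      type: webservice
      properties:
        image: example/frontend:v1 # placeholder image
  policies:
    - name: env-test
      type: topology
      properties:
        clusters: ["cluster-test"] # assumed cluster names
    - name: env-prod
      type: topology
      properties:
        clusters: ["cluster-prod"]
    - name: prod-override          # production-only differences, e.g. more replicas
      type: override
      properties:
        components:
          - name: frontend
            traits:
              - type: scaler
                properties:
                  replicas: 5
  workflow:
    steps:
      - name: deploy-test
        type: deploy
        properties:
          policies: ["env-test"]
      - name: approval             # manual gate before production; remove for full automation
        type: suspend
      - name: deploy-prod
        type: deploy
        properties:
          policies: ["env-prod", "prod-override"]
```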

More importantly, based on KubeVela and OCM, it is possible to build secure, GitOps-style multi-cluster application delivery based on a subscription (pull) model.


Figure 9. Secure application delivery in GitOps mode

As shown in Figure 9, the CI process still follows the previous model, but on the CD side an independent configuration repository is used. First, at the Git repository level, the permissions of the two repositories are completely isolated. Through the unified application model, users can fully describe the application in the configuration repository, and KubeVela monitors changes to this complete application description and actively pulls the application configuration. During this process, the Git repository does not hold any keys of the delivery system. After the delivery system completes orchestration, the control data is distributed through OCM, and that process is also actively pulled by the runtime clusters; the delivery system does not hold keys that centrally control any sub-cluster. After the control data reaches a sub-cluster, the GitOps agent in that cluster can actively fetch the changed deliverables, forming another autonomous domain.
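A hedged sketch of the pull side of this model, assuming the fluxcd addon is enabled on the KubeVela control plane: a component of type kustomize subscribes to a separate configuration repository and pulls the application descriptions from it, so neither Git repository ever holds cluster credentials. The repository URL, branch, path, and interval are placeholders.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: config-sync                # placeholder name
spec:
  components:
    - name: app-configs
      type: kustomize              # component type provided by the fluxcd addon (assumption)
      properties:
        repoType: git
        url: https://github.com/example-org/app-config   # placeholder config repository
        git:
          branch: main
        path: ./envs/prod          # directory containing the application descriptions to apply
        pullInterval: 1m           # how often changes are pulled
```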

We can see that each link works independently and forms its own autonomous domain, with authorization and auditing applied as needed, which is safe and reliable. The collaboration of KubeVela and OCM can unify the orchestration and management of hybrid-cloud, multi-cluster, and edge applications, ultimately achieving secure cloud-edge collaborative delivery.

Start your end-to-end platform

Now that we understand the design principles of an application delivery and management engine with KubeVela and OCM at its core, we can see that the biggest benefits of this model are automation, low threshold, and security. So how do you start practicing it? Starting with the latest KubeVela v1.2 release, we have enhanced the full end-to-end user experience, allowing you to deliver and manage applications in a visual way.

First of all, KubeVela's architecture is entirely based on a microkernel and a highly extensible model, which helps users build their own cloud-native solutions on demand at minimal cost. Figure 10 below shows KubeVela's plug-in management page; KubeVela's own UI and the OCM multi-cluster capability are themselves KubeVela plug-ins. On top of the microkernel, users can freely customize and switch their own extensions, quickly acquire more cloud-native ecosystem capabilities, and better realize scenarios such as resource management, DevOps, and application governance. At the implementation level, KubeVela's plug-in system is built collaboratively in the open source community: everything submitted to the community plug-in repository [6] is automatically discovered by the KubeVela kernel, and we believe there will be numerous choices available in the near future.


Figure 10. KubeVela visualization plugin page

KubeVela also supports a series of component types by default, as shown in Figure 11, including application delivery based on container images. This mode has a low threshold, requires no Kubernetes knowledge, and is suitable for most enterprise users. Of course, it also supports native Kubernetes YAML applications and, through OCM, delivers applications to multiple clusters. In addition, by installing officially maintained plug-ins, users can obtain component types such as Helm Chart packages and the cloud resources of various cloud vendors, along with the corresponding complete delivery and management capabilities; a minimal Helm-based component is sketched below. You can give full play to your imagination: what other types of applications could KubeVela deliver through extension?
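For instance, with the Helm-related addon enabled, a chart from a public repository can be declared as a component. This is a sketch; the chart name and version are placeholders.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-with-chart             # placeholder name
spec:
  components:
    - name: redis
      type: helm                   # component type provided by the Helm/fluxcd addon (assumption)
      properties:
        repoType: helm
        url: https://charts.bitnami.com/bitnami   # public chart repository
        chart: redis
        version: "16.8.5"          # placeholder version
```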


Figure 11. Components of KubeVela

In KubeVela's end-to-end UI, the concept of environments is also built in by default, supporting the different environments involved in the application development and delivery life cycle, such as development, testing, and production. The same application deployed in different environments is completely isolated, safe, and reliable, without requiring users to configure it repeatedly, bringing a consistent experience.


Figure 12. Define delivery to multiple clusters in the same environment


Figure 13. Control the delivery process with configurable workflows

At present, KubeVela 1.2 has released the preview version 1.2.0-beta.3 [7], and community users are welcome to download and try it.

In the future, KubeVela will provide a more complete end-to-end user experience, enrich its capabilities in more vertical scenarios, and evolve toward letting developers complete application delivery by themselves, making application delivery in hybrid environments as simple as using an app store today.

Related Links

[1] CRD Operator: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/

[2] CNCF landscape: https://landscape.cncf.io/

[3] Codecov security incident: https://about.codecov.io/security-update/

[4] Best practices: https://www.cncf.io/announcements/2021/05/14/cncf-paper-defines-best-practices-for-supply-chain-security/

[5] White paper: https://www.cncf.io/announcements/2021/05/14/cncf-paper-defines-best-practices-for-supply-chain-security/

[6] Community plugin repository: https://github.com/oam-dev/catalog/tree/master/addons

[7] KubeVela v1.2.0-beta.3: https://github.com/oam-dev/kubevela/releases/tag/v1.2.0-beta.3

You can learn more about KubeVela and the OAM project in the following materials:

  • Project codebase: github.com/oam-dev/kubevela Welcome to Star/Watch/Fork!
  • Project homepage and documentation: kubevela.io ; since version 1.1, documentation is available in both Chinese and English, and translations into more languages are welcome.
  • Project DingTalk Group: 23310022; Slack: CNCF #kubevela Channel
  • Join the WeChat group: please add the maintainer's WeChat account below and mention that you would like to join the KubeVela user group:


