
Author: Situ Fang

Review & proofreading: Tian Weijing, Xi Yang


Introduction: In the cloud-native era, using Kubernetes and cloud infrastructure directly is too complicated. Users must learn a large number of low-level details, and application management is costly, error-prone, and failure-prone. As cloud computing becomes more widespread, differences between clouds further exacerbate these problems.

This article introduces how to build a new application management platform on Kubernetes that provides a layer of abstraction to encapsulate the underlying logic and presents only the interfaces users care about, so that users can focus solely on their own business logic and manage applications faster and more safely.

The cloud-native era is an exciting one: we are facing disruptive innovation across the technology stack and a comprehensive, end-to-end reconstruction of applications. So far, cloud native has produced three key technologies along its evolution:

  • The first is containerization. As a standardized interaction medium, containers greatly improve operational efficiency, deployment density, and resource isolation compared with traditional methods. According to the latest CNCF survey, 92% of enterprises already use containers in production systems;
  • The second is Kubernetes, which abstracts and manages infrastructure and has become the de facto standard of cloud native;
  • The third is Operator-based automated operations. Through the mechanism of controllers and custom resources, Kubernetes can operate and maintain not only stateless applications but also user-defined workloads, enabling more complex automated operations such as automated deployment and interaction.

These three key technologies form a progressive evolution. Alongside them, the theory of application delivery has also kept evolving. The rise of cloud native has brought comprehensive upgrades and breakthroughs in delivery media, infrastructure management, operation models, and continuous-delivery theory, accelerating the arrival of the cloud-computing era.

1.png

Figure 1 The cloud-native technology landscape (click here for details)

From the cloud-native technology landscape released by CNCF (see Figure 1), you can see the vibrancy of the cloud-native ecosystem. Among the 900+ logos in the figure are many open source projects and startups, and more cloud-native technologies will be born in these areas in the future.

The cloud-native "operating system" Kubernetes brings application delivery challenges

As mentioned above, Kubernetes has become the standard configuration of cloud native. It encapsulates the infrastructure underneath and supports the deployment and operation of various applications above: not only stateless applications and microservices, but also stateful, batch, big data, AI, and blockchain workloads can be deployed on Kubernetes. Kubernetes has become a de facto "operating system," and its position in cloud native is like Android's in mobile devices. Why say this? Android is not only installed on our phones; it has further penetrated into cars, TVs, Tmall Genie, and other smart terminals, and mobile applications run on all of these devices through Android. Kubernetes has the same potential and trend. Of course, it does not appear in smart home appliances, but in public clouds, self-built data centers, and edge clusters. We can expect Kubernetes to become as ubiquitous as Android.

So, does the container + Kubernetes interface solve all delivery problems? Definitely not. Imagine a phone running only the Android system: can it meet our work and life needs? No; it also needs a variety of software applications. Likewise in cloud native, besides the Kubernetes "operating system," we need a set of application delivery capabilities. On phones, users install software through app stores such as Wandoujia ("Pea Pods"). Similarly, in the cloud-native era, we need to deploy applications to different Kubernetes clusters, but Kubernetes' massive number of trivial infrastructure details and complex operational concepts cause all kinds of problems during deployment. This is where a cloud-native "Wandoujia" is needed: an application management platform that shields the complexity of delivery.

There are two mainstream models for application management platforms in the industry. The first is the traditional platform model, which "puts a big hat" over Kubernetes, shielding all complexity and providing a simplified layer of application abstraction on top according to needs. Although this makes the platform easier to use, every new capability must be developed by the platform itself, which makes extension difficult and iteration slow, and cannot keep up with growing application-management demands.

The second is the container platform model. This model is more cloud-native, with open components and strong scalability, but it lacks application-layer abstraction, which causes many problems, such as a steep learning curve for developers. For example, when a business developer submits code to the application platform, he must write a Deployment to deploy the application, Prometheus rules to configure monitoring, an HPA to set up elastic scaling, Istio rules to control routing, and so on. None of this is what business developers want to do.

Therefore, both solutions have advantages and disadvantages that must be weighed. So how can we encapsulate the platform's complexity while retaining good scalability? That is what we have been exploring.

Shielding the complexity of cloud-native application delivery with an application management platform

Alibaba began containerization research in 2012, initially to improve resource utilization, starting down the road of self-developed container virtualization technology. To cope with the growing machine-resource demands of big promotions, a hybrid-cloud elastic architecture for containers was adopted in 2015, using Alibaba Cloud's public computing resources to absorb promotion traffic peaks. This was the early stage of Alibaba's journey toward cloud native.

The turning point came in 2018. After Alibaba's underlying scheduling adopted open source Kubernetes, we moved from scripted installation and deployment on virtual machines to deploying applications through a standards-based container scheduling system, and comprehensively pushed forward the Kubernetes upgrade of Alibaba's infrastructure. But a new problem soon appeared: application platforms had no standard and were not unified, and every team "did its own thing."

Therefore, in 2019, we released the Open Application Model (OAM) together with Microsoft and began OAM-based platform transformation, which went smoothly. In 2020, KubeVela, the implementation engine of OAM, was officially open sourced, and internally we promoted the evolution of multiple application management platforms onto OAM and KubeVela. We also promoted a "three-in-one" strategy: Alibaba's internal core systems fully use this technology, and the same technology is used in customer-facing commercial cloud products and in open source. By fully embracing open source, the entire OAM and KubeVela community participates in the co-construction.

During this exploration we took many detours and accumulated a lot of hard-won experience. Below, we introduce the design principles and usage of KubeVela in detail to help developers understand a complete cloud-native application management platform solution, improving the application developer experience and application delivery efficiency.

Cloud native application management platform solutions

In exploring cloud-native application management platform solutions, we encountered four major challenges and summarized four basic principles, introduced one by one below.

Challenge 1: Application platform interfaces differ across scenarios, and construction is repeated.

Although cloud native has the Kubernetes system, different scenarios build different application platforms with completely inconsistent interfaces and very different delivery capabilities. For example, AI, middleware, serverless, and e-commerce online businesses each have their own service platforms. When building application management platforms, repeated development and repeated operations are therefore inevitable. The ideal, of course, would be reuse, but the platforms' architectural models differ and cannot interoperate. In addition, when business developers deliver applications in different scenarios, the APIs they integrate with are completely different, and delivery capabilities vary widely. This was the first challenge we encountered.

Challenge 2: "Final-state-oriented" design cannot satisfy process-oriented delivery.

In the cloud-native era, final-state-oriented design is very popular because it relieves users of concern about the implementation process: users describe only what they want, without planning the execution path in detail, and the system gets things done automatically. In actual use, however, the delivery process usually requires human intervention such as approval, pausing for observation, and adjustment. For example, our Kubernetes system is under strict change control during delivery and requires approval for release. The "Alibaba Group Change Management Specification" clearly states: "For online changes, the first x batches go to the online production environment, and the observation time after each batch change must be greater than y minutes," and "changes must first be verified in the safe production environment (SPE); only after SPE verification shows no problem can a grayscale release to the online production environment proceed." Application delivery is therefore a process-oriented rather than final-state-oriented execution, and we must consider how to adapt to process-oriented delivery.

Challenge 3: Extending the platform is too complex.

As mentioned above, the application platform in the traditional model has poor scalability, so what are the common extension mechanisms in the cloud-native era? In the Kubernetes ecosystem, template languages such as Go templates can be used directly for deployment, but they are not flexible enough: a complete template is complex to write and hard to maintain at scale. An expert might say, "I can write a custom Kubernetes Controller; its scalability must be great!" True, but few people understand Kubernetes and the CRD extension mechanism well, and even after an expert writes a Controller there is much follow-up work: compiling it, installing it, and running it on Kubernetes. Moreover, the number of Controllers cannot keep growing without bound. Building a highly scalable application platform is therefore very challenging.

Challenge 4: Delivery differs hugely across environments and scenarios.

In application delivery, operation and maintenance capabilities differ enormously between environments. The development and testing environment values development and joint-debugging efficiency: each modification uses hot reloading instead of re-packaging and image-based deployment, and an independent, on-demand environment is created for each developer. The pre-release joint-debugging environment has daily operational requirements such as attack-and-defense drills and fault injection. The production environment needs additional capabilities for safe production and high service availability. Furthermore, the same application's component dependencies differ greatly across environments: databases, load balancing, and storage differ widely between clouds.

In response to these four challenges, we summarized four basic principles for a modern application management platform:

  1. A unified, infrastructure-independent open application model.
  2. Declarative delivery around workflow.
  3. Highly scalable and easy to program.
  4. Design for mixed environments.

Principle 1: A unified, infrastructure-independent open application model.

How can we distill a unified, infrastructure-independent open application model? Take the Open Application Model, OAM, as an example. First, its design is very simple and greatly simplifies use of the management platform: where users originally faced hundreds of APIs, OAM abstracts them into four types of delivery models. Second, OAM describes, from the business developer's perspective, the components to be delivered and the operation capabilities and delivery strategies to be used, while platform developers provide the implementations of those capabilities and strategies, shielding developers from infrastructure details and differences. Through the component model, OAM can describe deliverables such as containers, virtual machines, cloud services, Terraform components, and Helm charts.

2.png

Figure 2 An application delivery example described by the Open Application Model

As shown in Figure 2, this is an example of KubeVela application delivery described by OAM, containing the four model types. First, the Components to be delivered when an application is deployed, generally images, artifact packages, cloud services, and the like. Second, the operation capabilities (Traits) used after deployment, such as routing rules and auto-scaling rules; traits always act on components. Third, the delivery Policies, such as cluster distribution policy, health-check policy, and firewall rules; any rule to be followed before deployment can be declared and executed at this stage. Finally, the Workflow definition, such as blue-green deployment, traffic-based progressive deployment, manual approval, or any other pipelined continuous-delivery strategy.
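To make the four model types concrete, an application like the one in Figure 2 can be sketched as a single KubeVela Application manifest. The structure below follows KubeVela's Application format, but the specific component, image, and cluster names are illustrative assumptions, not taken from the figure:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: example-app
spec:
  components:                      # what to deliver
    - name: web
      type: webservice             # a container-based component
      properties:
        image: example/web:1.0.0
        ports:
          - port: 8080
      traits:                      # operation capabilities acting on the component
        - type: scaler
          properties:
            replicas: 3
  policies:                        # delivery strategies
    - name: prod-clusters
      type: topology
      properties:
        clusters: ["cluster-a", "cluster-b"]
  workflow:                        # the delivery process itself
    steps:
      - name: deploy-to-prod
        type: deploy
        properties:
          policies: ["prod-clusters"]
```

Application developers only edit a file like this; the implementations behind `webservice`, `scaler`, `topology`, and `deploy` are provided by platform developers.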

Principle 2: Declarative delivery around workflow.

The core of the four model types above is the workflow. Application delivery is essentially an orchestration: components, operation capabilities, delivery policies, and workflow steps are arranged in order in a directed acyclic graph (DAG).

3.png

Figure 3 An example of KubeVela application delivery through workflow orchestration

For example: the first step before application delivery, such as installing system dependencies and running initialization checks, is described through delivery policies and executed at the start of delivery. The second step is deploying dependencies: if the application depends on a database, we can create the related cloud resource through a component, or reference an existing database resource, and inject the database connection string into the application environment as an environment parameter. The third step uses a component to deploy the application itself, including the image version, exposed ports, and so on. The fourth step configures the application's operation capabilities, such as monitoring, elastic-scaling policy, and load balancing. The fifth step inserts a manual approval into the online environment so that a human can check whether the application started correctly before letting the workflow continue. The sixth step deploys the remaining resources in parallel and then sends a DingTalk message callback to notify developers that deployment is complete. This is our delivery process in real scenarios.

The greatest value of this workflow is that it describes a complex delivery process oriented to different environments through standardized procedures.
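Such a multi-step process might be expressed as KubeVela workflow steps roughly as follows. The step types `apply-component`, `suspend`, `deploy`, and `notification` are KubeVela built-ins, but this particular pipeline (step names, webhook URL, message text) is a hypothetical sketch:

```yaml
spec:
  workflow:
    steps:
      - name: provision-db         # deploy the database dependency first
        type: apply-component
        properties:
          component: database
      - name: deploy-app           # then the application component itself
        type: apply-component
        properties:
          component: web
      - name: manual-approval      # pause until a human resumes the workflow
        type: suspend
      - name: deploy-rest          # remaining resources, per delivery policy
        type: deploy
        properties:
          policies: ["prod-clusters"]
      - name: notify-done          # DingTalk callback once delivery finishes
        type: notification
        properties:
          dingding:
            url:
              value: https://oapi.dingtalk.com/robot/send?access_token=example
            message:
              text:
                content: "Application delivered"
```

Each step runs only after its predecessors succeed, which is how the DAG ordering described above is realized.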

Principle 3: Highly scalable and easy to program.

We have always hoped to build application modules like Lego bricks, so that platform developers can easily extend the platform's capabilities for business development. But as mentioned above, template languages lack flexibility and scalability, while writing Kubernetes Controllers is too complex and demands high expertise from developers. How can we get both high scalability and programming flexibility? We eventually borrowed CUE from the lineage of Google Borg's configuration languages. CUE is a configuration language designed for data templating and data transfer; it integrates naturally with Go, fits easily into the Kubernetes ecosystem, and is highly flexible. Moreover, CUE is dynamic: it needs no compile-and-release step, and a rule takes effect immediately once published to Kubernetes.

4.png
Figure 4 KubeVela dynamic extension mechanism

Take KubeVela's dynamic extension mechanism as an example. Platform developers write component templates such as web services and scheduled tasks, and operation-capability templates such as elastic scaling and rolling upgrade, and register these capability templates (OAM X-Definitions) to the corresponding environments. Based on each template's content, KubeVela installs the dependencies the capability needs at runtime into the clusters of the corresponding environment. At this point, application developers can use the templates the platform developers just wrote: a developer builds an Application YAML by selecting components and operation capabilities and publishes it to the KubeVela control plane. KubeVela orchestrates the application according to the Application YAML, runs the selected capability templates, and finally delivers the application to the Kubernetes cluster. This completes the whole flow from capability definition to application description to final delivery.
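As a sketch of what such a capability template can look like, here is a minimal, hypothetical web-service ComponentDefinition written in CUE in KubeVela's definition format. The `parameter` block is what application developers see; the `output` block is the Kubernetes resource KubeVela renders from it:

```cue
// Illustrative OAM X-Definition: a component type whose CUE template
// turns a few user-facing parameters into a full Kubernetes Deployment.
"my-webservice": {
	type: "component"
	attributes: workload: definition: {
		apiVersion: "apps/v1"
		kind:       "Deployment"
	}
}

template: {
	// Parameters the application developer fills in.
	parameter: {
		image: string
		port:  *8080 | int // defaults to 8080
	}

	// The resource KubeVela renders and applies to the cluster.
	output: {
		apiVersion: "apps/v1"
		kind:       "Deployment"
		spec: {
			selector: matchLabels: app: context.name
			template: {
				metadata: labels: app: context.name
				spec: containers: [{
					name:  context.name
					image: parameter.image
					ports: [{containerPort: parameter.port}]
				}]
			}
		}
	}
}
```

Because the template is plain CUE data, registering it is just a matter of applying it to the control plane; no Controller needs to be compiled, released, or deployed.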

Principle 4: Design for mixed environments.

When designing KubeVela, we considered from the start that future application delivery would happen in hybrid environments (hybrid cloud / multi-cloud / distributed cloud / edge), where delivery differs greatly across environments and scenarios. We did two things. First, the KubeVela control plane is completely standalone and does not intrude into business clusters; any community Kubernetes plugin can be used in a business cluster to operate and manage applications, and KubeVela manages those plugins from the control plane. Second, instead of using KubeFed-style technologies that generate large numbers of federated objects, KubeVela delivers directly to multiple clusters, keeping the experience consistent with single-cluster management. It supports both push and pull modes by integrating multi-cluster management solutions such as OCM and Karmada. In scenarios such as centralized control and heterogeneous networks, KubeVela can provide secure cluster management, per-environment differentiated configuration, and multi-cluster grayscale release.

Take the solution for Alibaba Cloud's internal edge-computing products as an example. Developers simply publish the built image and the KubeVela file to the KubeVela control plane, which distributes the application components to the central managed cluster or to edge clusters. Edge clusters can use edge-cluster management solutions such as OpenYurt. Because KubeVela is a unified control plane across clusters, it can uniformly orchestrate application components, apply differentiated configuration to cloud and edge clusters, and aggregate all underlying monitoring information, achieving unified observability and a cross-cluster resource topology map.
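The cloud-edge differentiation described here can be sketched with KubeVela's `topology` and `override` policies plus a multi-step workflow. The policy and step types are real KubeVela built-ins; the cluster names, labels, and replica counts below are illustrative assumptions:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: edge-app
spec:
  components:
    - name: web
      type: webservice
      properties:
        image: example/web:1.0.0
  policies:
    - name: central              # deliver to the central managed cluster
      type: topology
      properties:
        clusters: ["central-cluster"]
    - name: edge                 # and to edge clusters, selected by label
      type: topology
      properties:
        clusterLabelSelector:
          region: edge
    - name: edge-small           # differentiated config: fewer replicas at the edge
      type: override
      properties:
        components:
          - name: web
            traits:
              - type: scaler
                properties:
                  replicas: 1
  workflow:
    steps:
      - name: deploy-central
        type: deploy
        properties:
          policies: ["central"]
      - name: deploy-edge
        type: deploy
        properties:
          policies: ["edge", "edge-small"]
```

The same component definition is delivered everywhere, while the `override` policy carries only the per-environment differences, which is what keeps multi-cluster delivery consistent with the single-cluster experience.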

Summary

In general, the four core design principles of KubeVela can be summarized as follows:

1. OAM abstracts away the underlying details of the infrastructure; users only need to care about four delivery models.

2. Declarative delivery built around workflow; the workflow requires no additional processes or containers to launch, and the delivery process is standardized.

3. Highly scalable and easy to program: operational logic is written in the CUE language, which is more flexible than a template language and an order of magnitude simpler than writing a Controller.

4. Designed for hybrid environments, providing application-centric concept abstractions such as environments and clusters, and uniformly managing all resources an application depends on (including cloud services).

5.png

Figure 5 KubeVela's position in Alibaba's cloud-native infrastructure

Currently, KubeVela has become part of Alibaba's cloud-native infrastructure. As Figure 5 shows, we have made many extensions on top of Kubernetes, covering resource pools, nodes, cluster management, many kinds of workloads, and automated operation capabilities. KubeVela builds a unified application delivery and management layer on top of these capabilities so that the group's businesses can adapt to different scenarios.

How will cloud native evolve in the future? Looking back at the past decade of cloud-native development, one constant is the steady upward movement of standardized interfaces. From the emergence of cloud computing around 2010 to its firm foothold today, the computing power of the cloud has been popularized. Around 2015, large-scale deployment of containers standardized the delivery medium. Around 2018, Kubernetes abstracted cluster scheduling and operations, standardizing infrastructure management. In the past two years, Prometheus and OpenTelemetry have gradually unified monitoring, and Service Mesh technologies such as Envoy and Istio are making traffic management more general. Throughout this evolution, we have seen technology fragmentation and application-delivery complexity in the cloud-native field, and we proposed the Open Application Model (OAM) and open sourced KubeVela to try to solve these problems. We believe application-layer standardization will be the trend of the cloud-native era.

Click here to visit the KubeVela project's official website!

You can learn more about the KubeVela and OAM projects through the following materials:

Project code repository: github.com/oam-dev/kubevela — Star/Watch/Fork welcome!

Project homepage and documentation: kubevela.io. Chinese and English documentation is available starting from version 1.1; developers are welcome to translate it into more languages.

Project DingTalk group: 23310022; Slack: CNCF #kubevela channel

Join the WeChat group: add the contact below first, noting that you want to join the KubeVela user group:

二维码.png

About the author:

Situ Fang, nicknamed "Jifeng" | Senior technical expert at Alibaba Cloud and head of Alibaba Cloud's application PaaS and Serverless product lines. Since joining Alibaba in 2010, he has been deeply involved in multiple cross-generational evolutions of service-oriented and cloud-native architecture, including link tracing, container virtualization, full-link stress testing, multi-site high availability, middleware cloud productization, and cloud-native upgrades. He has led the construction of Alibaba's open source technologies and commercial products in fields such as microservices, observability, and serverless, and is committed to providing external enterprises with mature, stable Internet architecture solutions and products through cloud-native technology. Open source projects he has participated in or led include KubeVela, Spring Cloud Alibaba, Apache Dubbo, and Nacos.
