By Yang Yingming (alias: Xiang Ye)

Core contributor to KusionStack and senior R&D engineer at Ant Group

Works on infrastructure technology, focusing on IaC/XaC, GitOps, and related areas


Foreword

KusionStack was originally created to handle complex operation and maintenance scenarios inside Ant. The idea is to use a self-developed DSL (KCL) to capture reusable configuration as the Kusion Model, so that infrastructure capabilities that used to be consumed through a GUI console are consumed as code, and to pair this with a DevOps toolchain (Kusion CLI) that quickly validates configuration and puts it into effect. The goal is to make the infrastructure more open and to improve operation and maintenance efficiency.

Here, Kusion Model is the Kusion model library named in the title, and Kusion CLI is the Kusion toolchain. The concepts are as follows:

Kusion Model Library

The Kusion model library is a set of configuration models abstracted with KCL. The models work out of the box, are user-friendly, and are abstracted for the business. The original motivation was to improve the efficiency and experience of users who write YAML, since much configuration today is described in YAML. For example, since Kubernetes became the de facto standard for container orchestration, K8s-based declarative configuration has become increasingly common.

However, because K8s itself is complex, YAML configuration keeps getting more verbose and harder to manage. By abstracting and encapsulating complex configuration descriptions into unified models with KCL, a configuration language, we hope to simplify the configuration code that users have to write.

Kusion Toolchain

The Kusion toolchain is a collection of KCL-based DevOps tools that help users generate and drive their KCL configurations within the Kusion ecosystem.

Simply put, the Kusion model library is the set of reusable components captured in KCL, and the toolchain is what drives the Kusion models.

This article describes what we have explored and learned in applying the Kusion model library and toolchain at Ant, with a focus on how KusionStack can improve the openness and operational efficiency of complex infrastructure. We hope it offers some ideas to readers who face similar difficulties.

PART. 1--Why build the Kusion model library and toolchain?

Let us first look at a "phenomenon-problem" diagram:

[Figure: phenomenon-problem diagram]

The figure lists some of the problems we ran into in large-scale internal scenarios.

Take application deployment as an example. Application A has more than 10 components, and there was no internal support for non-standard applications of this kind. Each deployment had to go through many steps, such as preparing metadata, applying for certificates, VIPs, and domain names, manually deploying CRDs, RBAC, and webhooks, and configuring monitoring. The process was not automated, delivery and deployment were complex, and each deployment was highly customized. Whenever a step went wrong, we had to contact the R&D engineers responsible for it, so the labor cost of deploying the application was high.

In general, the phenomena listed above were the main reasons that operating existing infrastructure in large-scale scenarios was so difficult at the time, so we urgently needed to solve the problems behind them.

PART. 2--How we approached these difficulties

After repeated discussion we reached a consensus on a solution: use the Kusion model library and toolchain to address the infrastructure operation and maintenance difficulties above from the following aspects.

Readability

Business-oriented, hiding the underlying implementation

We abstract the Kusion model library on top of KCL. It contains out-of-the-box models that are abstracted and refined for the business. The user-facing model interface exposes only the attributes users care about and hides implementation details, so it is business-oriented and easier for users to accept and use.

Consistent with engineering intuition

As a collection of models written in KCL, the Kusion model library matches engineering intuition. Because KCL is a language, it supports defining variables and writing conditionals. For example, you can express configuration that differs between cases with if-else.
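To make the point about variables and conditionals concrete, here is a minimal sketch in Python rather than KCL. It only mirrors the idea of differentiated configuration; the function names, the registry address, and the replica counts are assumptions made for illustration and do not come from the Kusion model library.

```python
def replicas_for(env: str) -> int:
    """Pick a replica count per environment (hypothetical values)."""
    if env == "prod":
        return 3
    else:
        return 1


def build_config(app: str, env: str) -> dict:
    """Build one application's configuration for a given environment."""
    # A variable is defined once and reused wherever it is needed.
    image = f"registry.example.com/{app}:latest"  # hypothetical registry
    return {
        "name": app,
        "env": env,
        "image": image,
        "replicas": replicas_for(env),
    }


prod_config = build_config("demo", "prod")
test_config = build_config("demo", "test")
```

Plain YAML would need two hand-maintained copies to express the same difference between environments, whereas here a single definition covers both.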

Problems solved

The readability improvements in this section come from two sources. First, KCL, KusionStack's self-developed configuration language, is expressive enough that describing configuration and abstract models in it feels natural. Second, the Kusion Model defined in KCL encapsulates complex configuration conversion logic, hides business details, and presents a clear, easy-to-understand user interface. Together these make configuration much easier to maintain than with traditional configuration languages.

Engineering

Front-end and back-end model decoupling

We divide the Kusion model library into front-end models and back-end models according to their function. Why make this distinction? The immediate purpose is to separate the "user interface" from the "model implementation":

Front-end model

The front-end model is the "user interface". It contains all the configurable attributes that the platform exposes to users. Repetitive and derivable configuration is omitted, and only the necessary attributes are abstracted and exposed.

Users only need to pass in the necessary parameters, much like instantiating a class, to form the application's "configuration list". Compiling it with the toolchain then yields a complete infrastructure-facing configuration description, such as K8s YAML.

Back-end model

The back-end model is the "model implementation". Unlike the front-end model, it is invisible to users. As mentioned above, front-end models make up the user's configuration list, so how does that list take effect?

We push all the attribute rendering logic down into the back-end model. There, KCL can be used for validation, conditional logic, and reuse of code fragments, which improves the reusability and robustness of the configuration code.
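The split between the two kinds of model can be sketched as follows. This is an illustration in Python, not the actual Kusion models: the `Server` class, its fields, and `render_deployment` are hypothetical names, and the output is only shaped like a Kubernetes `apps/v1` Deployment.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Front-end model: only the attributes a user cares about (hypothetical)."""
    name: str
    image: str
    replicas: int = 1


def render_deployment(app: Server) -> dict:
    """Back-end model: validate, then expand into a Deployment-shaped dict."""
    if app.replicas < 1:
        raise ValueError("replicas must be at least 1")
    labels = {"app": app.name}  # derivable, so the user never writes it
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app.name, "labels": labels},
        "spec": {
            "replicas": app.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": app.name, "image": app.image}]},
            },
        },
    }


# The user's "configuration list" is just an instance of the front-end model.
manifest = render_deployment(Server(name="demo", image="nginx:1.25"))
```

The user supplies two values, and the labels, selector, and pod template are all derived in the back end, which is also where validation lives.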

Mixin reuse

Mixins are the mechanism KCL provides for reusing code fragments. As a concrete example, suppose a model has an attribute called oversold. Turning on the oversold switch allows a Pod to be scheduled onto machines that permit overselling, and applications usually turn it on when releasing to an offline environment in order to make full use of cluster resources. The logic that makes this setting take effect may be needed by several application operation and maintenance models. With the Mixin mechanism, it can be implemented once as an OverQuotaMixin and referenced from different back-end models, which gives reuse without reinventing the wheel.
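The reuse pattern can be pictured with a Python mixin class. KCL mixins are a different mechanism with their own syntax, so this is only an analogy: apart from the name OverQuotaMixin, every class, attribute, and the node selector key below is an assumption made for illustration.

```python
class OverQuotaMixin:
    """Reusable fragment: marks a pod spec as schedulable onto oversold machines."""

    over_quota: bool = False

    def apply_over_quota(self, pod_spec: dict) -> dict:
        if self.over_quota:
            # Hypothetical selector key; the article does not name the real one.
            pod_spec.setdefault("nodeSelector", {})["example.com/over-quota"] = "true"
        return pod_spec


class StandardAppBackend(OverQuotaMixin):
    """One back-end model that reuses the mixin."""

    def __init__(self, name: str, over_quota: bool = False):
        self.name = name
        self.over_quota = over_quota

    def render(self) -> dict:
        return self.apply_over_quota({"containers": [{"name": self.name}]})


class NetworkAppBackend(OverQuotaMixin):
    """A second back-end model sharing the same logic without copying it."""

    def __init__(self, name: str, over_quota: bool = False):
        self.name = name
        self.over_quota = over_quota

    def render(self) -> dict:
        spec = {"containers": [{"name": self.name}], "hostNetwork": True}
        return self.apply_over_quota(spec)


offline = StandardAppBackend("demo", over_quota=True).render()
online = NetworkAppBackend("gateway").render()
```

Both back-end models get the oversold behavior from one definition, so a change to the scheduling rule is made in a single place.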

Building up AppConfiguration models

We abstract different application operation and maintenance models for different operation and deployment scenarios, and call these models AppConfiguration.

They expose different attributes. For example, there is a standard application model for standard infrastructure and a network application model for network applications, each exposing its own configurable attributes to users. These models can describe application operation and maintenance configuration for more and more scenarios, and they have become important assets in the move toward configuration as code.

Problems solved

This section described a set of best practices that the team formed while building Ant's Kusion model library. They are the foundation on which the one-stop and open nature of the Kusion model library and toolchain is built.

One-stop

Full Lifecycle Configuration Description & Single Source of Truth

While improving readability and engineering, we also implemented a full-lifecycle configuration description. As far as possible, we put deployment configuration, network configuration, monitoring configuration, and other configuration related to the application lifecycle into a single model.

The advantage is that configuration fragments scattered across various systems are gathered in one place, so users can maintain their application configuration through one unified interface. Third-party systems likewise do not need to integrate with many different systems; operating on one unified configuration is enough.

"Full-lifecycle configuration description" really amounts to one thing: the Single Source of Truth often mentioned in the industry. It is one of the important prerequisites for IaC.

As the figure below shows, the front-end and back-end models in the Kusion model library use KCL to modularize operation and maintenance capabilities along different dimensions, and combine them flexibly in the various AppConfiguration models. The configuration lists instantiated from AppConfiguration are stored in the configuration repository as business configuration and maintained in one place, and the Kusion toolchain and PaaS platform then quickly validate the configuration and put it into effect.

[Figure: AppConfiguration models, the configuration repository, and the Kusion toolchain]

CI/CD

In some internal practices we built pipelines for IaC configuration, as shown in the figure below. For every KCL configuration change, the pipeline runs dependency analysis, unit tests, integration tests, configuration code upload, and other steps to ensure the quality and stability of each change.

[Figure: pipeline for IaC configuration changes]

Problems solved

The one-stop approach helps with the problems mentioned earlier: configuration that is hard to define completely, configuration scattered across systems, and the costly deployment of application A.

Openness

MonoRepo

To address the scattered configuration mentioned earlier, KusionStack recommends managing configuration centrally in a single configuration repository (MonoRepo).

In our internal practice, the configuration repository stores not only the KCL definitions of the abstract models but also the various configuration lists. In other words, it has two main parts: base configuration and business configuration, the latter including application operation and maintenance configuration, policy configuration, and so on. We recommend hosting the configuration repository in a version control system, which makes configuration rollback and drift detection easier.

The configuration repository is really a way of organizing configuration. As the figure below shows, in internal practice configuration files are organized by dimensions such as architecture domain, project, and environment.

[Figure: configuration repository organized by architecture domain, project, and environment]

The business configuration of an application can reference any base configuration, and base configurations can also reference each other.

For example, a user's application configuration is split by environment, and each environment's configuration differs. This is a fairly common division. Building on the configuration repository and on KCL itself, configuration common to all environments can be separated from environment-specific configuration. When a specific environment is compiled, KCL's syntactic sugar merges the configurations automatically, and fine-grained override rules are also supported.
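The merge of common and environment-specific configuration can be approximated by a recursive dictionary merge. The Python sketch below is an assumption about the general idea, not a description of KCL's actual merge semantics or override rules, and the sample keys and values are invented.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on leaves."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings, so merge them key by key.
            result[key] = merge(result[key], value)
        else:
            # Otherwise the environment-specific value replaces the common one.
            result[key] = value
    return result


# Common configuration shared by every environment (hypothetical values).
base = {"image": "demo:1.0", "resources": {"cpu": "1", "memory": "2Gi"}}
# Only the difference is written for the prod environment.
prod = {"resources": {"cpu": "4"}}

merged = merge(base, prod)
```

The prod file states only what differs, and everything else is inherited from the common configuration, which keeps per-environment files short.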

Engineering specifications

The project directory structure described above can serve as an engineering convention, and tools can enforce it.

Because the configuration repository is hosted in a version control system, changes to configuration code can be reviewed as a matter of course. Combined with a CI/CD system that integrates the directory structure check together with the KCL Linter and Test tools into the pipeline, this yields a standardized workflow.
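A directory structure check of the kind mentioned above could look roughly like this. The layout `<domain>/<project>/<env>/<file>.k` and the set of environment names are assumptions for illustration; the article only says that files are organized by architecture domain, project, and environment. The `.k` suffix is the usual extension for KCL files.

```python
from pathlib import PurePosixPath

ALLOWED_ENVS = {"base", "dev", "test", "prod"}  # hypothetical environment names


def check_path(path: str) -> list[str]:
    """Return the convention violations for one configuration file path."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4:
        return [f"{path}: expected <domain>/<project>/<env>/<file>"]
    _domain, _project, env, filename = parts
    problems = []
    if env not in ALLOWED_ENVS:
        problems.append(f"{path}: unknown environment '{env}'")
    if not filename.endswith(".k"):
        problems.append(f"{path}: expected a .k file")
    return problems


ok = check_path("appops/demo/prod/main.k")
too_shallow = check_path("appops/demo/main.k")
bad_env = check_path("appops/demo/staging/main.k")
```

A pipeline step could run such a check over the changed files of each merge request and fail the build if any list of problems is non-empty.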

Co-construction

On this basis, we can involve more people in building the large configuration repository, covering both changes to application operation and maintenance configuration and the definition of the model library itself. All of this is visible to everyone, and anyone can contribute.

The two screenshots below show different developers reviewing and discussing changes in the configuration repository:

[Two screenshots: change reviews in the configuration repository]

Problems solved

With the points above in place, infrastructure capabilities can, to a certain extent, be captured in the configuration repository through the model layer.

Here is an example of the benefit. In the past, adding a parameter to a network work order required a complete R&D iteration. Now that network-related configuration and rendering logic have been pushed down into the model library, the requester only needs to submit a change review for a back-end model in the configuration repository. Once the change passes review by the relevant owner and all pipeline checks, it can go live.

These points address the problems mentioned earlier, namely closed infrastructure and requesters being unable to serve themselves, and as a result the time to launch new features is greatly shortened.

Note that configuration as code can open up infrastructure to a certain extent, but it is not a silver bullet and cannot solve every problem.

PART. 3--Introduction to Kusion Toolchain and Engine Architecture

Kusion Toolchain

Consistent Workflow

In the Kusion toolchain we define a consistent workflow, init -> write -> preview -> apply, to help users manage the lifecycle of KCL code.

Take initializing a KCL configuration as an example:

- Users can of course write it by hand, but to make this easier, and to help third-party systems initialize KCL code faster, we provide a scaffolding template repository and Kusion Converter.

- The scaffolding repository stores KCL templates for various scenarios. It is kept separate from the code of the Kusion project itself, and anyone can contribute to it.

Kusion Converter was created so that existing configuration can be moved to KCL quickly. Users who have written configuration in other configuration languages can convert it to KCL code in one step with the Kusion Converter toolset.

Unified view

The Kusion toolchain can also be integrated easily into third-party systems so that the system's output matches the local Console UI view. For example, the pipeline in the open source configuration repository integrates the Kusion toolchain, and the log output of its apply step shows the same interface as the local Console.

Ecosystem integration

Internally, we have integrated with Kusion service products, code services, and some encryption and decryption services. We are also building external ecosystem integrations and so far have integrated with GitHub Actions and ArgoCD. We look forward to integrating with more platforms and open source products in the future.

Kusion Engine

The Kusion engine sits between KCL and the underlying infrastructure. It interprets the compilation results of KCL and operates on the various heterogeneous infrastructures underneath.

The Kusion engine fully embraces the Terraform ecosystem. By seamlessly integrating Terraform Providers, it can deliver configuration to different Runtimes and hide infrastructure complexity.

The Kusion engine also provides finer-grained resource lifecycle management, such as resource dependency analysis.
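Resource dependency analysis typically means ordering resources so that each one is handled after the resources it depends on. The article does not describe how the Kusion engine does this, so the sketch below only shows the general technique, a topological sort, using Python's standard `graphlib` module. The resource names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Map each resource to the set of resources it depends on (hypothetical names).
dependencies = {
    "namespace": set(),
    "configmap": {"namespace"},
    "deployment": {"namespace", "configmap"},
    "service": {"deployment"},
}

# static_order() yields every resource after all of its dependencies.
apply_order = list(TopologicalSorter(dependencies).static_order())
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which is a natural point at which to reject the configuration before anything is applied.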

PART. 4--Results so far

First, some results from promoting KusionStack internally (as of 2022.7.15):

- The configuration repository contains more than 10 AppConfiguration model definitions for different application operation and maintenance scenarios, used by different operations teams to describe their applications;

- The configuration repository receives 100+ configuration change reviews every day;

- There are 300+ contributors from more than 20 BUs, including SREs, application owners, and developers of the repository's models;

- There are more than 1,000 projects. Each application is a project, but projects also cover other types of configuration, such as network policies and site configuration;

- The configuration repository has more than 10,000 MRs, more than 50,000 commits, and 450,000 lines of KCL code (a considerable part of which is submitted and maintained by machines).

Next, some figures on improved business efficiency:

- The time for a single application's SLO monitoring configuration to take effect dropped from 1 day to 0.5 hours;

- The turnaround for application operation and maintenance requirements dropped from 25 days to 5 days;

- Deployment time for application A dropped from 1 month to 0.5 hours;

- The number of network-related work orders dropped from 7 to 1, which means 1 work order and 1 approval.

PART. 5--Summary and Outlook

Putting KusionStack into practice inside Ant has given us some practical experience. It may not apply to every company, but it should be helpful to those who face similar difficulties.

On the one hand, KusionStack needs to keep solving operation and maintenance problems inside Ant. On the other hand, we hope to broaden our horizons, cover more scenarios, and keep refining the whole technology stack with the help of the open source community.

[Figure: roadmap and challenges for the Kusion toolchain and model library]

KusionStack is at a very early stage of open source, and much work remains. With limited manpower, it is hard for us to build a technical solution that satisfies everyone. The figure above shows the roadmap and some of the challenges for the Kusion toolchain and model library, for reference and discussion. You are welcome to submit issues and critiques. Thank you!

Related Links

Kusion toolchain and engine: http://github.com/KusionStack/kusion

Kusion model library: http://github.com/KusionStack/konfig

Roadmap: http://KusionStack.io/docs/governance/intro/roadmap

Learn more

KusionStack Star ✨: https://github.com/KusionStack/Kusion

We hope that open-sourcing KusionStack is helpful to everyone, and we hope to improve it together with more people. Anyone interested in cloud native, operations automation, programming languages, or compilers is welcome to join the community, explore new technical areas with us, and bring more new ideas to life.
