This article is based on the author's talk at the CSDN Cloud Native Meetup in Shenzhen. It shares NetEase Shufan's practices for delivering large-scale applications in privatization scenarios under the cloud-native trend, including the problems encountered along the way, how a standardized, efficient, and high-quality delivery approach was achieved, and the results.

Background

Software privatization delivery and deployment means deploying software on the customer's own infrastructure: a hardware and software operating environment built for the sole use of a single enterprise customer. It can therefore provide effective data security, compliance auditing, and service quality control.

The privatization market is determined by supply and demand and, like any market, is divided into Party A (the customer) and Party B (the vendor). Each side takes what it needs, which keeps the enterprise-oriented privatization market functioning. Some typical demands of the two parties are listed below.

Demands of Party A (the customer and funding provider)

  • Policy-driven industry compliance and security requirements.
  • The corporate network is completely isolated and restricted: it cannot communicate with the Internet, so public cloud products cannot be used. A financial enterprise intranet is a typical example.
  • Operational data is sensitive and not suitable for running directly over the public Internet. For example, government and enterprise communication data carried by tools such as WeCom (enterprise WeChat) and DingTalk requires strict confidentiality.
  • Leading Internet companies have strong cloud technology, heavy investment, and relatively mature products, and customers want to use these leading capabilities in-house, for example Alibaba Cloud, Tencent Cloud, Huawei Cloud, or NetEase Shufan Qingzhou and big data products.
  • Traditional companies want to pursue digital transformation, but they lack an in-house R&D system; building from scratch is too costly and slow, so they prefer to purchase mature products to accelerate the transformation.
  • Enterprises have their own machine rooms and servers, but the resources are scattered and utilization is low; they hope privatized products can make full use of these resources.
  • Business management and collaborative communication are separated, and enterprises strive to build an integrated, cost-effective, secure, and efficient organization and collaboration management platform.
  • There is no unified supporting platform, so products providing basic capabilities tend to be built repeatedly and the user experience is uneven.
  • It is difficult to respond quickly to business innovation: years of stacked, vertically built information systems have left integration and openness capabilities that no longer fit the pace of business innovation.
  • For capabilities that are missing because of special scenarios, enterprises hope other suppliers can help build them through cooperation.

Demands of Party B (the product provider)

  • The public cloud business of public cloud vendors is not growing as fast as expected, and they hope to open up new markets by supporting privatized delivery.
  • Some products were aimed at enterprises from the beginning of their design rather than at consumer users on the public network, such as ERP systems or the NetEase Shufan Qingzhou platform.

From the above demands of Party A and Party B, it is clear that as long as Party B can satisfy one or more of Party A's needs, there is a scenario for cooperation and for privatized software delivery.

Have you encountered the following scenarios?

The test environment is isolated from the production environment. Because of internal processes or security regulations in some enterprises, the test environment is isolated from production, for example by isolating the running nodes or the network policy, or even by physical isolation that requires switching equipment to reach production. In that case, how do services that have been tested and verified in the test environment go live in production?

The production environment prohibits access to Git, so pipelines cannot run there. Source code is one of an enterprise's core assets, and its security is critical. The production environment is usually the enterprise's own data center or a third-party public cloud, and enterprises generally do not allow core source code to be exposed on the public Internet; under these circumstances, access from the production environment to the Git hosting platform on the intranet is prohibited. Moreover, an image rebuilt by re-running the CI pipeline in production is not the image that the QA team tested and verified in the test environment; even if the source code is identical, they are essentially two different images.

Deploying applications in Kubernetes involves many types of resources. Deploying a service in a Kubernetes cluster is not simply running an image; it also involves other related Kubernetes resources, such as Deployment, Service, Ingress, Secret, ConfigMap, PV/PVC, ServiceAccount, and RBAC, as well as resources from extended services, such as the ServiceMonitor from prometheus-operator. How can so many types of files be maintained and managed efficiently and without errors?

Applications need to comply with company specifications. To ensure a smooth launch and meet quality and security requirements, companies formulate procedures or specifications for bringing a business online. At the same time, in a team where many people collaborate, without a common reference specification, application management becomes chaotic due to inconsistent information. The most common example is the image naming convention: the same service may appear as myapp:v20211129, myapp:v1.3.11, myapp:1.3.11, myapp:v1.3.11-20211129, my_app:v1.3.11, my-app:v1.3.11, or even yourapp:v1.3.11. These image tags all refer to the same Git release v1.3.11, but without a naming specification they are hard to manage and error-prone.

Products developed by one department are reused by other departments or subsidiaries. A good application developed by one department is often wanted by other departments, especially when a large enterprise has multiple subsidiaries. Examples include daily and weekly report systems, device log systems, invoice management systems, and ERP systems. How to share these applications with other departments then becomes a challenge: if the delivery process is too complicated, promotion will be limited, which wastes resources at the group level.

Software is privatized and delivered into the customer environment. Like the ERP systems and the NetEase Shufan Qingzhou platform mentioned above, toB privatized products face many difficulties in delivery. How to deliver software into the customer environment efficiently and with high quality has become one of the core concerns of such enterprises.

This article mainly discusses the delivery practices of the NetEase Shufan Qingzhou team for large-scale applications in privatization scenarios. The problems above all come up in the process, and the practices described here provide some reference for those scenarios.

Pain and difficulty of software privatization delivery

Software privatization delivery is generally not smooth. At different stages, different roles and dimensions run into various problems; some of them may determine whether the entire project is postponed, and some may even affect whether the project succeeds at all.

In the early stages of a project, if the main pain points can be assessed and coping strategies prepared in advance, it is of great help to privatized product delivery.

From the delivery perspective, the pain points before and after project delivery can be divided into three categories: the user side, the delivery side, and the engineering/after-sales side.

User-side pain and difficulty

Diverse product functional requirements. The standard product does not meet the enterprise's needs, and the software provider is asked to modify or adapt it. However, cross-enterprise collaboration is relatively inefficient, and rework and frequent environment upgrades are common.
Long resource preparation cycles. Preparing resources takes a long time, and changing them takes even longer. Applications for internal enterprise resources go through multiple layers of review and approval; if an application does not meet the enterprise's specification and has to be resubmitted, the cycle can stretch to weeks or even months.
Enterprise launch specifications. Users have their own launch and operations specifications, and the delivered software must meet their operations or security requirements, for example Classified Protection (MLPS) Level 3 compliance, performance indicators, security scanning, and short launch time windows, which may not have been fully considered when the product was built.
Changes on the user side. Delivery is blocked by equipment relocation, network changes, equipment upgrades, shutdowns, and similar events.
Resources are not prepared according to the deployment plan. The agreed resource requirements cannot be delivered as planned because of internal reasons on the customer side, or the resource environment the user prepared is unstable.

Delivery-side pain and difficulty

High demands on delivery personnel. The delivery process involves the operating system, Docker/Kubernetes, the delivered product itself, the user's processes, and infrastructure, so delivery personnel need a solid technical foundation as well as good communication skills with users.
Test verification is complicated. After deployment, a test verification process is generally required to ensure delivery quality, and whatever automated tests cannot cover must be regressed manually. The more complex the delivered product, the more complex this testing and verification becomes.
Cross-enterprise collaboration is difficult. A deployment implementation plan is designed and communicated with the user before deployment begins, but misalignment between the two parties can leave the plan and the actual resources inconsistent and block on-site implementation. Even if the plan can be adjusted, the project still risks being delayed.
User networks are isolated and restricted. For security and other reasons, the network cannot access the Internet, so when problems come up during delivery and implementation, engineers are restricted in looking up information, seeking remote assistance, and downloading update files.
The deployment package is large and takes a long time to upload. With containerized private delivery, the software is generally built into offline images. Even though image-layer reuse reduces the overall size, the complexity of the delivered system means many layers cannot be reused, so the offline image package or deployment package remains large. Traditional enterprise users, in particular, often have worse network quality and bandwidth than Internet companies, so the deployment package is typically copied to the customer site on a portable hard drive and still takes a long time to upload into the actual deployment environment.

Engineering/after-sales side pain and difficulty

Infrastructure is diverse. In privatization scenarios, each enterprise customer has its own infrastructure. Hardware differs: some customers buy Huawei servers, some buy HP servers, and some build their own custom servers. Operating systems differ as well; CentOS, Debian, Ubuntu, Red Hat, UnionTech UOS, Kylin OS, and others are all common in enterprises. CPU architectures may also differ, with both x86 and ARM servers in use.
Privatized products are complex. If the product itself is complicated, releasing a privatized version requires an owner who truly understands both the product and the privatized delivery system to control the release as a whole. Otherwise, poor technology selection, version management, or automated packaging, insufficient awareness of the scenarios, missing prepared materials, or other oversights make the product's delivery and management complicated and chaotic.
The product itself is not of high quality. Frequent requirement changes and new features leave product functions unstable and bug counts high. If QA testing is not thorough, bugs end up being analyzed, located, and fixed during on-site delivery, which is both inconvenient and time-consuming.
Documentation is missing. For privatized products it is advisable to provide dedicated documentation engineers. Some materials are mandatory in enterprise procurement, and without them the project will not be accepted. Moreover, if documents are written from scratch only when a user asks for them, regardless of their quality, the experience is poor for the user and looks unprofessional.
The number of projects grows year by year. Growth in the number of projects shows that the product sells well, but it also increases operations and after-sales costs. Headcount should be evaluated in time and the team expanded accordingly; otherwise existing employees face an ever heavier workload and staff turnover rises.
Environments are offline, making later maintenance difficult. Because the network is offline, abnormal alarms cannot be received in time; the user has to receive the alarm and relay it to after-sales staff, and locating and fixing the problem may not go smoothly because of the technical gap. Being offline also means some planned changes or upgrades require a business trip to the customer site, so the cost of support is relatively high.
User infrastructure affects the software. When the delivery is a pure software product, the machines, network, power supply, and other infrastructure are maintained by other departments of the customer, and any infrastructure abnormality affects the stable operation of the software running on top of it.

For all of the reasons above, privatized software delivery tends to have long delivery cycles, poor delivery quality, and high delivery costs. If the cost is too high, the project may even lose money because of the excessive investment.

To ensure that software privatization delivery goes smoothly, architecture design and technology selection should be combined with the current mainstream technology stack to choose an appropriate solution.

Helm-based application packaging, delivery, upgrade, and environment maintenance

The current state of software application delivery

There are many ways to deliver and deploy software. By delivery method, they can be divided into three types: traditional basic delivery, automated delivery, and cloud-native delivery.

Traditional basic delivery. This is the basic method of application delivery, such as installing rpm packages or running binaries directly, and it suits infrastructure with relatively fixed scenarios. If there are many package dependencies, YUM or DEB repositories are usually built so that dependencies can be installed quickly. Such services are generally managed through systemd or supervisor. Automated delivery and cloud-native delivery are both built on top of this approach.

Automated delivery. If the number of software components is large and the installation and deployment process has complicated logic, manual rpm/yum installation becomes cumbersome and error-prone, so the process is usually encapsulated in an automated way; shell scripts and Ansible are the mainstream automation tools.

Cloud-native delivery. With the rise of cloud native and the long-standing influence of DevOps, applications have a new delivery model. In cloud-native scenarios, pipeline-based CI/CD (continuous integration and continuous delivery) has become the new mainstream, providing a fast and efficient iteration rhythm. Cloud-native delivery and runtime environments are basically built on Kubernetes, so there are also tools that can directly manage and deploy applications, such as KubeVela; Helm, however, is the one most widely used in enterprise production environments, and it has graduated from CNCF.

Depending on whether the systems involved can connect to each other during delivery and deployment, delivery can also be divided into online delivery and offline delivery.

Online delivery. Online delivery means that all or part of the materials reside on other servers during delivery and deployment, and can be fetched over the network during installation. Common examples are Nginx-based YUM repositories, an internal Harbor image registry, internal GitLab code references, or a public Maven repository. The advantage of online delivery is that these resources do not need to be prepared in advance; whatever resource is needed can be fetched from the corresponding server.

Offline delivery. Offline delivery is the opposite: the resources the deployment depends on cannot be obtained externally and must be prepared separately. If there is no YUM repository, a temporary one is built locally, or the offline packages and their dependencies are downloaded and prepared together for installation. In offline scenarios, many services have to be set up at deployment time, which adds considerable difficulty to project delivery.

Complex customer infrastructure leads to complex delivery scenarios. With traditional or automated delivery, there is a long list of adaptations to support, such as hardware, CPU architecture, and operating system, which inevitably costs the enterprise a lot of labor.

However, if applications are delivered in a cloud-native way, there is an opportunity to minimize the human cost across these different scenarios.

Qingzhou application delivery tool selection

Container-based delivery. Faced with diverse user environments, packaging applications with Docker containers shields the differences in the underlying infrastructure and makes the delivered product universal.

Kubernetes as the runtime platform. Container scheduling, fault detection and automatic recovery, elastic scaling, and a simple network model are weak points of a bare environment, and Kubernetes solves exactly these pain points. Using Kubernetes as the container runtime environment greatly reduces the complexity of delivery and operations.

Helm-based application construction, delivery, and deployment. Helm can manage an application's multiple Kubernetes resources and apply them to the cluster according to certain strategies. At the same time, similar to rpm/deb, the application can be standardized; the standardized application defined by Helm is a Chart.

Helm version selection

Helm3 is preferred. With an existing Kubernetes cluster, only a kubeconfig is needed to complete delivery and deployment of the environment.
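As a hedged illustration (the kubeconfig path, namespace, and chart package name here are hypothetical), delivery against an existing cluster can be as simple as:

helm --kubeconfig /path/to/customer-kubeconfig install -n ns1 myapp ./myapp-v1.23.6.tgz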

There are two major versions of Helm, Helm2 and Helm3. If you are new to Helm, it is strongly recommended that you choose Helm3 directly; if you are already using Helm2, it is also recommended that you upgrade to Helm3 as soon as possible. Helm3 is not only much simpler in architecture, but is also greatly improved in functionality, usability, and ease of use.

Helm3 was released on November 13, 2019. Twelve months after Helm3 was publicly released, support for Helm 2 officially ended.

Definition and standardization of cloud native applications

When using Helm Charts to define cloud-native applications, we also want to be able to define standards, similar to what RPM/DEB packages provide, and fortunately Helm supports this. The standards fall into two kinds: mandatory standards and recommended standards.

Mandatory standards are the parts of the Helm Chart directory structure that must be strictly observed: where certain files go, built-in functions and variables, how templates are rendered, and resource priorities. Otherwise Helm will not work properly. For example, values.yaml is the default parameter configuration file, and the templates directory holds the Kubernetes resource templates.
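For reference, the layout produced by helm create illustrates this mandatory structure (the files under templates/ are only the generated defaults and are normally replaced with the application's own resources):

myapp/
  Chart.yaml          # chart metadata: name, version, appVersion
  values.yaml         # default parameter configuration
  charts/             # subchart dependencies
  templates/          # Kubernetes resource templates
    _helpers.tpl
    deployment.yaml
    service.yaml
    ingress.yaml
    NOTES.txt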

Recommended standards are the Chart conventions a company defines on top of the mandatory standards, based on its business requirements and management specifications, for example application naming conventions, Kubernetes resource file naming conventions, environment variable naming conventions, and installation and upgrade logic specifications.

Mandatory specifications must be complied with, otherwise the Chart will not work properly. Recommended standards are enforced through convention or through a predefined scaffold (starter); a new application created from the scaffold is already a standards-compliant application, for example:

Define the variable: export XDG_DATA_HOME=/root/.helm
Scaffold path: /root/.helm/helm/starters/chartstarter
Create a new application: helm create --starter chartstarter myapp

Even if some application configurations drift from the standards in later iterations, automated checks can be used to scan for violations.
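One possible form of such an automated check, assuming the Chart lives under ./charts/myapp (the path is hypothetical), is to combine Helm's built-in lint with a server-side dry run of the rendered manifests:

helm lint ./charts/myapp
helm template myapp ./charts/myapp -f values-global.yaml | kubectl apply --dry-run=server -f -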

Artifact management

Common build artifacts in cloud-native scenarios include the source code, compiled or packaged executables, optional system packages (such as RPM packages), built images, and Helm Chart packages.

Minimal atomization. When building a Chart package, the number of business services contained in one Chart is determined by the characteristics of the business scenario. Qingzhou's Helm Charts take the application code repository as the smallest unit, i.e., one Chart contains only one functional program image. If the business system is not very large and there are only a few applications, subcharts can also be used to manage multiple services, as the WordPress Helm Chart does (see the sketch below).
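Where subcharts are used, the dependencies are declared in the parent Chart.yaml. A minimal sketch, with hypothetical chart names, versions, and repository locations:

apiVersion: v2
name: mysystem
version: 1.0.0
dependencies:
  - name: myapp-backend
    version: 1.23.6
    repository: "file://../myapp-backend"
  - name: myapp-frontend
    version: 1.23.6
    repository: "file://../myapp-frontend"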

Versions are traceable. With minimal atomization, the Git release, image tag, and Chart appVersion can be kept consistent through internal specifications, so the source code version can be quickly traced from the Chart version deployed or running in an environment.

For example:

Git release: myapp Release/v1.23.6
Image: myapp:v1.23.6
Chart: myapp-v1.23.6.tgz
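In Chart.yaml, this alignment looks roughly as follows (a sketch; only the two version fields matter here):

apiVersion: v2
name: myapp
version: 1.23.6        # Chart package version
appVersion: "v1.23.6"  # application version, aligned with the Git release and image tag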

Reuse and defaulting of variables in Helm Charts

When installing a Helm Chart, multiple values.yaml files can be specified; the contents of a values file given further right on the command line override those of a values file given further left.

The default values.yaml in the Chart. values.yaml contains only the configuration items that myapp's own Chart uses, and these items should be given defaults in values.yaml wherever possible, for example:

global.imagePullSecret
global.clusterDnsDomain
myapp.resources
myapp.mysql.dbname

values-global.yaml. All configuration items are supported here, but not all of them are required; it mainly carries environment-specific configuration, such as:

global.imageRegistry.addr(project,user,passwd)
global.mysql.host(port,user,passwd)

values-myapp.yaml. A field is already defined in values-global.yaml, but its value needs to be adjusted for some reason. For example, 10 applications use MySQL, but 9 of them share one MySQL cluster while the remaining one uses an independent MySQL instance; in that case a differentiated values-myapp.yaml can be defined for that application to override the MySQL fields in values-global.yaml, such as:

global.mysql.host(port,user,passwd)

For example, the installation command for the myapp-v1.23.6.tgz Chart:

helm install -n ns1  myapp  -f values-global.yaml -f values-myapp.yaml ./myapp-v1.23.6.tgz

The priority order of the files during installation is: the default values.yaml in the Chart < values-global.yaml < values-myapp.yaml.
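A hedged illustration of this precedence, with hypothetical mysql.host values in the three files:

# values.yaml (default inside the Chart)
global:
  mysql:
    host: "mysql.default.svc"

# values-global.yaml (environment-wide configuration)
global:
  mysql:
    host: "mysql-cluster.infra.svc"

# values-myapp.yaml (application-specific override)
global:
  mysql:
    host: "mysql-myapp.infra.svc"

With the install command above, myapp is rendered with global.mysql.host = mysql-myapp.infra.svc, while Charts installed without -f values-myapp.yaml fall back to the environment-wide value.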

Installation and dependency management

A single Helm Chart can be installed with helm install, but when the number of Charts is large and there are ordering dependencies between them, manual installation is no longer appropriate. In that case it is recommended to write an automated tool to manage the installation, which can be a shell script (see the sketch below) or an installation and deployment platform.
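A minimal shell sketch of such a tool, assuming the chart packages and per-application values files follow the naming used above (the chart list and paths are hypothetical):

#!/usr/bin/env bash
# Install Charts in dependency order and fail fast if any release does not become ready.
set -euo pipefail

NS="ns1"
# Ordered list: infrastructure components first, then business applications.
CHARTS=(middleware-mysql middleware-redis myapp)

for chart in "${CHARTS[@]}"; do
  helm upgrade --install -n "${NS}" "${chart}" \
    -f values-global.yaml -f "values-${chart}.yaml" \
    "./charts/${chart}.tgz" --wait --timeout 10m
done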

Qingzhou's deployment system, Sail, is a self-developed privatized delivery system that provides environment deployment planning, configuration rendering, installation logic processing, deployment task execution, and unified management of environment information.

Upgrading applications with Helm

Whether in the early days of automated delivery or in today's Helm-based cloud-native delivery, upgrading applications that carry structured data is never easy. Upgrades frequently become riskier because table schema changes are involved.

In cloud-native application delivery, declarative Kubernetes resource files generally need no special handling during upgrades: Kubernetes rolling updates combined with Helm's updates to the rendered manifests resolve the differences well.

For SQL, however, Helm as a tool is powerless, because the handling depends on the business's own logic. What Helm does provide is the hook mechanism, which allows certain tasks to be inserted at specific stages of an installation or upgrade; within such a task, the business side can decide how to handle the SQL.

One approach to handling SQL is as follows:

The SQL files define the upgrade logic according to the following rules, and a Job with the pre-upgrade hook carries out the upgrade (a minimal sketch of such a hook Job follows the lists below):

  • Back up the database to be upgraded
  • Obtain the Chart version from the Release information of the running environment (retrieved by an external tool)
  • Use the Chart's appVersion (.Chart.AppVersion in templates) to indicate the target version
  • Work out the versions missing between the environment's Release version and the target Chart version
  • Determine whether upgrade compatibility is satisfied
  • Execute the SQL imports in version order

SQL maintenance method:

  • Maintain the initial full SQL file; if it is larger than 1 MB, split it into multiple SQL files
  • Treat each version's increment as an independent SQL file, named according to the same version specification, e.g. v1.1.0.sql / v1.1.1.sql / v1.1.2.sql
  • Use a ConfigMap to mount the major version's full and incremental SQL into a fixed directory in the Job container (for example /data/upgrade/sql/)
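A minimal sketch of such a pre-upgrade hook Job template (the migration image, upgrade script, and ConfigMap name are hypothetical; only the hook annotations and the SQL mount follow the rules above):

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-upgrade
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-upgrade
          image: "myapp-db-migrate:{{ .Chart.AppVersion }}"  # hypothetical migration image
          # upgrade.sh backs up the database, works out the versions missing between the
          # running release version and the target appVersion, checks compatibility,
          # then imports the SQL files in version order.
          command: ["/bin/sh", "-c"]
          args: ["/data/upgrade/upgrade.sh {{ .Values.release.chartVersion }} {{ .Chart.AppVersion }}"]
          volumeMounts:
            - name: sql
              mountPath: /data/upgrade/sql
      volumes:
        - name: sql
          configMap:
            name: {{ .Release.Name }}-upgrade-sql  # holds the full and incremental SQL files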

Then the following command can be used during the upgrade to automatically upgrade to the target version:

helm upgrade -n ns1 -f values-global.yaml -f values-myapp.yaml --set release.chartVersion=v1.23.1 myapp ./myapp-v1.23.7.tgz

Environment management

In the Helm Chart-based cloud-native application delivery model, the definition of a business environment can be understood simply as the list of Charts plus the values.yaml files used at deployment time.

A Chart list defined according to the specification explains which products and components are deployed in the environment and which versions are used.

All of the environment's differential configuration is saved in the values.yaml files; combined with the defaults in the Charts, all the configuration of the services running in the environment can be reconstructed.
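A hedged example of such an environment record (the format and field names are hypothetical; what matters is that the Chart list and the values files used at deployment time are captured together):

environment: customer-a-prod
charts:
  - name: myapp
    chart: myapp-v1.23.6.tgz
    namespace: ns1
    values: [values-global.yaml, values-myapp.yaml]
  - name: middleware-mysql
    chart: middleware-mysql-v5.7.30.tgz
    namespace: infra
    values: [values-global.yaml]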

In privatization scenarios, maintaining environment information is particularly important. When real-time access to the customer environment is impossible, the Chart list and values files maintained at deployment time help after-sales staff quickly locate and solve problems, and make it possible to quickly produce upgrade materials for bug fixes or version upgrades.

Effects and benefits of the Helm-based practice

NetEase Shufan Qingzhou's current use of Helm

In the early days, Helm3 had not yet been released, so Qingzhou's privatized delivery was based on Ansible. As more and more projects were delivered, the differences between scenarios multiplied, and the cost of compatibility and adaptation kept rising.

After Helm3 was released, all of Qingzhou's services went through a Helm-ization transformation, which brought major improvements over the previous state:

  • Release cycle of a single application version: 6 months ==> 1 month
  • Scenario adaptation: unpredictable adaptation to multiple operating systems ==> a single x86/ARM Docker image adaptation
  • Project delivery efficiency: on average 2 weeks per environment ==> on average 2~3 days per environment
  • Delivery quality: more than 10 delivery problems per environment on average ==> fewer than 3

The delivered product capabilities cover cloud-native container cloud, microservices, service mesh, CI/CD, API gateway, distributed transactions, APM, PaaS middleware, and other products.

This privatized delivery system has supported a large number of privatization projects for customers both inside and outside the company and has ensured their smooth delivery.

Summary

In commercial privatized delivery, customer environments involve many kinds of infrastructure and complex scenarios. Building on containers and Kubernetes greatly reduces the workload of scenario adaptation and later operations and maintenance.

Helm is currently one of the best choices for packaging and delivery of cloud-native applications in offline scenarios.

Good Helm Chart practices are accumulated through real cases and can be passed on; it is recommended to pass this experience on as code.

About the author: Wenyu is a delivery solution expert at NetEase Shufan Qingzhou. He is responsible for Qingzhou project delivery and solution work, built the Qingzhou privatized delivery system from 0 to 1, and provides customers with technical support from project planning through delivery. He has extensive experience with high-availability solutions for components and with application packaging and delivery in cloud-native scenarios, a deep understanding of the Qingzhou ecosystem and technology stack, and is familiar with software privatization delivery models in cloud-native scenarios.

