After nearly three years of construction and refinement, the Meituan pipeline engine has unified the server-side infrastructure, supporting nearly 100,000 pipeline executions per day while keeping the system success rate above 99.99%. This article introduces the challenges encountered in building the self-developed engine and the corresponding solutions.

1. Background

The concept of continuous delivery was first proposed at the Agile Conference in 2006. After years of development, it has become the standard path for many technical teams to improve R&D efficiency. By building a deployment pipeline, the entire chain from code development to feature delivery is connected: construction, testing, integration, release, and other activities are completed automatically, and value is ultimately delivered to users continuously and efficiently.

The pipeline engine is the foundation that supports the deployment pipeline, and its quality directly determines the level of deployment pipeline construction. The common practice in the industry is to build on open source tools (or public cloud products) such as Jenkins and GitLab CI, which lets businesses implement continuous delivery quickly. In the early days, Meituan also set up Jenkins to support businesses quickly.

However, as more and more businesses began to build continuous delivery, the drawbacks of this quick-and-dirty approach gradually emerged. There was no unified standard for tool construction, and each business had to understand the details of the entire toolchain, so construction costs were high and quality was uneven; few businesses could build a complete deployment pipeline. At the same time, daily build volume grew rapidly and gradually exceeded what open source tools such as Jenkins could bear. During peak delivery periods, tasks queued up seriously and services were frequently unavailable, which severely affected the smoothness of business delivery.

Meituan's pipeline engine construction has gone through several stages. Before 2019, the work mainly revolved around optimizing Jenkins; in 2019, a project was formally established to build a self-developed pipeline engine. The general process is as follows:

  • The first stage (2014-2015): Build a unified Jenkins cluster to solve common problems of business access (such as single sign-on, code repository integration, message notification, and dynamic scaling of execution machines), and reduce the cost of business construction.
  • The second stage (2016-2018): Split into multiple Jenkins clusters to solve the performance bottleneck of a single cluster caused by business growth. At peak there were more than a dozen clusters, usually divided by business line and built by the businesses themselves. Over time, however, splitting and managing the clusters became harder and harder, and Jenkins had frequent security vulnerabilities, which placed a heavy operation and maintenance burden on the platform side.
  • The third stage (2019-present): To completely solve the single-engine bottleneck and repeated tool construction, we began to develop a distributed pipeline engine (internally named Pipeline at Meituan), and gradually converged the underlying infrastructure that each business relies on.

After about three years of construction and polishing, the pipeline engine has completed the unification of server-side infrastructure, covering almost all businesses, including in-store, home delivery, Dianping, Meituan Select, Meituan platform, autonomous delivery vehicles, and basic R&D platforms, and supporting Java, C++, NodeJS, Golang, and other languages. In terms of performance and stability, the engine supports nearly 100,000 pipeline executions per day (job scheduling peaks at tens of thousands per hour), and the system success rate remains above 99.99% (excluding failures caused by business code itself or third-party tools).

Below we introduce the challenges encountered in building the self-developed engine and the corresponding solutions.

2. Problems and Ideas

2.1 Business Introduction

1) What is a pipeline

The execution of a pipeline can be thought of as the process of processing code step by step and finally delivering it to production. According to the sequence defined by the business, the corresponding processing or quality verification behaviors (such as building, code scanning, interface testing, and deployment) are executed in order, and the entire execution process resembles a directed acyclic graph.

Figure 1: Pipeline concept

2) Basic Concepts

  • Component: For code reuse and business sharing, we encapsulate the operation of a tool as a component, which represents a specific processing or verification behavior. Through components, businesses can easily use the integrated quality tools (such as static code scanning and security vulnerability analysis), reducing the cost of repeatedly developing against the same tool; for scenarios that are not covered, businesses can develop custom components.
  • Component Job: Represents a running instance of a component.
  • Resource: An executable environment assigned to a component job.
  • Pipeline Orchestration: Indicates the order in which different components in the pipeline are executed.
  • Engine: Responsible for scheduling all component jobs, assigning corresponding execution resources to them, and ensuring that pipeline execution is completed as expected.

2.2 Main challenges

1) Scheduling efficiency bottleneck

Pipeline execution is relatively sensitive to scheduling time. Most pipeline jobs are short-lived (lasting from tens of seconds to a few minutes). If scheduling takes too long, the business can clearly perceive that pipeline execution is slow. We need to keep job scheduling time within a controllable range and avoid scheduling bottlenecks.

  • Given the business scenario, the scheduling logic has a certain business complexity (such as serial-parallel judgment of components, priority preemption, downgrade and skipping, and reuse of previous results). Job scheduling involves not only matching jobs with resources but also this business overhead, which adds to the scheduling time.
  • The engine supports nearly 100,000 executions per day across the company, so the concurrent scheduling workload at peak is large. Common open source tools (Jenkins, GitLab CI, Tekton, etc.) use a single-instance scheduling model in which jobs are scheduled serially, which is prone to scheduling bottlenecks.

2) Resource allocation problem

For a job execution system, the number of jobs is usually greater than the number of resources (in real deployments, resources are not unlimited), so a backlog of jobs is a problem that must be considered in system design: how to maximize job throughput with limited resources while reducing the impact on core business scenarios when resources are insufficient.

  • Relying solely on dynamic scaling easily leads to situations where expansion is impossible when resources are insufficient and jobs queue up. For businesses that rely on the pipeline for R&D gating, this directly blocks their release process.
  • To shorten the time spent applying for resources and starting applications, most resources are pre-deployed. For these pre-deployed resources, the division must both guarantee each resource type a certain quota and avoid low utilization of some resources, which would hurt overall job throughput.
  • Not all tool execution resources are managed by the engine (for example, the release system manages the resources of deployment tasks separately), so different resource management methods need to be considered when allocating resources to jobs.

3) The problem of tool differentiation

Different businesses within the company are highly differentiated and involve many quality and efficiency tools. How do we design a suitable plug-in architecture to meet the access requirements of different tools?

  • Tools take different forms. Some have independent platforms and can be integrated through interfaces; some are just a piece of code that needs an execution environment to be provided. Facing these different access forms, the engine must shield the differences between tools so that businesses do not need to care about implementation details when orchestrating the pipeline.
  • As business scenarios grow richer, component execution also involves human interaction (approval scenarios), retries, asynchronous processing, fault recovery, and other capabilities. How do we extend these capabilities while minimizing the impact on the system and keeping the implementation complexity low?

2.3 Solutions

1) Split scheduling decisions and resource allocation to solve scheduling efficiency bottlenecks

From the above analysis, the actual scheduling time of jobs = the scheduling time of a single job × the number of jobs to be scheduled. Because the scheduling time of a single job is affected by specific business logic, it is highly uncertain and offers limited room for optimization. The serial scheduling problem, by contrast, is well defined, and it is a suitable optimization direction when the scheduling time and the number of jobs are both uncontrollable.

Regarding serial scheduling, a common practice in the industry is to split into multiple clusters along business-line boundaries to share the total scheduling pressure. The problem with this approach is that resource allocation is not flexible and uneven distribution of resources easily occurs; when overall resources are insufficient, the allocation of resources to high-priority jobs cannot be considered from a global perspective. In addition, multi-cluster management (adding new clusters or splitting existing ones) is itself a significant O&M burden.

Analyzing further, serial scheduling mainly avoids resource competition and obtains relatively optimal resources. In the pipeline scenario (where the number of jobs is greater than the number of resources and all jobs are short-lived), an optimal resource match is not a strong requirement. Moreover, resource concurrency is more controllable than the number of jobs: by having resources actively pull jobs, we can control the number and frequency of pulls according to actual execution speed, thereby effectively reducing resource competition.

In the end, we adopted a model that separates scheduling decisions from resource allocation in our design:

  • Scheduling decisions: Responsible for calculating the jobs that can be scheduled, submitting the decisions, and waiting for suitable resources to execute them. This module scales horizontally to share the pressure of scheduling decisions.
  • Resource allocation: Responsible for maintaining the relationship between jobs and resources. Through active pulling, resources can pull jobs from any instance, eliminating the original single-point restriction of serially allocating resources.

In this mode, job scheduling and resource allocation have the ability to scale horizontally, with higher performance and system availability. It is also beneficial for the logic of job scheduling to evolve independently, which is convenient for development, testing, and grayscale launch.

2) Introduce resource pool management mode to realize flexible allocation of resources

Considering that not all resources are managed by the engine, we introduce the concept of resource pools to shield the differences between resource management methods. Each resource pool represents a collection of resources, and different resource pools can be managed in different ways. In this way, resource allocation is simplified into a matching problem between jobs and resource pools: according to the actual job situation, we can set reasonable sizes for different resource pools and adjust them dynamically based on monitoring.

Specifically, we use labels to establish the matching relationship between jobs and resource pools, and satisfy the above requirements from the two dimensions of jobs and resources.

  • On the job side, jobs are split into different job queues based on their label attributes, and the concept of priority is introduced to ensure that jobs in each queue are pulled according to their priority, so that high-priority jobs do not sit in a backlog without being processed in time and block the business development process.
  • On the resource side, combined with the actual resource scenarios, three resource pool management methods are provided to solve the quota and utilization problems of different resource types.

    • Preset public resources: expanded in the resource pool in advance, mainly for time-sensitive component jobs that businesses use frequently. For quota and utilization, the sizes of different resource pools are adjusted dynamically based on historical data and real-time monitoring of the pools.
    • On-demand resources: mainly for cases where the public resource environment does not meet the requirements and the business needs a customized environment. Since the volume of such jobs is small, real-time scaling is used directly, which yields better resource utilization than the preset approach.
    • Resources of external platforms: the platforms that manage these resources know them better than we do; such a platform controls its job throughput by controlling the frequency and number of jobs it pulls from the engine.

3) Introduce the hierarchical design of components to meet the differentiated requirements of tools

To preserve freedom of tool access, the engine provides only the most basic job-level operation interfaces (pulling jobs, querying job status, and reporting job results). Different tools can implement customized components against this job interface.

Component development mainly involves two parts: ① implementing the business logic and ② determining the delivery form; the system interaction with the engine is relatively standard. We therefore layer the design according to the component execution process, splitting it into three layers: business logic, system interaction, and execution resources. While shielding the engine from tool implementation details, this better satisfies diverse access scenarios.

  • The system interaction layer, which is transparent to component developers, formulates a unified process interaction standard according to the interface provided by the engine to shield the implementation differences of different components from the engine.
  • The execution resource layer mainly solves the difference in the operation mode of the tool, and satisfies the different integration methods of the tool and the engine by supporting a variety of component delivery forms (such as mirroring, plug-in installation, and independent services).
  • The business logic layer adopts a variety of adapter options for different business development scenarios to meet different business development demands.

3. Overall Architecture

Figure 2: Pipeline architecture

  • Trigger: The entry point of a pipeline run; it manages the various trigger sources and trigger rules (Pull Request, Git Push, API trigger, scheduled trigger, etc.).
  • Task Center: Manages the running instances in the pipeline build process, and provides operations such as running, aborting, and retrying the pipeline and reporting component job results.
  • Decision maker: Makes decisions on all jobs waiting to be scheduled and synchronizes the decision results to the task center, which then changes the job status.
  • Worker: Responsible for pulling executable jobs from the task center and assigning specific execution resources to them.
  • Component SDK: The shell that executes a component's business logic; it is responsible for actually invoking the component and completing the system interaction of component initialization and state synchronization.

4. Core Design Points

4.1 Job Scheduling Design

1) Scheduling process

Below, we use a simple pipeline scheduling example (source checkout - [parallel: code scan, build] - deployment) to introduce the collaborative process of each module in the scheduling design.

Figure 3: Scheduling process

The general logic is as follows:

  1. When a pipeline build is triggered, the system creates, in the task center, all component jobs to be executed according to the orchestration, and notifies the decision maker of job status changes via events so that it can make decisions.
  2. The decision maker receives the decision event, calculates the jobs that can be scheduled according to the decision algorithm, and submits job status change requests to the task center.
  3. The task center receives the decision request, completes the job status change (the job status changes to pending), and adds the job to the corresponding waiting queue.
  4. The Worker pulls jobs matching itself from the waiting queue through long polling, starts executing them, and reports the results to the task center after execution completes (a minimal sketch of this loop follows this list).
  5. The task center changes the job status according to the execution result reported by the Worker and initiates the next round of decision-making with the decision maker.
  6. This repeats until all jobs under the pipeline have been executed or a job fails, and a final decision ends this pipeline execution.
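
The engine's internal interfaces are not published, but the worker's pull-execute-report loop above can be sketched roughly as follows (all class and method names here, such as TaskCenterClient and pullJobs, are hypothetical):

```java
import java.util.List;

// Minimal sketch of the worker's pull-execute-report loop described above.
// TaskCenterClient, Job, and JobResult are hypothetical names; the engine's
// real interfaces are not shown in the article.
public class Worker implements Runnable {

    interface TaskCenterClient {
        // Long poll: blocks up to timeoutMs waiting for jobs matching the labels.
        List<Job> pullJobs(List<String> labels, int maxJobs, long timeoutMs);
        void reportResult(String jobId, JobResult result);
    }

    record Job(String id, String component) {}
    enum JobResult { SUCCESS, FAILED }

    private final TaskCenterClient taskCenter;
    private final List<String> labels;

    Worker(TaskCenterClient taskCenter, List<String> labels) {
        this.taskCenter = taskCenter;
        this.labels = labels;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            // Pull only as many jobs as this worker can execute right now,
            // so the pull rate follows the actual execution speed.
            List<Job> jobs = taskCenter.pullJobs(labels, 1, 30_000L);
            for (Job job : jobs) {
                JobResult result = execute(job);
                taskCenter.reportResult(job.id(), result);
            }
        }
    }

    private JobResult execute(Job job) {
        // Placeholder for invoking the component job on the assigned resource.
        return JobResult.SUCCESS;
    }
}
```

Because each worker only pulls as many jobs as it can run, the pull frequency naturally follows execution speed, which is the property the design relies on to reduce resource competition.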

During the whole process, the task center, as a distributed storage service, maintains the status information of pipelines and jobs in a unified manner, and interacts with other modules in the form of API. The decision maker and Worker execute corresponding logic by monitoring changes in job status.

2) Job status flow

The following is the complete state machine of a job. Through a series of events—decision, pull, ACK, and result reporting—a job flows from its initial state to a terminal state.

After receiving a certain state transition event (Event), the state machine transfers the current state to the next state (Transition) and executes the corresponding transition action (Action).

Figure 4: State machine
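
As a rough illustration of the Event/Transition mapping described above, here is a minimal sketch; the state and event names follow this section's description (decision, pull, ACK, result reporting), and the real engine's state machine may define more states and transition actions:

```java
import java.util.Map;
import java.util.Optional;

// Minimal sketch of the job state machine: (current state, event) -> next state.
// State and event names are taken from the surrounding text; the real engine
// may use different names and attach transition actions.
public class JobStateMachine {

    enum State { UNSTARTED, PENDING, SCHEDULED, RUNNING, SUCCESS, FAILED }
    enum Event { DECIDE, PULL, ACK, ACK_TIMEOUT, REPORT_SUCCESS, REPORT_FAILURE }

    private static final Map<State, Map<Event, State>> TRANSITIONS = Map.of(
        State.UNSTARTED, Map.of(Event.DECIDE, State.PENDING),
        State.PENDING,   Map.of(Event.PULL, State.SCHEDULED),
        State.SCHEDULED, Map.of(Event.ACK, State.RUNNING,
                                Event.ACK_TIMEOUT, State.PENDING), // flow back, re-pull
        State.RUNNING,   Map.of(Event.REPORT_SUCCESS, State.SUCCESS,
                                Event.REPORT_FAILURE, State.FAILED)
    );

    // Returns the next state, or empty if the event is illegal in the current state.
    public Optional<State> fire(State current, Event event) {
        return Optional.ofNullable(TRANSITIONS.getOrDefault(current, Map.of()).get(event));
    }
}
```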

In real scenarios, the scheduling process involves a long chain of services whose stability cannot be fully guaranteed, so abnormal conditions can easily leave a job's state stuck. To address this, the design uses the database to guarantee the correctness of state changes and establishes a compensation mechanism for jobs in non-terminal states, ensuring that a job can resume the correct flow after any link in the chain fails.

We focus on two key processes—job decision-making and job pulling—to examine the problems that may arise during state flow and how the design solves them.

Job decision-making process: The task center receives the decision to schedule a job, changes the schedulable job from the unstarted state to the pending state, and adds the job to the waiting queue to wait to be pulled.

Figure 5: State machine - decision

Decision event not received: Due to a problem with the decision maker service itself or the network, the decision event request fails and the job remains in the unscheduled state for a long time.

  • Solution: Introduce periodic monitoring that re-runs the decision for pipelines whose jobs have not progressed and are still in an unfinished state, so that a short-lived failure of the decision service does not cause decisions to be lost.

Repeated decision-making: Due to network delays and message retries, multiple decision makers may decide on the same job at the same time, causing concurrency problems in the job's state transition.

  • Solution: Introduce the pending state to indicate that the job has already been decided, and change the state through the database's optimistic locking mechanism to ensure that only one decision actually takes effect.
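
A minimal sketch of this kind of optimistic-locking state change, assuming a relational table for jobs (the table and column names are illustrative, not the engine's actual schema):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Minimal sketch of an optimistic-locking state change: the UPDATE only
// succeeds if the job is still in the expected previous state, so when two
// decision makers race on the same job, exactly one of them wins.
public class JobStateDao {

    public boolean transition(Connection conn, long jobId,
                              String fromState, String toState) throws SQLException {
        String sql = "UPDATE pipeline_job SET state = ? WHERE id = ? AND state = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, toState);
            ps.setLong(2, jobId);
            ps.setString(3, fromState);
            return ps.executeUpdate() == 1; // false: another decision already took effect
        }
    }
}
```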

Abnormal state change process: Because the state change and the enqueue operation use heterogeneous storage, data inconsistency may occur between them, leaving jobs that cannot be scheduled normally.

  • Solution: Adopt an eventual consistency scheme that tolerates short scheduling delays. The operation order is to change the database first and then add the job to the queue, and a compensation mechanism periodically monitors the job at the head of the queue: if there are pending jobs older than the job at the head of the queue, they are re-enqueued.
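
A minimal sketch of such a compensation task, assuming hypothetical JobStore and WaitingQueue interfaces; the monitoring interval and the real queue implementation are not specified in the article:

```java
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the compensation described above: periodically compare the
// head of the waiting queue with pending jobs in the database; any pending job
// decided earlier than the queue head was lost between "update DB" and
// "enqueue" and is re-enqueued. Store/queue interfaces are placeholders.
public class PendingJobCompensator {

    interface JobStore { List<String> pendingJobsOlderThan(Instant time); }
    interface WaitingQueue { Instant headEnqueueTime(); void enqueue(String jobId); }

    private final JobStore store;
    private final WaitingQueue queue;

    PendingJobCompensator(JobStore store, WaitingQueue queue) {
        this.store = store;
        this.queue = queue;
    }

    public void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::compensate, 30, 30, TimeUnit.SECONDS);
    }

    void compensate() {
        Instant head = queue.headEnqueueTime();
        if (head == null) return;
        // Pending jobs decided earlier than the queue head should already be queued.
        for (String jobId : store.pendingJobsOlderThan(head)) {
            queue.enqueue(jobId); // re-queue the lost job
        }
    }
}
```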

Job pulling process: On a worker's pull request, the task center obtains a job to be scheduled from the waiting queue, changes the job's status from pending to scheduled, and returns it to the worker.

Figure 6: State machine - ACK

Job loss problem: There are two situations: ① the job is removed from the queue but an exception occurs just as its status is about to change; ② the job is removed from the queue and the status changes correctly, but because the poll request connection times out, the job is not returned to the worker.

  • Solution: The former is covered by the compensation mechanism for pending jobs described in the job decision-making process, which re-enqueues the job; for the latter, an ACK mechanism is added after the state change: if a scheduled job is not acknowledged within the timeout, its state flows back to pending and it waits to be pulled again.

The job is pulled by multiple workers: After a worker receives a job, a long GC pause may cause the job's state to flow back to pending; after the worker recovers, the job may already have been assigned to another worker.

  • Solution: Use the database's optimistic locking mechanism to ensure that only one worker's update succeeds, and record the relationship between the job and the worker, which makes it easy to abort the job or recover after the worker fails.

3) Decision-making process

The decision-making process filters the jobs that can be scheduled out of all unstarted jobs, submits them to the task center in a certain order, and waits for them to be pulled by resources. The whole screening process can be divided into three parts: serial-parallel ordering, conditional filtering, and priority setting.

Figure 7: Decision process

  • Serial-parallel ordering: Compared with complex pathfinding scenarios on a DAG, the pipeline scenario is relatively clear: code is progressively processed and verified through a series of stages such as development, testing, integration, and release. The stages are strictly serial, while for execution efficiency there can be serial and parallel execution within a stage. Through model design, the DAG scheduling problem is turned into a job ordering problem: the concept of run order is introduced and a specific execution order value is set for each component job, so the next batch to schedule can be quickly screened as the smallest order value greater than that of the currently executed jobs; jobs that should run in parallel simply share the same order value (a minimal sketch follows Figure 8).

Figure 8: Serial-parallel decision
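
A minimal sketch of the run-order idea, with assumed class and field names:

```java
import java.util.List;

// Minimal sketch of the run-order decision described above: each component job
// carries an order value; jobs that should run in parallel share the same value,
// and the next batch to schedule is the smallest order value greater than that
// of the jobs already executed.
public class RunOrderDecision {

    record ComponentJob(String name, int order, boolean finished) {}

    public List<ComponentJob> nextBatch(List<ComponentJob> jobs) {
        // Largest order value among finished jobs (stages are strictly serial).
        int maxExecuted = jobs.stream()
                .filter(ComponentJob::finished)
                .mapToInt(ComponentJob::order)
                .max().orElse(Integer.MIN_VALUE);

        // Smallest order value among unfinished jobs that comes after it.
        int next = jobs.stream()
                .filter(j -> !j.finished() && j.order() > maxExecuted)
                .mapToInt(ComponentJob::order)
                .min().orElse(Integer.MIN_VALUE);

        if (next == Integer.MIN_VALUE) {
            return List.of(); // nothing left to schedule
        }
        // All jobs sharing that order value are scheduled together (parallel).
        return jobs.stream()
                .filter(j -> !j.finished() && j.order() == next)
                .toList();
    }
}
```

For the example pipeline in Figure 3, source checkout could carry order 1, code scan and build could both carry order 2, and deployment order 3.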

  • Conditional filtering: As business scenarios expand, not all jobs need real resources to be scheduled. For example, a time-consuming component can directly reuse its last execution result when the code and component parameters are unchanged, or a component can be skipped as a system-level downgrade when a certain tool is abnormal. For such cases, a layer of condition checks is added before a job is actually submitted to the task center (conditions are divided into globally configured system conditions and user conditions); these conditions are matched and filtered one by one in the form of a chain of responsibility, and only jobs that pass are submitted to the task center for decision.
  • Priority setting: Considering the system as a whole, when there is a backlog of jobs the business cares more about whether an entire pipeline in a core scenario can finish as soon as possible than about the queuing of a single job. Therefore, in addition to the relatively fair strategy based on timestamps, a weight for the pipeline type is introduced (for example, release pipeline > self-test pipeline; manual trigger > scheduled execution) to ensure that jobs of core-scenario pipelines are scheduled as soon as possible (see the sketch below).
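
The article does not give the concrete weights or the combination formula, but one simple way to combine the two factors (pipeline-type weight first, submission timestamp second) could look like this illustrative sketch:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative only: higher pipeline-type weight is scheduled first; among
// equal weights, the earlier submission wins. The actual weights and formula
// used by the engine are not disclosed in the article.
public class PriorityExample {

    record PendingJob(String id, int typeWeight, long submitTimeMillis) {}

    public static void main(String[] args) {
        Comparator<PendingJob> byPriority =
                Comparator.comparingInt(PendingJob::typeWeight).reversed()
                          .thenComparingLong(PendingJob::submitTimeMillis);

        PriorityQueue<PendingJob> queue = new PriorityQueue<>(byPriority);
        queue.add(new PendingJob("self-test-1", 10, 1_000L)); // self-test pipeline
        queue.add(new PendingJob("release-1",   50, 2_000L)); // release pipeline, later but heavier
        queue.add(new PendingJob("release-2",   50, 1_500L));

        System.out.println(queue.poll().id()); // release-2
        System.out.println(queue.poll().id()); // release-1
        System.out.println(queue.poll().id()); // self-test-1
    }
}
```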

4.2 Resource pool division design

1) Overall plan

We adopt a multi-queue design and use labels to establish the matching relationship between job queues and resource pools, ensuring an effective division of resources across queues. When a queue backlogs, a resource pool fails, or resources cannot be expanded, the impact is confined to the smallest possible scope, avoiding a global queue-up of all jobs.

Figure 9: Resource pool architecture

2) Model relationship

Figure 10: Resource pool model objects

The relationship between job queues and labels: Queues and labels use a one-to-one relationship to reduce the cost of business understanding and operation and maintenance.

  • When a queue backlogs, the label that lacks resources can be located quickly.
  • When a label's resources are insufficient, the affected queues can also be determined quickly.

The relationship between labels and resource pools: Labels and resource pools use a many-to-many relationship, mainly to balance overall resource utilization with guaranteed resource availability for core queues.

  • For queues with a small volume of jobs, a dedicated resource pool would sit idle most of the time with low utilization. By attaching multiple labels to a resource pool, we both guarantee the queue a certain resource quota and allow the pool to process jobs with other labels, improving resource utilization.
  • For queues of core scenarios, a label's resources are usually spread across multiple resource pools to ensure a certain redundancy and reduce the impact of a single resource pool failing entirely.

3) Label Design

The purpose of labels is to establish the matching relationship between resources (pools) and jobs (queues). To make labels easy to manage and maintain, we use two-dimensional labels: the two dimensions of component and pipeline jointly determine a job's label and its corresponding resources.

  • The first dimension: the component dimension, which divides resources at a coarse granularity. Based on a component's business coverage, job volume, and special machine or environment requirements (such as SSD or a Dev environment), components that need independent resources are marked and divided into different public resource pools (each public resource pool executes one or more types of component jobs), which are allocated uniformly at the engine level to ensure that all jobs can run normally.
  • The second dimension: the pipeline dimension, which divides resources by business scenario. The division is made on demand, according to the business's need for resource isolation and its sensitivity to job backlogs. Businesses that want completely independent resources are carved out of all public resource pools; those that only need guarantees for some core scenarios are selectively carved out of some public resource pools according to the components involved in their pipeline, striking a balance between business isolation and resource utilization.
Note: Each dimension has a default value, other, as a fallback for scenarios with no resource division requirements.

Figure 11: Label design
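
A minimal sketch of resolving a job's label from the two dimensions, with other as the fallback for each dimension; the component and pipeline sets below are placeholders, not real configuration:

```java
import java.util.Set;

// Minimal sketch of computing a job's label from the component dimension and
// the pipeline dimension described above, with "other" as the fallback value
// for each dimension. The real mapping rules live in the engine's configuration.
public class LabelResolver {

    // Components that have been granted independent public resource pools.
    private static final Set<String> INDEPENDENT_COMPONENTS = Set.of("build", "code-scan");
    // Pipelines (business scenarios) that have requested resource isolation.
    private static final Set<String> ISOLATED_PIPELINES = Set.of("core-release");

    public String resolve(String component, String pipeline) {
        String componentDim = INDEPENDENT_COMPONENTS.contains(component) ? component : "other";
        String pipelineDim  = ISOLATED_PIPELINES.contains(pipeline) ? pipeline : "other";
        // The two dimensions jointly determine the label, e.g. "build/other".
        return componentDim + "/" + pipelineDim;
    }
}
```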

4) Queue split design

Jobs are split into multiple queues according to the labels they belong to, ensuring the independence of each queue and reducing the impact of job backlogs. The splitting process can be divided into two parts, enqueue and dequeue:

  • Enqueue process: The label of a job is determined by computing its attribute values in the component and pipeline dimensions. Given the one-to-one relationship between labels and queues in the model, a queue is created on demand for each label to store that label's jobs, and jobs are mutually exclusive across queues, which simplifies the dequeue implementation.
  • Dequeue process: After the queues are split, because labels and resource pools have a many-to-many relationship, a pull request from a resource pool often involves multiple queues. For pulling efficiency, the queues are dequeued in turn by polling, until the number of jobs requested this time is reached or all candidate queues are empty. This avoids locking multiple queues at the same time, and the labels are randomly ordered at the front of the chain, reducing the probability that concurrent requests compete for the same queue (a minimal sketch follows Figure 12).

Figure 12: Queue pulling design
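
A minimal sketch of the dequeue polling described above, with assumed data structures (the real implementation presumably operates on a distributed queue rather than in-memory deques):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Minimal sketch of the dequeue process above: a pull request from a resource
// pool may map to several tag queues; queues are polled in turn until the
// requested number of jobs is reached or all candidate queues are empty.
// Shuffling the tag order up front reduces contention between concurrent
// requests that would otherwise hit the same queue first.
public class QueuePuller {

    private final Map<String, Deque<String>> queuesByTag; // tag -> queue of job ids

    QueuePuller(Map<String, Deque<String>> queuesByTag) {
        this.queuesByTag = queuesByTag;
    }

    public List<String> pull(List<String> candidateTags, int maxJobs) {
        List<String> tags = new ArrayList<>(candidateTags);
        Collections.shuffle(tags); // randomize starting point across requests

        List<String> pulled = new ArrayList<>();
        boolean anyNonEmpty = true;
        while (pulled.size() < maxJobs && anyNonEmpty) {
            anyNonEmpty = false;
            for (String tag : tags) {                 // round-robin over queues
                Deque<String> queue = queuesByTag.get(tag);
                if (queue == null || queue.isEmpty()) continue;
                pulled.add(queue.poll());             // operate on one queue at a time
                anyNonEmpty = true;
                if (pulled.size() >= maxJobs) break;
            }
        }
        return pulled;
    }
}
```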

4.3 Component Hierarchical Design

1) Layered Architecture

Figure 13: Component architecture design

  • Business layer: An adaptation layer is introduced to satisfy the diverse demands of component development while preventing upper-layer concerns from polluting the lower layers.
  • System interaction layer: Establishes a unified process standard to keep the engine-component interaction consistent, which also makes it easy to apply non-functional system optimizations uniformly.
  • Execution resource layer: Provides a variety of resource strategies to shield the upper layers from the differences between resource types.

2) Standard interaction process design

In the system interaction layer, two aspects of the interaction between components and the engine are fixed. ① The state machine flow of component jobs, which covers the entire life cycle of component execution: if different state flow relationships were allowed, the whole management process would become chaotic. ② The scope of the interface the engine exposes externally: from the perspective of decoupling between services, the externally provided interfaces operate only on the component job dimension and should not be coupled to any internal implementation detail of a component.

Combining the job state machine with the interfaces provided by the engine determines the basic system interaction process of component execution. Using the template method pattern, the necessary methods such as init(), run(), queryResult(), and uploadArtifacts() are abstracted for the business to implement, while the overall interaction flow is handled uniformly by the system, so the business does not need to care about it.

Figure 14: Component standard process design
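
A minimal sketch of this template-method layering; the four abstract methods are the ones named above, while the engine-facing reporting calls shown are placeholders:

```java
// Minimal sketch of the template-method design in the system interaction layer:
// the SDK fixes the interaction flow with the engine and only asks the
// component developer to implement the methods named in the text. The
// engine-facing reporting calls here are placeholders.
public abstract class ComponentJobTemplate {

    // Necessary methods abstracted for the business to implement.
    protected abstract void init();             // prepare workspace, parameters
    protected abstract void run();              // execute the tool's business logic
    protected abstract Object queryResult();    // collect the execution result
    protected abstract void uploadArtifacts();  // upload produced artifacts

    // The interaction flow itself is handled uniformly by the SDK and is not
    // overridable by component developers.
    public final void execute() {
        reportStatus("RUNNING");
        try {
            init();
            run();
            Object result = queryResult();
            uploadArtifacts();
            reportResult(result);
        } catch (Exception e) {
            reportFailure(e);
        }
    }

    private void reportStatus(String status) { /* call the engine's job API */ }
    private void reportResult(Object result) { /* call the engine's job API */ }
    private void reportFailure(Exception e)  { /* call the engine's job API */ }
}
```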

3) Expand basic capabilities

Beyond the normal execution process, as business scenarios grow richer, component execution also involves operations such as suspension and callback (manual approval scenarios). Introducing these operations inevitably changes the original interaction flow. To avoid adding extra interaction complexity, an event type is added to the job in the pull link (events such as run, abort, and callback), and the Worker executes the corresponding extension logic according to the event it pulls. Introducing new extensions therefore does not affect the existing interaction flow.

Figure 15: Component extension capability design
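
A minimal sketch of dispatching on the pulled event type; the event names follow the text, everything else is illustrative:

```java
// Minimal sketch of the Worker dispatching on the job event type it pulls
// (run, abort, callback); adding a new event type only adds a new branch and
// handler, without changing the pull-based interaction flow.
public class JobEventDispatcher {

    enum EventType { RUN, ABORT, CALLBACK }

    record JobEvent(String jobId, EventType type) {}

    public void dispatch(JobEvent event) {
        switch (event.type()) {
            case RUN      -> startJob(event.jobId());   // normal execution
            case ABORT    -> abortJob(event.jobId());   // stop a running job
            case CALLBACK -> resumeJob(event.jobId());  // e.g. manual approval passed
        }
    }

    private void startJob(String jobId)  { /* allocate resource and run the component */ }
    private void abortJob(String jobId)  { /* signal the running component to stop */ }
    private void resumeJob(String jobId) { /* continue a suspended job */ }
}
```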

On top of these extensions, some general capabilities can be sunk into a daemon-thread layer. For example, in the result query process, offloading the query to a daemon thread removes the original constraint of synchronous waiting, so resources can be released early in scenarios that only need asynchronous processing (for example, when the component job's logic has finished and it is only waiting for an external platform interface to return a result), improving the utilization of execution resources. And when an execution resource fails and restarts, the result query thread automatically resumes the pending asynchronous jobs. This capability is transparent to the business layer and does not change the overall interaction flow.

4) Introduce adapter

Although businesses can build custom components by implementing the necessary methods, these methods are quite low-level, and in some specific scenarios the implementation cost is high. For example, for script-style Shell invocation, the business should only need to provide an executable shell script, with the other necessary methods implemented by the system.

For such business-specific handling, the adapter pattern is used: different Commands (ShellCommand, xxCommand) are introduced to provide default implementations of the necessary methods for specific scenarios, reducing business development cost. At the same time, the consistency of the system-side process is preserved, and coupling of business-specific handling is avoided by injecting the Command dynamically.

Figure 16: Component adapter design
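
A minimal sketch of the Command adapter idea; ShellCommand is named in the text, while the interface shape and wiring shown here are assumptions:

```java
import java.io.IOException;

// Minimal sketch of the adapter idea: a Command provides default
// implementations of the necessary methods for a specific scenario, so a
// shell-based component only needs to supply the script itself.
public class AdapterSketch {

    interface Command {
        void init() throws Exception;
        void run() throws Exception;
        Object queryResult();
    }

    // Default adapter for script-style components: the business provides only
    // an executable shell script, the adapter fills in the necessary methods.
    static class ShellCommand implements Command {
        private final String scriptPath;
        private int exitCode = -1;

        ShellCommand(String scriptPath) { this.scriptPath = scriptPath; }

        @Override public void init() { /* e.g. prepare env vars, chmod the script */ }

        @Override public void run() throws IOException, InterruptedException {
            Process p = new ProcessBuilder("sh", scriptPath).inheritIO().start();
            exitCode = p.waitFor();
        }

        @Override public Object queryResult() { return exitCode == 0 ? "SUCCESS" : "FAILED"; }
    }

    // The SDK-side flow stays the same; the concrete Command is injected
    // dynamically according to the component's delivery form.
    static void executeWith(Command command) throws Exception {
        command.init();
        command.run();
        System.out.println(command.queryResult());
    }

    public static void main(String[] args) throws Exception {
        executeWith(new ShellCommand("./build.sh"));
    }
}
```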

5) Effects

At present, multiple access methods are supported, including shell components, service components, and container components. Hundreds of components have been provided on the platform, with component developers spanning dozens of business lines. The component library covers the source code, build, test, deployment, manual approval, and other domains, connecting all the basic tools involved in the R&D process.

Figure 17: Component library

5. Follow-up planning

  • With the help of cloud-native technologies such as Serverless, explore lighter and more efficient resource management solutions, provide more refined resource strategies, and give businesses better resource hosting capabilities in three areas: resource elasticity, startup acceleration, and environment isolation.
  • For component developers, provide a one-stop development and management platform covering development, launch, and operation, reduce the cost of developing and operating components, enable more tool providers and individual developers to participate, jointly cover diverse business scenarios, and form a healthy component ecosystem.

6. About the Authors

Geng Jie, Chunhui, Zhiyuan, and others are from the R&D Platform team of the R&D Quality and Efficiency Department.

Job Offers

The R&D Quality and Efficiency Department of Meituan is responsible for building the platforms and tools in the company's R&D efficiency domain (including R&D requirement management tools, CI/CD pipelines, distributed code repositories, multi-language build tools, release platforms, test environment management platforms, full-link stress testing platforms, etc.), and is committed to continuously promoting excellent R&D concepts and engineering practices and building first-class engineering infrastructure. We are recruiting senior technical experts on an ongoing basis, based in Beijing and Shanghai. Interested candidates can send their resumes to gengjie02@meituan.com (mail subject: Meituan R&D Quality and Efficiency Department).


| This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please credit "Content reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please email tech@meituan.com to apply for authorization.

