In the era of deep learning, demand for and consumption of computing power keep growing. Reducing the cost of computing power and using it more efficiently has become an important topic. Intelligent computing power aims to allocate computing power in a fine-grained, personalized way so that resources are used optimally. This article shares the experience that Meituan takeaway advertising has accumulated while exploring and applying intelligent computing power, in the hope that it offers some help or inspiration.
1. Business background
At present, the daily order volume of Meituan's takeaway business has exceeded 40 million, making it one of Meituan's most important businesses. The takeaway advertising service has grown from a single business line to more than a dozen. The traffic carried by the service as a whole increases day by day, and the machine resources it consumes have reached a considerable scale.
In the takeaway scenario, traffic shows a clear double-peak pattern: lunch and dinner are the traffic peaks, and traffic is low the rest of the time. With this pattern, takeaway advertising services face heavy performance pressure during peak hours and carry a large amount of redundant computing power during off-peak hours. Globally, machine computing power is allocated inefficiently, and there is still considerable room to extract more value from traffic. On the one hand, the computing power spent on a request is not allocated dynamically according to its value, so high-value traffic receives too little computing power and its value is not fully realized, while a large amount of computing power is wasted on low-value traffic. On the other hand, system traffic is low during off-peak hours, so overall resource utilization is low, which limits the business benefit the system can obtain.
Computing power therefore needs to be allocated more reasonably so that it is used more efficiently. At present there is little industry research on dynamic computing power allocation; the representative work is DCAF [1] from Alibaba's targeted advertising platform. DCAF allocates computing power differentially based on traffic value, assigning different candidate queue lengths to traffic of different value so as to maximize revenue under limited resources. DCAF is an excellent solution, but it has certain limitations in the takeaway advertising scenario.
For the takeaway advertising scenario, the takeaway advertising technology team carried out a series of explorations and improvements on top of DCAF and, for the first time, tried combining elastic queue allocation with elastic model allocation, with good returns. On the one hand, with machine resources held flat, CPM increases by 2.3%; on the other hand, with business revenue held flat, machine resources are reduced by 40%. Finally, we fully rolled out the flat-machine-resources scheme in the fine-ranking stage of takeaway listing ads.
2. Overall thinking
In the takeaway advertising engine, to cope with the extreme pressure of online traffic and the huge candidate set, we designed the retrieval process as a funnel-shaped cascade in which the candidate set shrinks stage by stage. It mainly includes the recall, coarse ranking, fine ranking, and mechanism modules.
The overall idea of intelligent computing power is to allocate computing power differentially to traffic of different value under the constraint of system computing power, so as to improve the efficiency of computing power allocation in the advertising retrieval process and maximize revenue. Intelligent computing power mainly includes the following four elements:
1. Quantification of traffic value: Traffic value refers to the benefit that traffic brings to the platform, advertisers, and users. The system needs to be able to quantify the value of traffic.
2. Quantification of traffic computing power: Traffic computing power refers to the machine resources that traffic consumes in the system. In the takeaway advertising scenario, the computing power consumed by traffic is closely related to system variables such as the size of the candidate set, the number of recall channels, the model size, and the complexity of the link. The system likewise needs to be able to quantify traffic computing power.
3. Quantification of system computing power capacity: System computing power capacity refers to the total machine resources of the system and is measured in the same unit as traffic computing power. It can usually be obtained through stress testing and similar means. During computing power allocation, the overall traffic computing power consumption must not exceed the system computing power capacity.
4. Intelligent computing power allocation: Based on the three elements above, computing power is allocated intelligently across the entire link of the advertising engine. We call a method of allocating computing power an "elastic action". In the takeaway advertising scenario, we have summarized the following four actions:
- elastic queue: Online retrieval is a funnel process, and traffic of different value can be assigned different candidate queue lengths in each module of the cascade funnel.
- elastic model: In the model estimation service, traffic of different value can be assigned different models. A larger model predicts better than a smaller one but consumes more computing power.
- elastic channel: In multi-channel recall, traffic of different value can be assigned different recall channels.
- elastic link: On the retrieval link, traffic of different value can be assigned retrieval links of different complexity.
The range of options for an elastic action is defined as its "elastic gears". For example, queue lengths of 100 and 200 correspond to two different gears of the elastic queue. Under intelligent computing power, allocating computing power is the process of intelligently deciding on elastic actions and elastic gears.
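As a rough illustration of these definitions, the sketch below represents elastic actions and their gears as plain data and enumerates the joint decision space of a multi-action combination. The names `ElasticAction`, `Gear`, `GEARS`, and the model names are hypothetical and not taken from the framework described here; only the queue lengths 100 and 200 come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ElasticAction(Enum):
    QUEUE = "elastic_queue"
    MODEL = "elastic_model"
    CHANNEL = "elastic_channel"
    LINK = "elastic_link"


@dataclass(frozen=True)
class Gear:
    action: ElasticAction
    setting: object  # e.g. a queue length or a model name


# Hypothetical gear table: the two queue lengths from the text plus two
# made-up model gears (assumption, for illustration only).
GEARS = {
    ElasticAction.QUEUE: [Gear(ElasticAction.QUEUE, 100), Gear(ElasticAction.QUEUE, 200)],
    ElasticAction.MODEL: [Gear(ElasticAction.MODEL, "small_model"), Gear(ElasticAction.MODEL, "large_model")],
}


def gear_combinations(gears):
    """Enumerate every combination that picks one gear per configured action."""
    return list(product(*gears.values()))


combos = gear_combinations(GEARS)
```

With two actions of two gears each, the joint decision space contains four combinations; the decision problem is to pick one of them per request.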
Challenge analysis
To put intelligent computing power into production in the takeaway advertising scenario, we mainly face the following challenges:
Problem solving
- Challenge point: The goal of intelligent computing power is to optimize the allocation of computing power, which requires us to solve the problem of "maximizing traffic revenue under the constraint of system computing power".
- Response idea: Referring to existing solutions, we break the problem into three sub-problems (traffic value estimation, traffic computing power estimation, and computing power allocation) and explore and improve each of them for the takeaway advertising scenario.
System stability guarantee
- Challenge point: Handing the system's computing power allocation over to the intelligent computing power framework means moving from equal allocation to intelligent allocation. We must ensure not only the stability of the framework itself but also the smooth operation of the entire system link.
- Response idea: In addition to conventional safeguards such as monitoring and alarms and circuit breaking with degradation, we implemented real-time control based on system status to keep the system stable.
Universality & Extensibility
- Challenge point: The framework must balance reuse of basic capabilities with extension of personalized capabilities, support the two major directions of takeaway advertising (recommendation and search), and be accessible to multiple business scenarios.
- Response idea: The core components provide reusable and extensible capabilities in the form of an SDK. Based on common value evaluation indicators, common computing power evaluation indicators, and the intelligent computing power framework, they support combined decisions over multiple elastic actions and efficient access for multiple business scenarios.
3. Scheme design
After in-depth co-design by the engineering and algorithm teams, we designed an intelligent computing power framework for multi-action combined decision-making. The framework consists of decision-making components, collection components, and control components. The decision-making component is the core of the framework. It is embedded in application services as an SDK and provides all stages of the advertising engine with a reusable, extensible capability for optimal gear decisions over multi-action combinations, together with a system stability guarantee capability. The collection and control components support the system stability guarantee. The following sections describe the two modules in detail: optimal gear decision and system stability guarantee.
3.1 Optimal gear decision
Building on the elastic queue solutions that exist in the industry, we carried out a series of explorations and improvements:
- By choosing a more general traffic computing power evaluation index and adding a traffic computing power estimation module, we keep the quantitative index general while improving its accuracy, which solves the problem that the computing power measure was neither general nor accurate.
- By constructing a special gear, we made a first attempt at combining the elastic queue with the elastic model, which solves the problem that part of the traffic cannot be modeled.
Based on these strategies, we realized optimal gear decisions over combinations of multiple elastic actions.
3.1.1 Problem modeling
Existing plan
DCAF [1] transforms this problem into its corresponding dual problem, derives a decision formula from it, and thereby realizes elastic queue allocation.
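For orientation, a constrained allocation problem of this kind can be sketched as follows. The notation is an assumption made here for illustration and is not copied from [1] or from this article: $i$ indexes requests, $j$ indexes gears, $x_{ij} \in \{0, 1\}$ indicates that request $i$ is assigned gear $j$, $Q_{ij}$ is the estimated value, $q_j$ is the computing power cost of gear $j$, and $C$ is the system computing power capacity.

$$
\begin{aligned}
\max_{x}\quad & \sum_{i}\sum_{j} x_{ij}\, Q_{ij} \\
\text{s.t.}\quad & \sum_{i}\sum_{j} x_{ij}\, q_{j} \le C, \\
& \sum_{j} x_{ij} \le 1 \quad \forall i, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
$$

Relaxing the capacity constraint with a Lagrange multiplier $\lambda \ge 0$ yields a per-request decision rule of the form

$$
j_i^{*} = \arg\max_{j}\,\bigl(Q_{ij} - \lambda\, q_{j}\bigr),
$$

so a larger $\lambda$ penalizes expensive gears more strongly and lowers total consumption. This is a simplified sketch; the multiplier for the per-request constraint is omitted.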
program improvement
The above modeling scheme has the following problems in the takeaway advertising scene:
As shown in the figure, since the computing power and value of the gear $j=j_0$ are known, there is no need to estimate the value and computing power of different models. The subsequent traffic value estimation and traffic computing power estimation only need to target the elastic queue.
3.1.2 Decision Framework
As shown in the figure, the optimal gear decision module is divided into an offline stage and an online stage and includes the following four sub-modules:
- traffic value estimation module (offline + online): Estimates the value of traffic in different gears.
- traffic computing power estimation module (offline + online): Estimates the computing power consumed by traffic in different gears.
- offline λ solving module (offline): Replays historical traffic and uses binary search to solve for the optimal λ.
- online decision module (online): For online traffic, calculates the optimal gear with the gear decision formula and assigns models and queue lengths according to the result.
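The online decision step can be sketched as below, assuming a DCAF-style rule that picks the gear maximizing estimated value minus λ times estimated computing power. The function name `choose_gear` and the sample numbers are hypothetical; the exact decision formula used in production is not given in this article.

```python
def choose_gear(values, costs, lam):
    """Return the index j that maximizes values[j] - lam * costs[j].

    values[j]: estimated value of the request in gear j (assumed input)
    costs[j]:  estimated computing power of the request in gear j (assumed input)
    lam:       the multiplier solved offline for the current time slice
    """
    if not values or len(values) != len(costs):
        raise ValueError("values and costs must be non-empty and equally long")
    scores = [v - lam * c for v, c in zip(values, costs)]
    # On ties, max() keeps the first index, i.e. the cheaper gear when gears
    # are ordered by increasing cost.
    return max(range(len(scores)), key=scores.__getitem__)


# Three hypothetical gears, ordered by increasing cost.
gear_values = [1.0, 1.6, 1.9]
gear_costs = [1.0, 2.0, 4.0]
```

With these numbers, a small λ selects the most expensive gear, and raising λ pushes the same request toward cheaper gears.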
3.1.3 Traffic value estimation
Traffic value estimation is the core of intelligent computing power decision-making and requires a certain level of accuracy. Running a model online would add latency to the retrieval link, so we adopt offline XGB model estimation plus an online lookup table, which keeps the estimate accurate while remaining lightweight.
value evaluation index selection: Generally speaking, the value of traffic refers to the revenue that the traffic brings to the advertising platform. In the takeaway advertising scenario, we care about merchant revenue as well as platform revenue, so our traffic value index is $k_1 \times \text{platform revenue} + k_2 \times \text{merchant revenue}$.
As shown in the figure, the traffic value estimation module includes two stages, offline and online.
offline phase:
- feature screening & bucketing: Features are screened and bucketed based on offline feature importance analysis and feature distributions.
- model training:
  - Main problem: Initially we used a purely statistical scheme. With many feature buckets, data sparsity is severe, and the larger the queue length, the sparser the data.
  - Solution: Replace the statistical scheme with an XGB model to improve generalization.
- bucket value storage: The value estimates of the different feature buckets are written to a lookup table as key-value (KV) pairs.
online phase:
- feature extraction & analysis: Extract and parse features, and generate the key according to the offline bucketing rules.
- original value estimation: Look up the value of the corresponding bucket by key, then compute the value of the request at its original queue length through linear interpolation.
- gear value estimation: As shown in the figure below, the value of each gear is derived from the original value by using the coarse-ranking scores to compute how much value decays at that gear.
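The online phase can be sketched as a table lookup followed by linear interpolation and a decay step. Everything concrete here is an assumption for illustration: the key format, the table contents, and in particular the decay rule, which is taken to be the share of total coarse-ranking score retained by the top candidates of a gear. The article does not specify the actual decay formula.

```python
from bisect import bisect_left

# Hypothetical KV table: bucket key -> sorted (queue_length, estimated_value) anchors.
VALUE_TABLE = {
    "lunch|tier_1": [(50, 0.8), (100, 1.2), (200, 1.5)],
}


def original_value(key, queue_len, table=VALUE_TABLE):
    """Linearly interpolate the bucket's value at the request's queue length."""
    points = table[key]
    lengths = [length for length, _ in points]
    if queue_len <= lengths[0]:
        return points[0][1]
    if queue_len >= lengths[-1]:
        return points[-1][1]
    hi = bisect_left(lengths, queue_len)
    (x0, y0), (x1, y1) = points[hi - 1], points[hi]
    return y0 + (y1 - y0) * (queue_len - x0) / (x1 - x0)


def gear_value(base_value, coarse_scores, gear_len):
    """Scale the original value by the score share kept by the top gear_len candidates.

    This decay rule is an assumption, not the article's formula.
    """
    ranked = sorted(coarse_scores, reverse=True)
    total = sum(ranked)
    if total <= 0:
        return 0.0
    return base_value * sum(ranked[:gear_len]) / total
```

For example, `original_value("lunch|tier_1", 150)` interpolates between the anchors at 100 and 200, and `gear_value` then shrinks that value for a shorter queue.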
3.1.4 Estimation of flow calculation power
Implementations of intelligent computing power in the industry are mainly based on elastic queues and generally use the queue length as the traffic computing power evaluation index. Using the queue length this way faces two problems:
- universality problem: For the elastic model, elastic channel, and elastic link, the computing power consumed by traffic is not uniquely determined by the queue length. For example, traffic from different sources may go through different models, and different models may consume different amounts of computing power.
- accuracy problem: In the takeaway advertising scenario, even for the elastic queue action, computing power consumption is not a simple linear function of queue length.
computing power evaluation index selection: To solve these problems, we use the CPU time consumed by the traffic as the traffic computing power evaluation index.
As shown in the figure, the flow calculation power estimation includes two stages: offline and online.
offline phase:
- feature screening & bucketing: Features are screened and bucketed based on offline feature importance analysis and feature distributions.
- model training:
  - Training process: First divide the samples into feature buckets (within a bucket the queue length varies while the other features are the same), then fit the relationship between computing power and queue length for each bucket.
  - Main problem: Because the data is unevenly distributed, once the queue length exceeds a certain threshold the data becomes sparse and the per-bucket computing power statistics start to fluctuate, which is harmful to online decisions.
  - Solution: To handle data sparsity and to reflect the batch splitting (unpacking) that occurs in the real business, we fit the relationship between computing power and queue length as a piecewise linear function (the figure below shows the fitting result for one feature bucket).
- computing power lookup table storage: The computing power estimates of the different feature buckets are written to a lookup table as key-value (KV) pairs.
online phase:
- feature extraction & analysis: Extract and parse features, and generate the key according to the offline bucketing rules.
- computing power estimation: Look up the computing power of the corresponding bucket by key.
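A minimal sketch of fitting computing power against queue length for one feature bucket is shown below. It assumes exactly two linear segments and chooses the split by brute-force least squares. The article does not say how many segments or which fitting method is used, and the sample data is invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, line):
    slope, intercept = line
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


def fit_two_segments(xs, ys):
    """Try every split of the sorted points and keep the one with least error.

    Returns (breakpoint, left_line, right_line); xs must be sorted ascending
    and contain at least four points so each side has two.
    """
    if len(xs) < 4:
        raise ValueError("need at least four points")
    best = None
    for k in range(2, len(xs) - 1):
        left, right = fit_line(xs[:k], ys[:k]), fit_line(xs[k:], ys[k:])
        err = squared_error(xs[:k], ys[:k], left) + squared_error(xs[k:], ys[k:], right)
        if best is None or err < best[0]:
            best = (err, xs[k], left, right)
    return best[1], best[2], best[3]


def estimate_cpu_ms(queue_len, model):
    """Evaluate the fitted piecewise linear function at queue_len."""
    breakpoint_x, left, right = model
    slope, intercept = left if queue_len < breakpoint_x else right
    return slope * queue_len + intercept


# Invented samples for one bucket: (queue length, CPU time in ms).
queue_lengths = [10, 20, 30, 40, 50, 60, 70, 80]
cpu_ms = [7.0, 12.0, 17.0, 22.0, 23.0, 24.0, 25.0, 26.0]
bucket_model = fit_two_segments(queue_lengths, cpu_ms)
```

In an offline job, the fitted `(breakpoint, left_line, right_line)` tuple per bucket is what would be written to the lookup table, so the online side only evaluates a line.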
3.1.5 Gear decision
1. Offline λ solving
Based on the value estimation and computing power estimation modules, we replay historical traffic and use binary search to solve for the optimal λ.
The core step of offline λ solving is traffic replay: we replay historical traffic from the same time period, reuse the online logic, and simulate the optimal gear of each request under the current λ.
main problems and solutions
- Problem description: The offline traffic volume is large. After the traffic is divided into multiple time slices, solving for λ requires many replays, and the simulation is expensive.
- Solution: Solve for the optimal λ quickly through traffic sampling, parallel solving across time slices, and similar techniques.
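The binary search over λ can be sketched as follows. It relies on the assumption that total consumed computing power does not increase as λ grows, and it reuses a DCAF-style per-request rule (value minus λ times cost). The replay data, the search bounds, and the function names are hypothetical.

```python
def total_cost(requests, lam):
    """Replay requests under lam and sum the computing power of the chosen gears.

    Each request is a (values, costs) pair of per-gear estimates.
    """
    total = 0.0
    for values, costs in requests:
        j = max(range(len(values)), key=lambda k: values[k] - lam * costs[k])
        total += costs[j]
    return total


def solve_lambda(requests, capacity, lo=0.0, hi=10.0, iters=50):
    """Find (approximately) the smallest lam whose replayed cost fits the capacity.

    Assumes total_cost(requests, hi) <= capacity, i.e. hi is large enough.
    """
    if total_cost(requests, lo) <= capacity:
        return lo  # capacity is not binding, no penalty needed
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_cost(requests, mid) > capacity:
            lo = mid  # still over capacity: penalize cost more
        else:
            hi = mid  # feasible: try a smaller penalty
    return hi


# Two invented requests with three gears each (costs in arbitrary CPU units).
replay = [
    ([1.0, 1.6, 1.9], [1.0, 2.0, 4.0]),
    ([0.5, 0.6, 0.65], [1.0, 2.0, 4.0]),
]
lam_star = solve_lambda(replay, capacity=5.0)
```

Each time slice would run this search independently on its own sampled traffic, which is what makes the parallel solving across time slices mentioned above straightforward.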
2. Online gear decision
3.2 System stability guarantee
Under intelligent computing power, the system moves from equal computing power allocation to dynamic computing power allocation. To keep system services stable, we provide conventional safeguards such as circuit breaking with degradation, and we implement PID real-time control based on system status.
- traffic access: Provides traffic access rules by advertising slot, city, time period, and so on, to control which traffic goes through intelligent computing power.
- monitoring and alarms: Provides monitoring and threshold alarms for changes in gear, performance, computing power, and other metrics at two levels: per PV and for the system as a whole.
- circuit breaking and degradation: Anomalies in intelligent computing power are monitored in real time. Once the configured anomaly threshold is reached, intelligent computing power is automatically circuit-broken and degraded, and computing power is allocated equally.
- asynchronous decision: To ensure that the latency of the main link does not increase, the intelligent computing power decision runs asynchronously, and computing power is allocated equally if the decision times out.
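The last two safeguards can be sketched together: the decision runs in a worker thread with a timeout, and a simple counter trips after consecutive failures so that later requests skip the decision and fall back to equal allocation. The class `Fuse`, the constant `DEFAULT_QUEUE_LEN`, and the thresholds are hypothetical, and this sketch never closes the fuse again, which a production breaker would need to do.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DEFAULT_QUEUE_LEN = 100  # hypothetical equal-allocation gear
_pool = ThreadPoolExecutor(max_workers=4)


class Fuse:
    """Opens after max_failures consecutive failed or timed-out decisions."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

    @property
    def is_open(self):
        return self.failures >= self.max_failures


def decide_with_fallback(decide, fuse, timeout_s):
    """Run decide() asynchronously; fall back to the default gear on any failure."""
    if fuse.is_open:
        return DEFAULT_QUEUE_LEN
    future = _pool.submit(decide)
    try:
        gear = future.result(timeout=timeout_s)
    except Exception:  # covers timeouts and errors raised inside decide()
        fuse.record(False)
        return DEFAULT_QUEUE_LEN
    fuse.record(True)
    return gear


def slow_decision():
    time.sleep(0.3)  # simulates a decision that misses its deadline
    return 200
```

The caller never waits longer than `timeout_s` for the decision, so the main link's latency budget is unaffected even when the decision path is slow.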
3.2.2 PID real-time control based on system status
PID (Proportional-Integral-Derivative) is a mainstream control algorithm that combines proportional, integral, and derivative control. We monitor changes in system status in real time and use the PID algorithm to regulate it, keeping the system status stable.
The system status can usually be measured by indicators such as CPU/GPU utilization, QPS, RT (Avg, TP99, TP999, etc.), and call failure rate (FailRate).
regulation target
The regulation target should be an indicator that quickly reflects changes in system status. Based on this principle, we selected TP999, FailRate, and CpuUtils as regulation targets.
regulation strategy
Based on the PID controller, multiple regulation strategies are supported:
- regulating the action gear: For the elastic queue, the coefficient or upper limit of the queue length can be adjusted to change the candidate queue length.
- regulating the decision formula: The λ coefficient of the decision formula can be adjusted to change the overall computing power consumption of the system.
regulation process
- system status reporting: The delivery engine service reports the current system status in real time through the monitoring system and synchronizes it to MQ. The core indicators are fresh to within 10s.
- collection component: Based on the Flink stream processing framework, it parses and aggregates system status data in real time and applies denoising and smoothing. The processed data is written to KV storage.
- control component: Based on the PID algorithm, it polls for system status changes, regulates system computing power in real time according to the selected regulation target, and feeds the regulation result back to the decision-making component of the delivery engine, forming a closed control loop.
Once intelligent computing power is connected to PID real-time control, the system can respond to high load with fast, stable, and effective feedback adjustment and keep system performance at the target level.
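The control loop can be sketched with a textbook discrete PID controller that adjusts a queue-length coefficient toward a TP999 target. The gains, the target, and the clamping range are invented for illustration, and the sketch leaves out the denoising, smoothing, and polling described above.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, target):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target = target
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt=1.0):
        """Return the control output for one new measurement."""
        error = self.target - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def adjust_queue_coefficient(coef, pid, tp999_ms, lo=0.2, hi=1.0):
    """Move the queue-length coefficient by the PID output, clamped to [lo, hi].

    A TP999 above the target produces a negative output and shortens queues.
    """
    return min(hi, max(lo, coef + pid.update(tp999_ms)))


# Hypothetical gains and a 100 ms TP999 target.
pid = PID(kp=0.002, ki=0.0005, kd=0.001, target=100.0)
coef_after_spike = adjust_queue_coefficient(1.0, pid, tp999_ms=150.0)
coef_after_recovery = adjust_queue_coefficient(coef_after_spike, pid, tp999_ms=80.0)
```

The same controller output could instead be applied to λ, which corresponds to the second regulation strategy; applying it to the queue coefficient is simply the easier case to illustrate.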
4. Experiment
4.1 Experimental setup
- system computing power capacity selection: To ensure that the online system can adjust quickly to real-time traffic, and given the traffic characteristics of takeaway, we choose 15 minutes as the smallest control unit. In practice, from the past few days we pick the peak-period time slice with the largest traffic and stable performance, and take the total CPU time consumed in that slice as the system computing power capacity C.
- baseline selection: Online traffic without intelligent computing power is used as the control group.
- traffic value estimation: Traffic from the most recent days is used as training data. The XGB model produces the estimates, and the results are written to the lookup table.
- traffic computing power estimation: Traffic from the most recent days is used as training data. The change of computing power with queue length is fitted as a piecewise linear function, and the final estimates are written to the lookup table.
- offline λ solving: By replaying yesterday's traffic, we compute offline the optimal λ for each time slice of the day (one slice is 15 minutes) and store the results as a lookup table.
- experimental idea: The system computing power capacity C used in offline simulation controls how much computing power is consumed online. By adjusting the C used for the offline solution, we can therefore run experiments that either hold machine resources flat or hold business revenue flat.
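The capacity selection and the experimental idea can be sketched as follows. The record layout (`requests`, `cpu_seconds`, `stable`) and all numbers are invented. The 60% scaling matches the setting of Experiment 2; the stability flag is assumed to be computed elsewhere.

```python
def capacity_from_slices(slices):
    """Pick the stable 15-minute slice with the most traffic; return its CPU time."""
    stable = [s for s in slices if s["stable"]]
    if not stable:
        raise ValueError("no stable time slice available")
    peak = max(stable, key=lambda s: s["requests"])
    return peak["cpu_seconds"]


# Invented per-slice statistics from the past few days.
history = [
    {"start": "11:45", "requests": 900_000, "cpu_seconds": 5200.0, "stable": True},
    {"start": "12:00", "requests": 1_100_000, "cpu_seconds": 6400.0, "stable": False},
    {"start": "18:30", "requests": 1_000_000, "cpu_seconds": 5800.0, "stable": True},
]

capacity_c = capacity_from_slices(history)  # used when holding machine resources flat
reduced_capacity = 0.6 * capacity_c         # used when holding revenue flat
```

Feeding `capacity_c` or `reduced_capacity` into the offline λ solution is what distinguishes the two experiments; the online decision logic is the same in both.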
4.2 Experiment 1: Flat machine resources and increase business revenue
Group | CPM | ROI | CTR | CVR | Machine resource
---|---|---|---|---|---
Baseline (system computing power capacity = C) | +0.00% | +0.00% | +0.00% | +0.00% | +0.00%
Intelligent computing power (system computing power capacity = C) | +2.36% | -1.40% | +0.94% | +0.09% | +0.46%
revenue source analysis:
- During peak traffic hours, differentiated computing power allocation increases business revenue while system stability is maintained.
- During off-peak traffic hours, system resource utilization improves, and idle machine resources are converted into business revenue.
4.3 Experiment 2: Flatten business income and reduce machine resources
Group | CPM | ROI | CTR | CVR | Machine resource
---|---|---|---|---|---
Baseline (system computing power capacity = C) | +0.00% | +0.00% | +0.00% | +0.00% | +0.00%
Intelligent computing power (system computing power capacity = 60% * C) | +0.70% | -1.22% | +0.15% | +0.85% | -40.8%
revenue source analysis:
- Machine resources are reduced by suppressing computing power consumption during the noon and evening peaks. As can be seen from the figure below, during peak hours the machine resource consumption of the experimental group is about 60% of that of the control group.
- At the same time, differentiated computing power allocation during peak hours and higher resource utilization during off-peak hours make up the overall business revenue.
5. Summary and Outlook
This article describes the thinking and optimization ideas behind building intelligent computing power for takeaway advertising from 0 to 1, from two angles: optimal gear decision and system stability guarantee.
In the future, on the algorithm side, we will try to model and solve the optimal allocation of computing power across the full link of the system based on evolutionary algorithms and reinforcement learning. On the engine architecture side, we will keep optimizing the system simulation capability, the online decision-making capability, and the stability guarantee capability, and we will try to integrate with the company's elastic scaling system to bring out more of the value of intelligent computing power.
6. References
[1] Jiang, B., Zhang, P., Chen, R., Luo, X., Yang, Y., Wang, G., ... & Gai, K. (2020). DCAF: A Dynamic Computation Allocation Framework for Online Serving System. arXiv preprint arXiv:2006.09684.
7. About the author
Shunhui, Jiahong, Song Wei, Guoliang, Qianlong, Le Bin, etc., all come from the Meituan takeaway advertising technology team.
8. Recruitment Information
The Meituan takeaway advertising technology team is continuously recruiting for a large number of positions. We are looking for advertising backend/algorithm development engineers and experts, based in Beijing. Interested candidates are welcome to join us. You can submit your resume to: maoshunhui@meituan.com (please use the email subject: Meituan Waimai Advertising Technology Team)
This article is produced by the Meituan technical team, and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication; please indicate "the content is reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial activity, please send an email to tech@meituan.com to apply for authorization.