QCon: Practice of oCPX multi-objective, multi-scenario joint modeling at OPPO

1 Background

Since Facebook commercialized oCPX in 2012, oCPX products and capabilities across the industry have become very mature. The commercialization algorithm team in the Algorithm Platform Department of OPPO's Digital Intelligence Engineering System has also accumulated practical experience in building oCPX capabilities.

2 What is oCPX

By one ad platform's definition, oCPX is an intelligent bidding method for performance advertising. Advertisers choose an explicit optimization goal (such as download, activation, registration or payment) and state an expected conversion cost. The system uses machine learning to estimate the conversion probability of each impression opportunity and bids automatically against the expected cost, so that cost stays stable.

The o stands for optimized. CPX denotes the billing method: CPC bills by click, CPD bills by download, and CPM bills by impression. oCPC is therefore an optimized CPC: it bids on conversion and bills by click. Likewise, oCPD bids on conversion and bills by download, and oCPM bids on conversion and bills by impression.

oCPX relies on a bid control system that adjusts ad bids intelligently and in real time to protect the advertiser's conversion cost target. The control goal is to keep the cost deviation close to 1.0. Concretely, a bid factor alpha is applied to the eCPM ranking formula, which in turn affects the amount billed, where:

Cost deviation = actual cost of advertising / expected cost of advertising
eCPM = pctr * pcvr * TCPA * alpha
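
The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the controller used in production: the function names and the proportional update rule in `update_alpha` (including its `step` size and clamp bounds) are assumptions made for this example.

```python
def cost_deviation(actual_cost: float, expected_cost: float) -> float:
    """Cost deviation = actual cost / expected cost (the target is about 1.0)."""
    return actual_cost / expected_cost


def ecpm(pctr: float, pcvr: float, tcpa: float, alpha: float = 1.0) -> float:
    """eCPM = pctr * pcvr * TCPA * alpha, as in the ranking formula above."""
    return pctr * pcvr * tcpa * alpha


def update_alpha(alpha: float, deviation: float, step: float = 0.5,
                 lo: float = 0.5, hi: float = 2.0) -> float:
    """Hypothetical proportional rule: lower alpha when cost runs high
    (deviation > 1), raise it when cost runs low. Clamped to [lo, hi]."""
    new_alpha = alpha * (1.0 + step * (1.0 - deviation))
    return max(lo, min(hi, new_alpha))


# An ad with an expected cost of 20 whose actual cost is currently 24.
dev = cost_deviation(actual_cost=24.0, expected_cost=20.0)
alpha = update_alpha(alpha=1.0, deviation=dev)
score = ecpm(pctr=0.05, pcvr=0.10, tcpa=20.0, alpha=alpha)
```

Because the cost is running 20% above target, the sketch lowers alpha below 1.0, which lowers the ad's eCPM and therefore its rank and billing.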

As the figure below shows, oCPC separates the bid point from the billing point. CPC bids on the click, whereas oCPC bids on a post-click event such as activation, registration or payment. What the two have in common is the billing point: both are billed on the click.

Figure 1: Bid and billing points

From a system perspective, CPC advertising requires advertisers to adjust bids and targeting frequently and to protect their own costs. For the platform, selling traffic without regard to its deep value is less profitable. With oCPC, the advertiser's job is simple: they only need to set a target cost. The platform can sell traffic according to its deep value and earn more, but it has to estimate pcvr and guarantee cost, which places higher demands on the platform.

Figure 2: Differences between CPC and oCPC

3 Introduction to oCPX ad monetization placements in the OPPO App Store

The OPPO App Store has many ad placements, and the main scenarios are recommendation and search. Recommendation scenarios mainly cover the homepage, related recommendations and hot searches, where hot searches include hot-search apps and hot-search games. Search scenarios mainly cover the search association page and the search results page. The association page is essentially query suggestion (sug): the user has entered a query but not yet tapped search, and the apps recommended below the input box make up the association page. Once the user taps search, the page shown is the results page.

Recommendation results contain three types of items: ads, jointly operated games and organic apps. Ads are placed by advertisers; Douyin, Kuaishou and Pinduoduo are all major advertisers. Jointly operated games are apps that game developers publish through the Game Center under a revenue-sharing model, for example Fantasy Westward Journey and Honor of Kings. Organic apps are regular apps such as WeChat, which never advertises and is not a game.

Figure 3: Ad placements in the store

4 Challenges and countermeasures of conversion rate estimation

Conversion rate estimation for oCPX ads faces several challenges. First, there are many conversion goals, and their data differ greatly: activation and payment volumes differ by two orders of magnitude, which affects how efficiently the model learns. Second, deep conversions arrive with very long delays. Only about 50% of the payments attributable to apps downloaded on a given day arrive that same day, and most of the rest arrive over the following seven days. The same sample is therefore a negative on the day of modeling and a positive the next day, and this contradiction hurts model accuracy. Third, the data are sparse, which makes the model hard to train well. Fourth, conversion data quality is hard to guarantee. Conversions are reported back by advertisers, who may over-report, under-report, report incorrectly or send postbacks in bursts, which easily puts heavy pressure on the system, so the data validation rules have to be iterated until they are thorough.

Take the four major modules of app distribution (homepage, hot search, related recommendations, download/update page) and five conversion types (OS activation, postback activation, postback registration, game registration, game payment): that gives 4*5=20 CVR estimation models. Modeling each conversion target independently has several drawbacks. First, there are many models, so maintenance cost is high and training and storage consume a lot of resources. Second, online memory usage becomes large, so team members have to queue to launch experiments, which lowers experiment efficiency. Third, samples for deep conversions are too sparse and too few, which causes the model to underfit. Fourth, separate models cannot exploit the information shared between conversion types; for example, the payment model cannot use the information carried by registrations.

Why is MMOE particularly well suited to modeling multiple conversion goals? Shallow and deep targets can share embeddings, so shallow targets contribute information to the sparse deep conversion types, while model size and computational cost do not grow by an order of magnitude and serving performance is not affected. In the MMOE structure, the EXPERTs learn what different types have in common, and the GATEs and TOWERs learn how they differ. In our experiments, the MMOE version also outperformed a model that only adds the oCPX type as a feature.

Figure 4: Multi-objective model structure
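
To make the EXPERT / GATE / TOWER roles concrete, here is a minimal forward-pass sketch in plain Python. It is not the production model: the class name `TinyMMOE`, the layer sizes, the random initialization and the three task names are all assumptions chosen for illustration, and the input vector stands in for the shared embedding.

```python
import math
import random


def linear(x, w):
    """Multiply input vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyMMOE:
    """Shared experts, plus one softmax gate and one tower per task."""

    def __init__(self, in_dim, expert_dim, n_experts, tasks, seed=0):
        rng = random.Random(seed)
        self.experts = [rand_matrix(rng, in_dim, expert_dim) for _ in range(n_experts)]
        self.gates = {t: rand_matrix(rng, in_dim, n_experts) for t in tasks}
        self.towers = {t: rand_matrix(rng, expert_dim, 1) for t in tasks}

    def forward(self, x):
        # Every expert sees the same shared input.
        expert_out = [relu(linear(x, w)) for w in self.experts]
        preds = {}
        for task, gate_w in self.gates.items():
            weights = softmax(linear(x, gate_w))  # one weight per expert
            mixed = [sum(g * e[k] for g, e in zip(weights, expert_out))
                     for k in range(len(expert_out[0]))]
            preds[task] = sigmoid(linear(mixed, self.towers[task])[0])
        return preds


model = TinyMMOE(in_dim=4, expert_dim=3, n_experts=2,
                 tasks=["activation", "registration", "payment"])
pcvr = model.forward([0.2, 1.0, 0.0, 0.5])
```

All tasks read the same expert outputs, which is where sparse targets such as payment borrow information from dense ones, while each task's gate and tower remain its own.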

Building on MMOE, we tried several variants. Conventional MMOE is a multi-task learning model in which each task has its own GATE that weights the EXPERTs and feeds the result to that task's tower. The CGC version gives every task its own private EXPERTs, which are not affected by other tasks, while the shared EXPERTs remain. ML-MMOE (multi-layer MMOE) feeds the GATE-weighted output of the previous layer's EXPERTs into the next layer's EXPERTs. PLE is the multi-layer version of the CGC network. Other optimizations include adding a wide part or gating to an individual tower. The AUC gains of the different methods are as follows:

Figure 5: AUC gains of different methods

In the end, the 20 models were reduced to 2, and online revenue increased by more than 1%.

Multi-objective modeling brings the problem of multi-objective tuning. We tried two schemes internally: one adjusts the loss weights across targets based on uncertainty, and the other adjusts the loss weights of different targets based on how fast each target is learning.
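
Both schemes can be sketched briefly. The first function uses a commonly used simplified form of uncertainty weighting, where each task has a learnable log-variance `s_i`. The second is one possible reading of weighting by learning speed, in the spirit of dynamic weight averaging; the article does not say which exact formulation was used, so the function names, the `temperature` parameter and the numbers are assumptions.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum(exp(-s_i) * L_i + s_i).

    s_i is a learnable log-variance for task i: a noisier task ends up with a
    larger s_i and therefore a smaller effective weight exp(-s_i).
    """
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def speed_based_weights(prev_losses, curr_losses, temperature=2.0):
    """Hypothetical learning-speed weighting: a task whose loss is falling
    slowly (ratio close to or above 1) receives a larger weight.
    The weights sum to the number of tasks."""
    ratios = [c / p for c, p in zip(curr_losses, prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]


# Three targets: activation, registration, payment (illustrative numbers).
total_loss = uncertainty_weighted_loss([0.30, 0.20, 0.05], [0.0, 0.0, 0.0])
weights = speed_based_weights([0.40, 0.25, 0.05], [0.30, 0.20, 0.05])
```

In the second call the payment loss has not moved between the two steps, so payment receives the largest weight of the three.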

Delayed conversion is a common problem in oCPX conversion rate modeling. As the figure below shows, suppose 1000 clicks on August 1 bring 100 conversions, but only 50 of them arrive on August 1. The clicks behind the 50 delayed conversions are treated as negative samples when the model is trained on August 2, while the 20 conversions that arrive on August 2 are learned as positive samples on August 3. Learning the same sample first as a negative and later as a positive hurts model accuracy. Common solutions include reweighting negative samples, modeling the conversion arrival rate with an exponential distribution, and using a DNN to estimate the arrival time. When adjusting day-level negative sample weights, note that the older a negative sample is relative to the current day, the more likely it is a true negative.

Figure 6: Weight adjustment example
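
One way to turn "older negatives are more likely true negatives" into a weight is to combine a prior CVR with an exponential delay model. The sketch below is an assumption-laden illustration rather than the weighting actually deployed: `true_negative_weight`, the prior CVR of 10% and the default `rate` (chosen so that half of the conversions arrive within one day) are all hypothetical.

```python
import math


def true_negative_weight(prior_cvr: float, days_elapsed: float,
                         rate: float = math.log(2)) -> float:
    """P(sample never converts | no conversion observed after days_elapsed).

    Assumes the conversion delay follows Exponential(rate). With the default
    rate, half of all conversions arrive within one day.
    """
    still_pending = prior_cvr * math.exp(-rate * days_elapsed)
    never_converts = 1.0 - prior_cvr
    return never_converts / (never_converts + still_pending)


# Clicks with a 10% prior CVR that still show no conversion.
for days in (0, 1, 3, 7):
    print(days, round(true_negative_weight(0.10, days), 4))
```

The weight grows toward 1.0 as the sample ages, so recent negatives, which may still convert, contribute less to the loss than old ones.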

In CVR estimation the predicted value inevitably deviates from the true value, so calibration is needed. The first kind is overall calibration. It applies when CVR is overestimated uniformly, say by a factor of 1.5, in which case the prediction is simply divided by that factor. Uniform bias of this kind is relatively rare. The more common case in practice is that negative samples were down-sampled during training, so pcvr has to be calibrated back as a whole.

The second kind is segmented calibration. In practice the bias is often not uniform: the model overestimates in the low range and underestimates in the high range. In the figure, the horizontal axis is the predicted CVR divided into segments (for example, 20 denotes the segment around 0.2) and the vertical axis is the statistical CVR. If a segment contains 100 samples, of which 20 are positive and 80 are negative, its statistical CVR is 0.2. Plotting each segment gives the red dots, which follow a clear pattern: where the predicted value is low the statistical value is low, and where the predicted value is high the statistical value is high. The red dots also show some glitches, however, and calibrating directly on them would affect the model's AUC. Instead, isotonic (order-preserving) regression is used to fit the green line, which is monotonically non-decreasing. Calibrating online with isotonic regression leaves the AUC unchanged, and this is a common solution in the industry.

The third kind is individual calibration, which mainly handles bad cases raised by operations staff or advertisers.
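
The two forms of overall calibration mentioned above can be written down directly. The sketch assumes that negatives were kept with a known fraction `keep_rate` during training; the function names and the example numbers are hypothetical.

```python
def undo_uniform_bias(pcvr: float, overestimate_ratio: float) -> float:
    """Overall calibration: divide out a uniform over-estimation factor."""
    return pcvr / overestimate_ratio


def undo_negative_sampling(pcvr: float, keep_rate: float) -> float:
    """Map a prediction trained on down-sampled negatives back to the
    unsampled scale. keep_rate is the fraction of negatives kept, 0 < w <= 1."""
    return pcvr / (pcvr + (1.0 - pcvr) / keep_rate)


# Trained with 10% of negatives kept; the model outputs 0.5 for a sample.
calibrated = undo_negative_sampling(0.5, keep_rate=0.1)  # equals 1/11
```

The second formula follows from the odds: keeping a fraction `w` of negatives multiplies the odds of a positive by `1/w`, so the correction multiplies the predicted odds by `w` again.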

Figure 7: Isotonic regression calibration
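
Isotonic regression is usually computed with the pool-adjacent-violators algorithm. The following is a small self-contained sketch of that algorithm applied to per-segment statistical CVRs; the segment values are made up for illustration.

```python
def isotonic_fit(values, weights=None):
    """Pool-adjacent-violators: the closest non-decreasing sequence to
    `values` in the weighted least-squares sense."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total_weight, count]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the monotonic order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Statistical CVR per predicted-CVR segment, ordered from low to high,
# with a glitch at the third segment.
observed = [0.02, 0.05, 0.04, 0.09, 0.15]
calibration_curve = isotonic_fit(observed)
```

The two segments that violate the order are pooled to their average, while the others are left as they are. Passing the sample count of each segment as `weights` gives sparse segments less influence on the fitted line.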

The eCPM ranking formula contains fixed hyperparameters, for example eCPM = TCPA * pcvr^pcvrT * pctr^pctrT. Such hyperparameters are usually tuned by experience. Whenever the data distribution changes, retuning pays off, but it costs manual effort, so an automatic parameter search system is necessary. The conventional method for global tuning is CEM (the cross-entropy method), and the personalized method is DDPG from reinforcement learning.
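
As a rough sketch of how a CEM search over the two exponents could look: sample candidates from a Gaussian, keep the best ones, refit the Gaussian, and repeat. Everything except the eCPM formula itself is an assumption here. In particular `toy_objective` is a stand-in with a known peak; a real system would score candidates with offline replay or online metrics.

```python
import random
import statistics


def ecpm(tcpa, pctr, pcvr, pctr_t, pcvr_t):
    """eCPM = TCPA * pcvr^pcvrT * pctr^pctrT."""
    return tcpa * (pcvr ** pcvr_t) * (pctr ** pctr_t)


def cem_search(objective, mean, std, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Cross-entropy method: sample candidates, keep the elite, refit the Gaussian."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        elite = sorted(samples, key=objective, reverse=True)[:n_elite]
        columns = list(zip(*elite))
        mean = [statistics.fmean(col) for col in columns]
        # Small floor keeps the sampling distribution from collapsing to zero width.
        std = [statistics.pstdev(col) + 1e-6 for col in columns]
    return mean


def toy_objective(params):
    """Hypothetical objective with its maximum at pcvrT=1.2, pctrT=0.8."""
    pcvr_t, pctr_t = params
    return -((pcvr_t - 1.2) ** 2 + (pctr_t - 0.8) ** 2)


best_pcvr_t, best_pctr_t = cem_search(toy_objective, mean=[1.0, 1.0], std=[0.5, 0.5])
```

CEM only needs the objective's value, not its gradient, which suits ranking hyperparameters whose effect can be measured but not differentiated.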

A sound set of evaluation metrics guides iteration along the right path. Common system metrics are as follows. Consumption is the money advertisers spend on the platform, that is, the ad platform's revenue. Advertiser value is the value that traffic brings to advertisers, generally the sum of the CPA over all conversions. Expected consumption is equivalent to advertiser value. Cost deviation is the ratio of actual deductions to expected consumption. Estimate deviation is the ratio of the model's average online predicted value to the posterior statistical value. Cost compliance rate (quantity dimension) is the proportion of ads whose cost deviation falls within [0.8, 1.2]. Cost compliance rate (consumption dimension) is the share of total consumption that comes from ads meeting the cost target. Two metrics receive the closest attention: cost deviation and estimate deviation. Cost deviation is what the platform is accountable to advertisers for, and estimate deviation measures the capability of the model.
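
The metric definitions above translate into a few small functions. This sketch assumes that expected consumption is the number of conversions times the advertiser's target CPA, and the per-ad dictionary layout (`spend`, `conversions`, `target_cpa`) is a hypothetical schema used only for the example.

```python
def cost_deviation(actual_spend, conversions, target_cpa):
    """Actual deductions divided by expected consumption."""
    return actual_spend / (conversions * target_cpa)


def estimate_deviation(predicted_cvrs, conversions):
    """Mean online pcvr divided by the posterior statistical CVR."""
    mean_pred = sum(predicted_cvrs) / len(predicted_cvrs)
    observed = conversions / len(predicted_cvrs)
    return mean_pred / observed


def compliance_rates(ads, low=0.8, high=1.2):
    """Return (quantity-dimension rate, consumption-dimension rate)."""
    ok = [a for a in ads
          if low <= cost_deviation(a["spend"], a["conversions"], a["target_cpa"]) <= high]
    by_count = len(ok) / len(ads)
    by_spend = sum(a["spend"] for a in ok) / sum(a["spend"] for a in ads)
    return by_count, by_spend


ads = [
    {"spend": 1000.0, "conversions": 50, "target_cpa": 20.0},  # on target
    {"spend": 300.0, "conversions": 10, "target_cpa": 20.0},   # cost too high
    {"spend": 700.0, "conversions": 40, "target_cpa": 20.0},   # slightly under target
]
by_count, by_spend = compliance_rates(ads)
```

The two compliance rates can differ noticeably: here two of three ads comply, but they account for a larger share of spend because the non-compliant ad is small.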

5 Summary

In summary, you need fairly complete control over the data: both the monitoring of conversion data and the construction of offline samples must be in place. Model accuracy can be improved by fusing multiple models. On the strategy side, every link in the chain has to be covered. The end state is one where cost meets the target, model estimates are accurate, and advertiser value is maximized. That maximum value is not only the value of traffic monetization but also advertisers' satisfaction with the platform and higher retention.

Author profile

Jun, Senior Data Mining Engineer at OPPO

With 7 years of experience in advertising algorithms, he is mainly responsible for building oCPX capabilities for the app store.

