Design and implementation of unified estimation engine

1. Background

With the rapid growth of the Internet, a huge amount of information is now available online, and recommending content that interests each user is a major challenge. Many recommendation algorithms and systems have emerged in response. The estimation engine is one of the most important parts of a recommendation system, because its quality directly determines how well the algorithms perform. In OPPO's business scenarios, our estimation engine needs to solve several problems:

(1) Versatility: OPPO has many recommendation scenarios, including information flow, stores, short videos, alliances, lock screen magazines, music and other services. The estimation engine must support all of these scenarios, so the framework must be general-purpose.

(2) Multi-model estimation: In advertising scenarios, oCPX needs to be supported, so CTR and CVR must be estimated in the same request, and the CVR part may involve several different conversion models (such as download, payment, etc.).

(3) Dynamic model updates: Some models are updated hourly and some daily, so models must be updated dynamically without the service being affected.

(4) Scalability: The features, dictionaries, and model types used by each business may differ, so the engine must be easy to extend.

2. Positioning and technology selection

The businesses differ considerably from one another. If the unified estimation engine also had to support each business's strategy experiments and traffic-splitting experiments, the module would become bloated and unmaintainable. We therefore decided that the estimation engine should handle only estimation-related work.
Going from raw data to an estimated CTR requires feature extraction first, followed by model prediction on the extracted features. The feature extraction results are relatively large: the features needed for one estimation are about 2M, and sending that much data over the network would take too long. The estimation engine therefore covers both stages in one process: feature extraction and model estimation.

3. Design and Implementation of Predictor

3.1 The position of the Predictor module in the entire recommendation system

[Figure 1]

Taking the OPPO mobile application market as an example, the figure illustrates where the predictor module sits in the overall system. The ranker module is responsible for sorting, including the various traffic-splitting experiments and operating strategies. It communicates with the predictor module through the gRPC framework.

3.2 The main flow of the Predictor module
[Figure 2]

The figure shows the processing flow of two requests. Feature extraction in the figure may involve multiple feature confs, and features are extracted once for each sample under each feature conf. Estimation may likewise involve multiple models, and an estimate is produced for each combination of sample, conf, and model. Feature extraction relies on external dictionaries, and estimation relies on external model files. Updating the external dictionaries and switching the double buffers are both handled by the dictionary management module. The following sections describe the dictionary management module, sub-task splitting, feature extraction, estimation, and merging in detail.

3.3 Dictionary management module

As shown in the figure below, conf1, conf2, lr_model_x0, etc. represent files on disk. Each file is parsed by its own dictionary parsing class, which manages loading the file and switching between its double buffers. For example, FeatureConfDict is responsible for parsing conf1 and holds two buffers of type FeatureConfList. When the conf1 file is updated, the standby FeatureConfList is used to load it; while loading is in progress, the service keeps using the active FeatureConfList pointer. Once loading completes, the active and standby buffers are switched, and the old buffer's memory is released after no request is still using it.
[Figure 3]
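The double-buffer switch described in this section can be sketched roughly as follows. This is a minimal illustration, not the predictor's actual code: the class name `DoubleBufferDict`, the `parse` callback, and the single-update-thread assumption are all hypothetical. In Python the old buffer is freed by reference counting once the last in-flight request drops it; a C++ implementation would need something like reference-counted pointers to get the same effect.

```python
import threading


class DoubleBufferDict:
    """Holds an active and a standby buffer for one dictionary file (hypothetical sketch)."""

    def __init__(self, parse):
        self._parse = parse          # hypothetical parser: file text -> buffer object
        self._bufs = [None, None]    # two buffers, e.g. two FeatureConfList objects
        self._active = 0             # index of the buffer that requests read
        self._lock = threading.Lock()

    def get(self):
        """Request path: return the active buffer. The caller keeps its own reference."""
        with self._lock:
            return self._bufs[self._active]

    def reload(self, text):
        """Update thread (assumed to be the only writer): load into standby, then switch."""
        standby = 1 - self._active
        new_buf = self._parse(text)   # slow part; requests keep reading the active buffer
        with self._lock:
            self._bufs[standby] = new_buf
            self._active = standby            # active/standby switch
            self._bufs[1 - standby] = None    # drop our reference to the old buffer


conf_dict = DoubleBufferDict(lambda text: text.split(","))
conf_dict.reload("f1,f2")
in_flight = conf_dict.get()      # a request grabs the current buffer
conf_dict.reload("f1,f2,f3")     # the file changes while the request is running
```

After the second reload, `in_flight` still points at the old list, while new requests see the three-entry list.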

3.4 Sub-task logic

A received request specifies multiple confs and multiple models to estimate, as shown below:
[Figure 4]

The figure above shows a total of 8 samples and two confs, conf0 and conf1, used to extract features. The result of conf0 is estimated by model0 and model1, and the result of conf1 by model2 and model3. With 2 samples per task, splitting yields 4 tasks, as shown in the following figure:
[Figure 5]

The request is split into 4 tasks along the sample dimension, and the tasks are submitted to the thread pool for execution.
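As a rough sketch of this splitting step, the code below cuts 8 samples into tasks of 2 and runs them in a thread pool. The function names and the toy `extract` and `predict` callbacks are assumptions made for illustration; only the 8-sample, 2-conf, 4-model layout comes from the example above.

```python
from concurrent.futures import ThreadPoolExecutor


def split_tasks(samples, batch_size):
    """Split samples into consecutive chunks of at most batch_size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def run_task(chunk, plan, extract, predict):
    """Extract once per (sample, conf), then score with every model attached to that conf."""
    out = {}
    for conf, models in plan.items():
        feats = [extract(conf, s) for s in chunk]
        for model in models:
            out[(conf, model)] = [predict(model, f) for f in feats]
    return out


def run_request(samples, plan, extract, predict, batch_size=2, workers=4):
    tasks = split_tasks(samples, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in task order, so they line up with the sample order
        return list(pool.map(lambda c: run_task(c, plan, extract, predict), tasks))


# conf -> models, mirroring the example: conf0 feeds model0/model1, conf1 feeds model2/model3
plan = {"conf0": ["model0", "model1"], "conf1": ["model2", "model3"]}
samples = list(range(8))
toy_extract = lambda conf, s: (conf, s)            # stand-in for feature extraction
toy_predict = lambda model, f: f"{model}:{f[1]}"   # stand-in for model scoring
task_results = run_request(samples, plan, toy_extract, toy_predict)
```

Each entry of `task_results` holds the scores of one 2-sample task, keyed by (conf, model).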

3.5 Merge process

After estimation completes, the results are organized by conf/model dimension and returned to the ranker. The final response is shown in the right sub-figure of the following figure.
[Figure 6]
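A minimal sketch of the merge step, assuming each task returns a mapping from (conf, model) to the scores of its own samples and that tasks are collected in sample order. The data layout and the function name are hypothetical.

```python
def merge_results(task_results):
    """Concatenate per-task scores into one list per (conf, model), in task order."""
    merged = {}
    for result in task_results:                 # tasks are assumed to be in sample order
        for key, scores in result.items():
            merged.setdefault(key, []).extend(scores)
    return merged


# Two tasks of two samples each, with made-up scores
per_task = [
    {("conf0", "model0"): [0.1, 0.2], ("conf1", "model2"): [0.5, 0.6]},
    {("conf0", "model0"): [0.3, 0.4], ("conf1", "model2"): [0.7, 0.8]},
]
response = merge_results(per_task)
```

The ranker then receives one score list per conf/model pair, with positions matching the original sample order.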

3.6 Design of Feature Extraction Framework

The feature extraction framework is used both online and offline, which helps keep online and offline features consistent. Both scenarios therefore have to be considered in the design.

Online, the data carried by the request and the dictionary data are combined into a sample for feature extraction. Offline, the framework is compiled into a shared library (.so) and called from MapReduce, and each sample is obtained by deserializing a line of text in HDFS.

3.6.1 Feature configuration file format

The feature configuration file consists of two parts: the schema part and the feature operator part.

3.6.2 schema part

[Figure 7]

The figure above contains 5 schema configurations:

user_schema: Indicates current user-related information, which is only used in online mode and brought by upstream requests.

item_schema: Indicates information about the recommended item. It is used only in online mode; part of it is carried by the request, and part is obtained from dictionary files.

context_schema: Indicates information about the recommendation context. It is used only in online mode and is carried by the request, for example whether the current network is Wi-Fi or 4G.

all_schema: Indicates the schema of the final sample. In online mode, the fields of user_schema, item_schema, and context_schema are placed at their corresponding positions in all_schema. In offline mode, each line of text in HDFS is deserialized according to the types specified by all_schema_type. In both modes, the fields of the final sample are stored in the order given by all_schema.

all_schema_type: Used only in offline mode. It specifies the type of each field in all_schema. The type names are predefined, and in offline mode each field is deserialized according to its type.
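To make the two modes concrete, here is a small sketch of how a sample might be assembled online and deserialized offline so that both end up in all_schema order. The field names, type names, and the tab separator are assumptions; the real configuration is shown only in the figure.

```python
# Hypothetical schema; the real field and type names live in the feature configuration file.
ALL_SCHEMA = ["user_id", "age", "item_id", "category", "network"]
ALL_SCHEMA_TYPE = ["string", "int", "string", "string", "string"]

CASTS = {"string": str, "int": int, "float": float}   # assumed predefined type names
INDEX = {name: i for i, name in enumerate(ALL_SCHEMA)}


def build_online_sample(user, item, context):
    """Online: place user/item/context fields at their all_schema positions."""
    sample = [None] * len(ALL_SCHEMA)
    for part in (user, item, context):
        for name, value in part.items():
            sample[INDEX[name]] = value
    return sample


def build_offline_sample(line, sep="\t"):
    """Offline: deserialize one text line field by field using all_schema_type."""
    cols = line.rstrip("\n").split(sep)
    return [CASTS[t](c) for t, c in zip(ALL_SCHEMA_TYPE, cols)]


online = build_online_sample(
    {"user_id": "u1", "age": 30},
    {"item_id": "i9", "category": "game"},
    {"network": "wifi"},
)
offline = build_offline_sample("u1\t30\ti9\tgame\twifi\n")
```

Because both paths produce the same field order, the feature operators can run unchanged on either kind of sample.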

3.6.3 Feature Operator Configuration Part

[Figure 8]

Each feature includes the following fields:

Name: Feature name

Class: Indicates which feature operator class is used for this feature, corresponding to the class name in the code

Slot: uniquely identifies a feature

Depend: Indicates which fields the feature depends on; these fields must exist in all_schema above.

Args: The parameters passed to the feature operator; the framework converts them to float before passing them to the operator.

Str_args: Parameters passed to the feature operator as strings.
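The fields above suggest a registry that maps the Class value to an operator implementation. The sketch below is one way this could look; the dictionary-style config entry, the `Bucket` and `Prefix` operators, and the `extract` method are invented for illustration and are not the framework's real operators.

```python
OPERATORS = {}


def register(cls):
    """Register an operator under its class name, which the Class field refers to."""
    OPERATORS[cls.__name__] = cls
    return cls


@register
class Bucket:
    """Hypothetical operator: bucketize a numeric field by a fixed width (args[0])."""

    def __init__(self, args, str_args):
        self.width = args[0]

    def extract(self, values):
        return int(values[0] // self.width)


@register
class Prefix:
    """Hypothetical operator: prepend a string tag (str_args[0]) to the field value."""

    def __init__(self, args, str_args):
        self.tag = str_args[0]

    def extract(self, values):
        return f"{self.tag}{values[0]}"


def build_feature(conf, schema_index):
    """Turn one feature entry into (slot, function over a sample)."""
    for field in conf["depend"]:
        if field not in schema_index:
            raise KeyError(f"{field} is not in all_schema")
    args = [float(a) for a in conf.get("args", [])]     # Args are converted to float
    op = OPERATORS[conf["class"]](args, conf.get("str_args", []))
    idx = [schema_index[f] for f in conf["depend"]]      # resolve field names once
    return conf["slot"], lambda sample: op.extract([sample[i] for i in idx])


schema_index = {"age": 0, "network": 1}
age_conf = {"name": "u_age", "class": "Bucket", "slot": 101, "depend": ["age"], "args": ["10"]}
net_conf = {"name": "net", "class": "Prefix", "slot": 102, "depend": ["network"], "str_args": ["net="]}
slot_a, f_age = build_feature(age_conf, schema_index)
slot_n, f_net = build_feature(net_conf, schema_index)
```

Adding a new kind of feature then means writing one new registered class and referring to it from the configuration.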

3.6.4 Features are divided into groups (common and uncommon)

In an estimation request, the user and context information is the same for all samples. Some features rely only on user and context information, so these features need to be extracted only once and can be shared by all samples.

Some features in the feature configuration depend on other features (such as combined features and cvm features), so the feature dependencies must be analyzed to determine which fields each feature ultimately depends on.

[Figure 9]

The i_id feature depends on the item_id field of the item, so it is an uncommon feature.

The u_a and net features depend only on user_schema or context fields, not on any item field, so they are common features.
The u_a-i_id combined feature depends on the i_id feature and therefore indirectly on item_id, so it is an uncommon feature.

The u_a-net combined feature relies only on the u_a and network fields, so it is a common feature. During feature extraction it is computed only once per request.

Note: The feature grouping is computed by the asynchronous dictionary update thread, not on the request path.
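The grouping rule can be sketched as a transitive dependency walk: a feature is uncommon if any field it finally depends on comes from the item schema. The feature names follow the example above, while the field name `user_a` and the dictionary layout are assumptions for illustration. The sketch also assumes the dependency graph has no cycles.

```python
ITEM_FIELDS = {"item_id"}          # fields that come from item_schema
FEATURES = {                       # feature -> fields or features it depends on
    "i_id": ["item_id"],
    "u_a": ["user_a"],             # "user_a" is a hypothetical user field name
    "net": ["network"],
    "u_a-i_id": ["u_a", "i_id"],
    "u_a-net": ["u_a", "net"],
}


def base_fields(name, features, cache=None):
    """Resolve a feature (or field) down to the schema fields it finally depends on."""
    cache = {} if cache is None else cache
    if name not in features:       # not a feature, so it is a plain schema field
        return {name}
    if name not in cache:
        fields = set()
        for dep in features[name]:
            fields |= base_fields(dep, features, cache)
        cache[name] = fields
    return cache[name]


def split_groups(features, item_fields):
    """Common features touch no item field; uncommon features touch at least one."""
    common, uncommon = [], []
    for name in features:
        target = uncommon if base_fields(name, features) & item_fields else common
        target.append(name)
    return common, uncommon


common, uncommon = split_groups(FEATURES, ITEM_FIELDS)
```

Running this once when the configuration is loaded keeps the dependency analysis off the request path, in line with the note above.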

3.7 Estimation part

As mentioned earlier, a request specifies which feature conf to use for feature extraction and which model to use for estimation. Every model is kept up to date by an asynchronous dictionary update thread. The engine currently supports models such as LR, FM, DSSM, and Wide&Deep, and is relatively easy to extend. The following briefly introduces two models that have been heavily optimized for our business scenarios:

3.7.1 FM model estimation (LR is similar)

[Figure 10]
where

[Figure 11]
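The formulas in the two figures are not reproduced in this text. For reference only, the standard second-order FM model, which the figures presumably show, can be written as follows, where n is the number of features and k is the dimension of the factor vectors (both symbols are assumptions here):

```latex
\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i
           + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j ,
\qquad
\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \, v_{j,f}
```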

In our business scenario, the user/context information is the same for all samples in a request, so the online FM estimate can be rewritten in this form:

[Figure 12]

The part marked in red is the same for all samples, so it is calculated only once per request.
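One way to realize this split is sketched below, using the standard sum-of-squares identity for the FM pairwise term. The exact decomposition in the figure may differ; the function names and data layout are assumptions, and the common and item feature ids are assumed to be disjoint. The code returns the raw FM score, without any final sigmoid.

```python
def fm_parts(feats, w, v, k):
    """Return (linear term, per-factor sums, pairwise term) for one set of features.

    feats maps feature id -> value; w holds linear weights and v holds factor vectors.
    """
    linear = sum(w[i] * x for i, x in feats.items())
    s = [sum(v[i][f] * x for i, x in feats.items()) for f in range(k)]
    sq = [sum((v[i][f] * x) ** 2 for i, x in feats.items()) for f in range(k)]
    pair = 0.5 * sum(s[f] ** 2 - sq[f] for f in range(k))
    return linear, s, pair


def fm_scores(common, items, w0, w, v, k):
    """Score every sample, computing the user/context part only once per request."""
    c_lin, c_s, c_pair = fm_parts(common, w, v, k)        # shared by all samples
    base = w0 + c_lin + c_pair
    scores = []
    for item in items:                                     # per-sample work
        i_lin, i_s, i_pair = fm_parts(item, w, v, k)
        cross = sum(c_s[f] * i_s[f] for f in range(k))     # common x item interactions
        scores.append(base + i_lin + i_pair + cross)
    return scores


def fm_naive(feats, w0, w, v, k):
    """Reference implementation: direct sum over all feature pairs of one sample."""
    ids = list(feats)
    y = w0 + sum(w[i] * feats[i] for i in ids)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            dot = sum(v[ids[a]][f] * v[ids[b]][f] for f in range(k))
            y += dot * feats[ids[a]] * feats[ids[b]]
    return y


# Made-up weights: features 1 and 2 are user/context, 3 and 4 are item features
w0, k = 0.05, 2
w = {1: 0.1, 2: -0.2, 3: 0.3, 4: 0.05}
v = {1: [0.1, 0.2], 2: [0.3, -0.1], 3: [0.2, 0.2], 4: [-0.4, 0.1]}
common = {1: 1.0, 2: 1.0}
items = [{3: 1.0}, {3: 0.5, 4: 1.0}]
fast = fm_scores(common, items, w0, w, v, k)
slow = [fm_naive({**common, **item}, w0, w, v, k) for item in items]
```

The shared part costs the same regardless of how many samples a request carries, so the saving grows with the number of candidates per request.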

3.7.2 DSSM (Two-Tower) Model

The network structure of the two-tower model is shown in the figure below:

[Figure 13]

In fact, there are three towers: C (context information), U (user information), and I (item information). The vectors produced by the user and item towers are combined by dot product, and the result is summed with the output of C.

3.7.3 Online serving part

Because item information changes relatively slowly in some scenarios, the item tower is generally computed offline in advance, and the resulting vectors are loaded dynamically through a dictionary.

[Figure 14]

Online, only the C tower and the U tower are computed, and they need to be computed only once per request; the vector of the I tower is obtained by dictionary lookup. Compared with a fully connected network, the amount of computation is greatly reduced and performance improves substantially. However, because user and item information interact only through a single dot product, the offline AUC is about 1% lower than with a fully connected network, so the model is better suited to scenarios that do not require high accuracy, such as recall or coarse ranking. In the information flow advertising and alliance advertising businesses, it replaced the original coarse ranking based on statistical CTR, and the comprehensive index increased by 5 to 6 percentage points.
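The serving path can be sketched as follows. The one-layer towers, the scalar output of the C tower, the final sigmoid, and all parameter names are simplifying assumptions for illustration; the real tower structures are those in the figures.

```python
import math


def dense(x, weights, bias):
    """One fully connected layer with ReLU; a stand-in for a real tower."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_request(user_x, ctx_x, item_ids, params, item_vectors):
    """Compute the U and C towers once per request; look up precomputed I vectors."""
    u_vec = dense(user_x, params["u_w"], params["u_b"])    # U tower, once per request
    c_out = dot(ctx_x, params["c_w"]) + params["c_b"]      # C tower, assumed scalar output
    scores = []
    for item_id in item_ids:
        i_vec = item_vectors[item_id]                      # I tower output, computed offline
        scores.append(sigmoid(dot(u_vec, i_vec) + c_out))  # dot product, then add C
    return scores


# Made-up parameters and an item-vector dictionary as it might be loaded from disk
params = {"u_w": [[1.0, 0.0], [0.0, 1.0]], "u_b": [0.0, 0.0], "c_w": [0.5], "c_b": 0.0}
item_vectors = {"a": [1.0, 0.0], "b": [0.0, 0.0]}
scores = score_request([1.0, 2.0], [1.0], ["a", "b"], params, item_vectors)
```

Per item, the online cost is one dictionary lookup and one dot product, which is why this structure suits recall and coarse ranking over many candidates.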

3.8 Performance optimization

The estimation engine has strict latency requirements, yet the feature scale keeps growing in pursuit of better algorithm results. Performance was therefore a major consideration in the design of the estimation engine, mainly in the following aspects:

(1) Object pools are used to reduce memory allocation and improve performance.

(2) The field dependencies of each feature are converted into subscripts in advance. During feature extraction, the subscripts are used directly to fetch the corresponding fields, which reduces the amount of computation.

(3) Some features rely on the results of other features and frequently look up those results by slot. Because the range of slot values is limited, an array is used for this lookup instead.
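The three ideas above can be illustrated together in a small sketch. Python does not show the real performance gain, which matters most in a compiled implementation; the names `ObjectPool`, `MAX_SLOT`, and the toy feature computation are assumptions, and the pool shown here is not thread-safe.

```python
class ObjectPool:
    """Reuse result buffers instead of allocating a new one per sample (not thread-safe)."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []

    def acquire(self):
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        self._free.append(obj)


MAX_SLOT = 8   # hypothetical upper bound on slot ids


def compile_features(feature_confs, schema):
    """Resolve field names to subscripts once, when the configuration is loaded."""
    index = {name: i for i, name in enumerate(schema)}
    return [(slot, [index[f] for f in depend]) for slot, depend in feature_confs]


def extract(sample, compiled, pool):
    """Fill a slot-indexed array; later features can read by_slot[slot] directly."""
    by_slot = pool.acquire()
    for i in range(len(by_slot)):
        by_slot[i] = None                       # reset the reused buffer
    for slot, idxs in compiled:
        by_slot[slot] = "_".join(str(sample[i]) for i in idxs)   # toy feature value
    return by_slot


schema = ["user_id", "item_id", "network"]
confs = [(1, ["item_id"]), (3, ["user_id", "network"])]
compiled = compile_features(confs, schema)
pool = ObjectPool(lambda: [None] * MAX_SLOT)
first = extract(["u1", "i9", "wifi"], compiled, pool)
first_values = (first[1], first[3])
pool.release(first)
second = extract(["u2", "i3", "4g"], compiled, pool)
```

Here `second` is the same list object as `first`, so the second extraction allocates no new result buffer.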

4. Summary

At present, the predictor module supports most recommendation scenarios, including information flow content, information flow advertising, application market, alliance, search, short video, OPPO lock screen magazine, music and others. It is deployed on nearly 2000 machines, which indicates that the design is relatively stable and scales well. It also supports smoothly switching services to DNN models, and various business scenarios have seen measurable gains.

Author profile

Xiao Chao, Senior Data Mining Engineer

10+ years of experience in advertising system engineering. At OPPO, he is mainly responsible for the engineering of model feature extraction and inference.
