Author: big man

1. What is a prediction system?

The core task of a prediction system is model computation, which can be viewed as evaluating a function, for example f(x1, x2) = a*x1 + b*x2 + c. The parameters a, b, and c are the weights obtained by model training, the independent variables x1 and x2 are the features, and model computation is the process of evaluating the function given x1 and x2.
Therefore, a prediction framework has two jobs: constructing the function's inputs (features) and evaluating the function to obtain the result (model computation). In other words: feature extraction and model computation (a minimal sketch follows the list below).

  • Feature extraction: Features are abstract representations of the information associated with a behavior. In the recommendation process, behavioral information must be converted into some mathematical form before a machine learning model can learn from it. To complete this conversion, the information in the behavior process is extracted in the form of features.
  • Model computation: Integrate a machine learning library, load the model trained on the machine learning platform, and execute the matrix, vector, and other computations defined by the model structure.
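
To make the two steps concrete, here is a minimal sketch for the linear model above (the feature values are hypothetical; a real system replaces both steps with the stages described below):

 #include <cstdio>

 int main() {
     // Weights a, b, c learned offline by model training.
     const double a = 0.3, b = 0.5, c = 0.1;
     // Feature extraction: turn raw behavior data into numeric inputs.
     const double x1 = 2.0, x2 = 4.0;  // hypothetical extracted features
     // Model computation: evaluate f(x1, x2) = a*x1 + b*x2 + c.
     std::printf("score = %f\n", a * x1 + b * x2 + c);
     return 0;
 }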

2. Design philosophy of the prediction system

2.1 Limitations of the common industry solution

A common solution in the industry is a feature processing service plus a TF Serving model computation service, i.e., feature processing and model computation are deployed as separate services. This works well when the number of features is small, but once the feature count grows there are obvious performance problems, mainly because features must be passed across the network (across processes) to the underlying machine learning library, which incurs multiple rounds of encoding/decoding and memory copies and causes huge performance overhead.

2.2 Design of the new system

Based on business needs, and learning from the strengths and weaknesses of the common solution, we built a new high-performance prediction system. Its guiding principles are:

  • High performance: pursue high performance in feature extraction and model computation, providing large-scale feature extraction and large-scale model computation capabilities.
  • Abstraction and reuse: design the system in layers and reuse each layer; reuse feature extraction logic between online and offline; introduce the concept of operators for feature extraction, build operator libraries, and reuse features.
  • Real-time: real-time sample collection, real-time model training, and real-time model updates.
  • Scalability: feature extraction provides a custom-operator interface so custom operators are easy to implement; model computation can integrate multiple machine learning libraries.

The following figure shows the Cloud Music prediction system architecture:

3. The construction process of the prediction system

A good prediction system needs to solve the following three problems:

  • How to iterate features and models efficiently?
  • How to solve the performance problems of prediction computation?
  • Can engineering means be used to improve the algorithm's effect?

3.1 Efficient iteration of features and models

3.1.1 Layered system design

The system is designed in layers; only the upper interface layer is exposed, and the middle and bottom layers are fully reused across services, which greatly reduces code development.

  • Bottom framework layer: provides the asynchronous mechanism, task queue, session management, multi-threaded concurrent scheduling, network-communication logic, external file loading and unloading, and so on.
  • Middle logic layer: encapsulates query management, cache management, model update management, model computation management, and other functions.
  • Upper interface layer: provides high-level interfaces organized by execution flow; the algorithm logic is implemented in this layer.

The following figure shows the layered framework design:

3.1.2 Completing the whole model computation flow through configuration

Following the execution flow, processing is divided into three stages: data query, feature extraction, and model computation. The framework extracts and encapsulates each stage and provides a configuration description language for expressing the logic of each stage.

(1) Data query

By configuring the table name, query key, cache time, query dependencies, etc. in XML, the whole flow of querying, parsing, and caching external feature data is realized. As follows:

 <feature_tables>
    <table name="music-rec-fm_set_action" alias="trash_song" tag="user" key="user_id"/>
    <table name="music_fm_dsin_user_static_ftr_dpb" alias="u_static" tag="user" key="user_id"/>
    <table name="alg_song_ua_rt" alias="u_rt_red" tag="user" key="user_id" subkey="1"/>
    <table name="fm_dsin_song_promoted_info_feature_dpb_mdb" alias="item_promoted" tag="item" key="item_id" cache_time="7200" cache_size="800000" query_type="sync"/>
    <table name="fm_dsin_song_static_feature_dpb_mdb" alias="item_static" tag="item" key="item_id" cache_time="7200" cache_size="800000" query_type="asyc"/>
</feature_tables>

This feature-data query configuration significantly improves development efficiency: a few lines of configuration implement feature query functionality that previously required a lot of coding.

(2) Feature extraction

We developed a feature extraction library that encapsulates feature extraction operators, designed a DSL for feature computation, and made the entire feature extraction process configurable. As follows:

 <feature_extract_config cache="true" log_level="3" log_echo="false" version="2">
    <fea name="isfollowedaid" dataType="int64" default="0L" extractor="StringHit($item_id, $uLikeA.followed_anchors)"/>
    <fea name="rt_all_all_pv" dataType="int64" default="LongArray(0, 5)" extractor="RtFeature($all_all_pv.f, 2)"/>
    <fea name="anchor_all_impress_pv" dataType="int64" default="0" extractor="ReadIntVec($rt_all_all_pv, 0)"/>
    <fea name="anchor_all_click_pv" dataType="int64" default="0" extractor="ReadIntVec($rt_all_all_pv, 1)"/>
    <fea name="anchor_all_impress_pv_id" dataType="int64" default="0" extractor="Bucket($anchor_all_impress_pv, $bucket.all_impress_pv)"/>
    <fea name="anchor_all_ctr_pv" dataType="float" default="0.0" extractor="Smooth($anchor_all_click_pv, $anchor_all_impress_pv, 1.0, 1000.0, 100.0)"/>
    <fea name="user_hour" dataType="int64" extractor="Hour()" default="0L"/>
    <fea name="anchor_start_tags" dataType="int64" extractor="Long2ID($live_anchor_index.start_tags,0L,$vocab.start_tags)" default="0L"/>
</feature_extract_config>

The feature extraction library is described in detail below.

(3) Model calculation

The framework encapsulates model loading, parameter input, model computation, and so on, so the whole flow of model loading and computation is completed through configuration. Its specific features are as follows:

  • The prediction framework integrates the TensorFlow core and supports multiple model formats.
  • Supports loading multiple models and fusing the scores of multiple models.
  • Supports double-buffered model updates and automatic model warm-up on loading.
  • Supports parameter input in multiple formats (Example and Tensor), with built-in Example and Tensor constructors that hide the complex details of parameter construction and are easy to use.
  • Extensible to a variety of machine learning libraries, such as Paddle, GBM, etc.
 <model_list>
    <!-- pb model, Tensor input, specify out_names -->
    <id model_name="model1" model_type="pb" input_type="tensor" out_names="name1;name2" separator=";" />

    <!-- SavedModel, Tensor input, specify out_names -->
    <id model_name="model2" model_type="mdl" input_type="tensor" out_names="name1;name2" separator=";" />
    
    <!-- SavedModel, Example input, specify out_aliases -->
    <id model_name="model3" model_type="mdl" input_type="example" out_aliases="aliase1;aliase2" separator=";" signature="serving_default" />
</model_list>

Through the above configuration, the whole flow of model loading, model-input construction, and model computation is completed. Model computation functionality that previously required a lot of coding is now implemented in a few lines of configuration.

3.1.3 Encapsulating the feature extraction framework

The purpose of feature extraction is to convert non-standard data into standard data, which is then supplied to the machine learning training platform and the online computation platform. Feature extraction has an offline flow and an online flow.

The following figure shows what feature extraction is:

3.1.3.1 What are the problems with feature extraction?

  • Consistency is hard to guarantee

Because online and offline extraction run on different platforms (in different languages), multiple code bases must be developed, and their logic can diverge. Inconsistency hurts the algorithm's effect on one hand, and consistency checking brings high engineering costs on the other.

  • Low development efficiency

Feature production and feature consumption span multiple systems, and adding a new feature requires changes in several places, resulting in low development efficiency.

  • Hard to reuse

The framework lacks support for reuse, and feature computation logic differs widely, which makes feature data hard to reuse across teams, wastes resources, and keeps features from realizing their full value.

3.1.3.2 How to solve the above problems?
(1) Operator abstraction

We introduced the concept of an operator and encapsulated feature computation logic into operators. To abstract operators, a unified data protocol (standardized operator inputs and outputs) must be defined first. Here we use dynamic protobuf technology, so that features of any format are processed in a uniform way according to the features' metadata.
In addition, we established a platform-wide general operator library and business operator libraries, enabling reuse of both feature data and feature computation.
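
A minimal sketch of what the operator abstraction might look like (the interface and the simplified FeatureValue type are illustrative assumptions; the real library uses dynamic protobuf as described above):

 #include <cstdint>
 #include <string>
 #include <vector>

 // Simplified stand-in for the unified data protocol; the real system
 // handles arbitrary formats via dynamic protobuf and feature metadata.
 struct FeatureValue {
     std::vector<int64_t> ints;
     std::vector<float> floats;
     std::vector<std::string> strings;
 };

 // Operator interface: every feature computation maps resolved inputs
 // to one output feature, so operators compose and can be reused.
 class Extractor {
 public:
     virtual ~Extractor() = default;
     virtual FeatureValue Extract(const std::vector<FeatureValue>& inputs) = 0;
 };

 // Example operator mirroring StringHit($item_id, $followed_anchors)
 // from the DSL configuration above: 1 if the id appears in the list.
 class StringHit : public Extractor {
 public:
     FeatureValue Extract(const std::vector<FeatureValue>& inputs) override {
         const std::string& id = inputs[0].strings[0];
         bool hit = false;
         for (const std::string& s : inputs[1].strings) {
             if (s == id) { hit = true; break; }
         }
         FeatureValue out;
         out.ints.push_back(hit ? 1 : 0);
         return out;
     }
 };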

(2) Defining a feature computation DSL

Based on operators, we designed a DSL for expressing feature computation. It supports multi-level extraction and extraction dependencies, and complex feature computation logic can be realized through various combinations of basic operators, giving the DSL rich expressive power.
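
As an illustration (composed only from operators that appear in the configuration shown earlier; the combinations themselves are hypothetical), DSL expressions nest operators and reference upstream data with $:

 Smooth(ReadIntVec($rt_all_all_pv, 1), ReadIntVec($rt_all_all_pv, 0), 1.0, 1000.0, 100.0)
 Bucket(ReadIntVec(RtFeature($all_all_pv.f, 2), 0), $bucket.all_impress_pv)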


(3) Solving logical inconsistency

The root cause of inconsistent feature computation logic is that feature computation is split into an offline flow and an online flow. Offline feature computation generally runs on the Spark or Flink platform (in Scala or Java), while online feature computation generally runs in a C++ environment. Different platforms and languages force the same feature computation logic to be implemented in multiple code bases, and those code bases can diverge.
To solve this, the feature processing logic must run on all of these platforms. Given that the online computing platform uses C++, and considering C++'s high performance and cross-platform (multi-language) compatibility, the feature processing core library is implemented in C++ and provides a C++ interface (supporting the online computing platform) and a Java interface (supporting the Spark and Flink platforms). One-click compilation produces a .so library and a JAR package, which makes integration into each platform convenient.
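
A minimal sketch of exposing one C++ core to the Java side through a JNI wrapper (the class and function names are illustrative assumptions):

 #include <jni.h>

 #include <string>

 // Core extraction routine shared by all platforms (illustrative
 // signature; the real core runs the DSL-configured operators).
 std::string ExtractFeatures(const std::string& raw) {
     return raw;  // placeholder
 }

 // JNI wrapper so Spark/Flink jobs call the same C++ core from Java.
 // Assumes a Java class com.example.FeatureLib declaring the native
 // method `byte[] extract(byte[] raw)`.
 extern "C" JNIEXPORT jbyteArray JNICALL
 Java_com_example_FeatureLib_extract(JNIEnv* env, jobject, jbyteArray raw) {
     const jsize len = env->GetArrayLength(raw);
     std::string in(static_cast<size_t>(len), '\0');
     if (len > 0) {
         env->GetByteArrayRegion(raw, 0, len,
                                 reinterpret_cast<jbyte*>(&in[0]));
     }
     const std::string out = ExtractFeatures(in);
     jbyteArray result = env->NewByteArray(static_cast<jsize>(out.size()));
     env->SetByteArrayRegion(result, 0, static_cast<jsize>(out.size()),
                             reinterpret_cast<const jbyte*>(out.data()));
     return result;
 }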

The following figure shows the feature extraction cross-platform solution:

By encapsulating the feature extraction library, we gained cross-platform feature extraction: code is written once and runs on multiple platforms (in multiple languages), which solves the problem of inconsistent extraction logic across platforms. By building operator libraries, features and feature computation are highly reused. And by developing the feature computation DSL, feature extraction becomes purely a matter of configuration. On top of all this, our development efficiency for feature extraction has improved greatly.

3.2 High-performance prediction computation

The prediction service is a compute-intensive service with extremely high performance requirements, and performance problems are especially prominent with complex features and complex models. We thought about and experimented with performance extensively while building the prediction system; the details follow.

3.2.1 Seamless integration of high-performance machine learning libraries

As mentioned earlier, the traditional prediction solution deploys the feature processing service and the model computation service in separate processes. This involves cross-network transfer, serialization, and deserialization of features, plus frequent allocation and release of large amounts of memory; with the large feature volumes typical of recommendation scenarios, it brings a large performance overhead.

The following diagram shows the traditional solution's service deployment and feature transfer:

To solve these problems, the new prediction system seamlessly integrates the high-performance machine learning library TF core, i.e., feature processing and model computation are deployed in the same process. Features can then be passed by pointer, avoiding the overhead of serialization, deserialization, and network transfer and greatly improving performance.
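
As a sketch of what in-process model computation can look like with the TensorFlow C++ API (the model path, tensor names, and feature values are illustrative; error handling is elided):

 #include <vector>

 #include "tensorflow/cc/saved_model/loader.h"
 #include "tensorflow/cc/saved_model/tag_constants.h"
 #include "tensorflow/core/framework/tensor.h"

 int main() {
     // Load a SavedModel once; it lives inside the serving process.
     tensorflow::SavedModelBundle bundle;
     TF_CHECK_OK(tensorflow::LoadSavedModel(
         tensorflow::SessionOptions(), tensorflow::RunOptions(),
         "/path/to/saved_model",  // illustrative path
         {tensorflow::kSavedModelTagServe}, &bundle));

     // Features are written straight into a Tensor in the same address
     // space -- no serialization and no network hop.
     tensorflow::Tensor input(tensorflow::DT_FLOAT,
                              tensorflow::TensorShape({1, 2}));
     auto flat = input.flat<float>();
     flat(0) = 2.0f;  // illustrative feature values
     flat(1) = 4.0f;

     std::vector<tensorflow::Tensor> outputs;
     TF_CHECK_OK(bundle.session->Run(
         {{"input:0", input}},  // illustrative tensor names
         {"output:0"}, {}, &outputs));
     return 0;
 }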

The following figure shows feature transfer in the new prediction system:

3.2.2 Fully asynchronous architecture

To improve computing capacity, we adopted a fully asynchronous architecture. The computing framework processes requests asynchronously, and external calls are non-blocking, leaving thread resources for actual computation to the greatest extent. When the machine is overloaded, timed-out tasks in the task queue are automatically discarded to keep the machine/process from being dragged down.
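
A minimal sketch of a task queue that discards timed-out tasks, as described above (a simplification; the real framework also integrates sessions and scheduling):

 #include <chrono>
 #include <condition_variable>
 #include <deque>
 #include <functional>
 #include <mutex>

 using Clock = std::chrono::steady_clock;

 // Tasks whose deadline passes before a worker picks them up are dropped,
 // so an overloaded machine sheds stale work instead of falling behind.
 class TimeoutTaskQueue {
 public:
     void Push(std::function<void()> fn, std::chrono::milliseconds budget) {
         std::lock_guard<std::mutex> lk(mu_);
         queue_.push_back({Clock::now() + budget, std::move(fn)});
         cv_.notify_one();
     }

     // Worker loop body: blocks until a non-expired task is available.
     std::function<void()> Pop() {
         std::unique_lock<std::mutex> lk(mu_);
         for (;;) {
             cv_.wait(lk, [this] { return !queue_.empty(); });
             Task t = std::move(queue_.front());
             queue_.pop_front();
             if (Clock::now() <= t.deadline) return std::move(t.fn);
             // Deadline already passed: discard and try the next task.
         }
     }

 private:
     struct Task {
         Clock::time_point deadline;
         std::function<void()> fn;
     };
     std::mutex mu_;
     std::condition_variable cv_;
     std::deque<Task> queue_;
 };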

The following diagram shows the asynchronous architecture design:

3.2.3 Multi-level cache

A request in a recommendation scenario consists of one user and a batch of candidate items (anywhere from 50 to 1000), while the number of popular items is only in the hundreds of thousands or millions. Item features are divided into offline features (updated hourly or daily) and real-time features (updated every second or minute). Given these request characteristics, item features can be fully cached in-process, populated on demand by requests; within the cache timeout, cached content is used directly, avoiding redundant external queries, feature parsing, and feature extraction. This greatly reduces both the query load on external storage and the resource consumption of the prediction service.
The prediction service currently applies this caching mechanism to feature-data queries and to feature extraction computation. The cache design uses measures such as asynchronous processing by a thread pool and splitting the cache into multiple buckets to reduce write-lock contention, improving cache query performance (a minimal sketch follows the strategy list below). Multiple feature query and caching strategies are provided:

  • Synchronous query & LRU cache: for important, large-scale features (lowest performance)
  • Asynchronous query & LRU cache: for less important, large-scale features (medium performance)
  • Batch feature import: for feature sets below 10 million entries (highest performance)
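
A minimal sketch of the bucketed LRU idea, where each bucket has its own lock so writers rarely collide (capacities and hashing are illustrative, and the configured cache TTL is omitted):

 #include <functional>
 #include <list>
 #include <memory>
 #include <mutex>
 #include <string>
 #include <unordered_map>
 #include <utility>
 #include <vector>

 // One LRU bucket guarded by its own mutex.
 class LruShard {
 public:
     explicit LruShard(size_t capacity) : capacity_(capacity) {}

     bool Get(const std::string& key, std::string* value) {
         std::lock_guard<std::mutex> lk(mu_);
         auto it = index_.find(key);
         if (it == index_.end()) return false;
         order_.splice(order_.begin(), order_, it->second);  // move to front
         *value = it->second->second;
         return true;
     }

     void Put(const std::string& key, std::string value) {
         std::lock_guard<std::mutex> lk(mu_);
         auto it = index_.find(key);
         if (it != index_.end()) {
             it->second->second = std::move(value);
             order_.splice(order_.begin(), order_, it->second);
             return;
         }
         order_.emplace_front(key, std::move(value));
         index_[key] = order_.begin();
         if (index_.size() > capacity_) {  // evict least recently used
             index_.erase(order_.back().first);
             order_.pop_back();
         }
     }

 private:
     using Entry = std::pair<std::string, std::string>;
     size_t capacity_;
     std::mutex mu_;
     std::list<Entry> order_;
     std::unordered_map<std::string, std::list<Entry>::iterator> index_;
 };

 // Hashing keys across buckets spreads writers over many locks.
 class ShardedLruCache {
 public:
     ShardedLruCache(size_t shards, size_t per_shard) {
         for (size_t i = 0; i < shards; ++i)
             shards_.push_back(std::make_unique<LruShard>(per_shard));
     }
     LruShard& ShardFor(const std::string& key) {
         return *shards_[std::hash<std::string>{}(key) % shards_.size()];
     }
 private:
     std::vector<std::unique_ptr<LruShard>> shards_;
 };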

The following figure shows the prediction system's caching mechanism:

3.2.4 Choosing a reasonable model input

For TensorFlow models, Tensor input performs better than Example input. Figure 1 below shows the timeline of Example input, and Figure 2 below shows the timeline of Tensor input. It is easy to see that with Example as the model input, a long parsing step runs inside the model, and the downstream compute ops cannot execute concurrently until parsing completes.

The following figure shows the Timeline of the Example input:

The following figure shows the Timeline of Tensor input:

(1) Why does using Tensor input improve performance?

Mainly because it avoids the serialization and deserialization of Example, as well as the time spent in ParseExample.

(2) Compared with Example, what are the problems with using Tensor input?

Tensor construction logic is complex and development efficiency is low: you must track the Tensor's dimension details, and whenever a Tensor changes (features added or removed, dimensions changed, etc.) the construction code must be rewritten.

To combine the high performance of Tensor input with the development convenience of Example input, the prediction system provides a set of Tensor constructors, preserving performance while lowering development difficulty.
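
For contrast, a minimal sketch of building a serialized Example input with the TensorFlow C++ API (the feature key "x" is illustrative); the in-graph ParseExample step this requires is exactly what the Tensor path avoids:

 #include <string>

 #include "tensorflow/core/example/example.pb.h"
 #include "tensorflow/core/framework/tensor.h"

 // Build a serialized tf.Example carrying one float feature "x".
 tensorflow::Tensor MakeExampleInput(float x) {
     tensorflow::Example example;
     auto* feats = example.mutable_features()->mutable_feature();
     (*feats)["x"].mutable_float_list()->add_value(x);

     tensorflow::Tensor input(tensorflow::DT_STRING,
                              tensorflow::TensorShape({1}));
     // tensorflow::tstring in recent TF; std::string in older versions.
     input.flat<tensorflow::tstring>()(0) = example.SerializeAsString();
     return input;
 }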

3.2.5 Model loading and update optimization

We tried a variety of optimization strategies for model loading and updating, including double-buffered model hot updates, automatic model warm-up on loading, and delayed unloading of old models, to improve model loading performance. With these optimizations, minute-level hot updates of large models are achieved with no latency jitter in online requests, laying the foundation for real-time models.
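
A minimal sketch of the double-buffered hot swap (the Model type is a placeholder; warm-up and delayed unloading are reduced to comments):

 #include <atomic>
 #include <memory>

 struct Model { /* loaded graph, session, etc. (illustrative) */ };

 // Serving threads read the current model; the loader thread swaps in a
 // new one after warm-up. The old model is freed only when the last
 // in-flight request drops its shared_ptr (delayed unloading).
 class ModelHolder {
 public:
     std::shared_ptr<Model> Get() const {
         return std::atomic_load(&current_);
     }

     void Update(std::shared_ptr<Model> fresh) {
         // Warm-up: run a few dummy requests on `fresh` first, so the
         // first real request does not pay one-time initialization costs.
         std::atomic_store(&current_, std::move(fresh));
     }

 private:
     std::shared_ptr<Model> current_;
 };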

The following figure shows the model automatic warm-up scheme:

3.3 Improving the algorithm effect through engineering means

Beyond guaranteeing core qualities of the prediction system such as high availability and low latency, can engineering methods also be used to improve the algorithm's effect? We thought about and experimented with this along the dimensions of features, samples, and models.

3.3.1 What can be optimized in the traditional scheme?

(1) The feature time-travel problem

The traditional way of producing samples suffers from feature time travel (features from a later time leaking into earlier samples). Sample assembly joins user behaviors with features, where a user behavior is the user's reaction at time t-1 to a prediction result. The correct approach is to join the behavior at time t-1 with the features at time t-1, but traditionally only the features at time t are available for the join; features from time t thus appear in the samples for time t-1, producing time travel. Feature time travel makes training samples inaccurate, which degrades the algorithm's effect.

(2) The model is not real-time enough

The traditional recommendation system updates a user's recommendation results at a daily granularity, which is poorly real-time and cannot serve scenarios with strong real-time requirements. In live streaming, for example, the host's broadcast status, changes in live content, and adjustments in the business environment all require the recommendation system to perceive them in real time.

3.3.2 How to solve the above problems?

To solve these problems, we developed a real-time model scheme. Based on the prediction service, it saves the prediction snapshot (the request content, the features used at that moment, etc.) to Kafka in real time and joins it with the user-action (ua) reflow log via a trace id. This guarantees a one-to-one correspondence between user labels and features, solving the feature time-travel problem.
The scheme also produces a second-level sample stream, which provides the sample basis for incremental model training. Incremental training is implemented on the training side, and newly trained models are pushed to the online service in real time. The specific implementation is as follows:

The following figure shows the model real-time scheme:
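
Conceptually, the snapshot-and-join can be sketched as follows (the record fields are illustrative assumptions; in production the snapshot goes to Kafka and the join runs in a streaming job):

 #include <cstdint>
 #include <map>
 #include <string>

 // Prediction snapshot written at serving time: the features the model
 // actually saw, keyed by trace_id (illustrative schema).
 struct PredictionSnapshot {
     std::string trace_id;
     int64_t serve_time_ms;
     std::map<std::string, std::string> features;  // extracted at time t-1
 };

 // User-action reflow log record (illustrative schema).
 struct UserAction {
     std::string trace_id;
     int label;  // e.g. 1 = click, 0 = no click
 };

 struct Sample {
     std::map<std::string, std::string> features;
     int label;
 };

 // Joining by trace_id pairs each label with the exact features used at
 // serving time, so no feature from a later time can leak into the sample.
 Sample JoinByTraceId(const PredictionSnapshot& snap, const UserAction& ua) {
     // Assumes ua.trace_id == snap.trace_id (matched by the streaming join).
     return Sample{snap.features, ua.label};
 }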

With this real-time scheme, samples are saved at serving time, which solves the feature time-travel problem, and the model can be trained and updated at minute-level granularity, allowing it to capture new trends and hotspots promptly and thereby improving the algorithm's effect.

This article is published by the NetEase Cloud Music technical team. Any form of reprinting without authorization is prohibited. We recruit for all kinds of technical positions year-round; if you are ready for a change and happen to like Cloud Music, join us at staff.musicrecruit@service.netease.com.
