Introduction | Coarse ranking is the module that sits between recall and fine ranking, and it is the product of a classic trade-off between accuracy and performance. To understand its technical details, we must keep accuracy and performance in mind throughout.
This article takes an in-depth look at this module.
1. Overall structure
Coarse ranking is the module between recall and fine ranking. It takes tens of thousands of candidate items from recall and outputs a few hundred to a few thousand items to fine ranking, a typical product of trading accuracy against performance. For scenarios with a small recommendation pool, coarse ranking is optional. The overall structure of coarse ranking is as follows:
2. The basic framework of coarse ranking: samples, features, models
Coarse ranking is now generally model-based, and its basic framework likewise consists of three parts: data samples, feature engineering, and the deep model.
(1) Data samples
Coarse ranking today is generally model-based, and its training samples are similar to fine ranking's: exposed-and-clicked items are positive samples, exposed-but-unclicked items are negative samples. However, coarse ranking typically faces tens of thousands of candidates while fine ranking sees only a few hundred to a few thousand, so its solution space is much larger. Training only on exposed samples while predicting on both exposed and unexposed items causes severe sample selection bias (the SSB problem), making training and prediction inconsistent. Compared with fine ranking, the SSB problem in coarse ranking is clearly more serious.
(2) Feature engineering
Coarse-ranking features can also be similar to fine-ranking features. Because the latency budget is tight, only 10 ms~20 ms, they fall roughly into two categories:
Common features: similar to fine ranking, covering three parts: user, context, and item. For which features to use and how to process them, see the feature-engineering section of the fine-ranking article.
Cross features: user-item cross features, which are very helpful for model accuracy. However, cross features have too many combinations to compute and store offline, and at serving time, unlike user features that are computed only once per request, they must be computed for every candidate, so their latency is high. Cross features should therefore be used with caution.
(3) Deep model
Coarse ranking is now essentially model-based, and its evolution has gone through four main generations:
The first generation: manual rule-based strategies, which build rules from posterior statistics, for example combining core factors such as an item's historical CTR, CVR, category and price tier, and sales volume. Manual rules have low accuracy, offer no personalization, and cannot be updated in real time.
The second generation: the LR linear model, which offers some personalization and real-time capability, but the model is too simple and its expressive power is weak.
The third generation: the DSSM two-tower inner-product deep model. It decouples user and item, building each independently in its own tower, so item vectors can be computed and stored offline and online prediction latency stays low. There are two main paradigms:
Both item and user vectors stored offline. This scheme only needs to compute the user-item inner product online, so latency is very low; and since user vectors are precomputed offline, a complex model can be used to improve expressiveness. However, real-time performance on the user side is poor: real-time user behavior cannot be captured.
Item vectors offline, user vector computed in real time. Items change more slowly than users, so their real-time requirements are lower. Since one request scores candidates for the same user, the user tower only needs to run once per request, which is still fast. This paradigm is currently the more widely used one (a code sketch follows).
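To make the second paradigm concrete, here is a minimal PyTorch sketch of a two-tower model; the tower widths, embedding dimension, and the random `user_feats`/`item_feats` tensors are hypothetical stand-ins, and a real system would cache `item_vecs` in a vector index rather than recompute them.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """A simple MLP tower mapping raw features to an embedding."""
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, x):
        # L2-normalize so the inner product behaves like cosine similarity
        return nn.functional.normalize(self.net(x), dim=-1)

user_tower = Tower(in_dim=32)   # hypothetical user feature width
item_tower = Tower(in_dim=48)   # hypothetical item feature width

# Offline: encode the whole candidate pool once and cache the vectors.
item_feats = torch.randn(10000, 48)           # tens of thousands of candidates
item_vecs = item_tower(item_feats).detach()   # stored offline

# Online: encode the current user once per request, then score every
# candidate with a single matrix-vector product (very low latency).
user_feats = torch.randn(1, 32)
user_vec = user_tower(user_feats).squeeze(0)
scores = item_vecs @ user_vec                 # shape: (10000,)
topk = scores.topk(500).indices               # pass a few hundred to fine ranking
```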
The fourth generation: because the two towers isolate item from user, the third generation has no feature-crossing capability and weak expressive power. The fourth-generation models, represented by COLD, are therefore lightweight MLP coarse-ranking models. COLD performs feature selection with an SE (Squeeze-and-Excitation) block and combines it with network pruning and engineering optimization to trade off accuracy against performance.
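A minimal sketch of SE-style per-field feature weighting, roughly in the spirit of COLD; the field count and embedding dimension are invented for illustration, and the learned per-field weights are what would guide feature pruning.

```python
import torch
import torch.nn as nn

class SEFieldGate(nn.Module):
    """SE-style gate learning one importance weight per feature field.
    Fields whose weight stays near zero are candidates for pruning."""
    def __init__(self, num_fields, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_fields, num_fields // reduction), nn.ReLU(),
            nn.Linear(num_fields // reduction, num_fields), nn.Sigmoid(),
        )

    def forward(self, field_embs):
        # field_embs: (batch, num_fields, emb_dim)
        squeezed = field_embs.mean(dim=-1)          # squeeze: (batch, num_fields)
        weights = self.gate(squeezed)               # excitation: field importance
        return field_embs * weights.unsqueeze(-1)   # reweight each field

# Hypothetical setup: 16 feature fields, each embedded into 8 dimensions.
embs = torch.randn(32, 16, 8)
gated = SEFieldGate(num_fields=16)(embs)
# After training, fields with consistently small gate weights can be
# dropped entirely, shrinking the serving model and its latency.
```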
3. Coarse-ranking optimization
The main problems of coarse ranking:
Accuracy and feature crossing: the classic DSSM model has many advantages and is currently widely used in coarse ranking, but its core weakness is the lack of feature-crossing capability. Its greatest strength is also its greatest weakness: it is precisely the separation of user and item that gives DSSM its high performance, yet the lack of interaction between the two limits the model's expressiveness and lowers accuracy. A typical accuracy-vs-performance trade-off.
Tight latency budget: coarse-ranking latency requirements are strict, generally only 10 ms~20 ms, far tighter than fine ranking's.
SSB problem: the coarse-ranking candidate space is much larger than fine ranking's, so training only on exposed samples, as fine ranking does, leads to serious sample selection bias.
(1) Accuracy improvement
The main approaches to improving accuracy are distillation from the fine-ranking model and feature crossing, with feature crossing being the central problem to optimize.
- Distillation from fine ranking
The fine-ranking model serves as a teacher that distills knowledge into the coarse-ranking model, improving its quality; this has become the basic paradigm for training coarse-ranking models today.
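As a minimal sketch of score distillation, assuming the fine-ranking (teacher) logits are available for the same training samples; the loss weight `alpha` and the MSE-on-logits soft term are illustrative choices, not a prescription from the article.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, labels, teacher_logits, alpha=0.5):
    """Train the coarse (student) model on click labels plus the
    fine-ranking (teacher) scores; `alpha` balances the two terms."""
    # Hard-label term: exposed-and-clicked = 1, exposed-but-unclicked = 0.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft-label term: match the teacher's logits on the same samples.
    soft = F.mse_loss(student_logits, teacher_logits)
    return alpha * hard + (1 - alpha) * soft

# Hypothetical batch: teacher logits come from the fine-ranking model.
student_logits = torch.randn(256, requires_grad=True)
teacher_logits = torch.randn(256)
labels = torch.randint(0, 2, (256,)).float()
loss = distillation_loss(student_logits, labels, teacher_logits)
```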
- Feature crossing
Feature crossing can be implemented at the feature level or at the model level. At the feature level, cross features are constructed manually and can still be fed into the model through a separate tower. At the model level, FM or an MLP performs the crossing automatically. The main methods are:
Feature distillation: the teacher and student share the same network structure; the teacher uses both common features and cross features, while the student uses only common features. The student thereby learns the high-order information of the cross features from the teacher.
Adding cross features: construct manual cross features at the feature level and feed them into an independent tower. Since cross features are hard to store offline and expensive to compute online, this independent tower cannot be too complex. The wide&deep model naturally comes to mind first: the deep part remains the DSSM two towers, and the wide part takes the cross features (see the sketch after this list).
Lightweight MLP: implement feature crossing at the model level without separated towers. For example, COLD reduces latency through feature pruning, network pruning, and engineering optimization rather than relying on independent towers.
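Here is a minimal sketch of the wide&deep variant described above: the deep part is the two-tower inner product and the wide part is a shallow linear layer over hand-crafted user-item cross features; all dimensions and the `cross_x` input are hypothetical.

```python
import torch
import torch.nn as nn

class WideAndTwoTower(nn.Module):
    """Deep part: DSSM-style two-tower inner product.
    Wide part: a shallow linear layer over manual user-item cross features."""
    def __init__(self, user_dim, item_dim, cross_dim, emb_dim=64):
        super().__init__()
        self.user_tower = nn.Sequential(nn.Linear(user_dim, 128), nn.ReLU(),
                                        nn.Linear(128, emb_dim))
        self.item_tower = nn.Sequential(nn.Linear(item_dim, 128), nn.ReLU(),
                                        nn.Linear(128, emb_dim))
        # The wide tower stays shallow: cross features are expensive to
        # compute online, so this part must remain cheap.
        self.wide = nn.Linear(cross_dim, 1)

    def forward(self, user_x, item_x, cross_x):
        deep = (self.user_tower(user_x) * self.item_tower(item_x)).sum(-1)
        wide = self.wide(cross_x).squeeze(-1)
        return deep + wide

model = WideAndTwoTower(user_dim=32, item_dim=48, cross_dim=16)
score = model(torch.randn(4, 32), torch.randn(4, 48), torch.randn(4, 16))
```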
(2) Latency reduction
Accuracy and performance are always in tension, and most solutions look for a balance between the two. Coarse ranking has the stricter performance requirement: latency must be kept within 10 ms~20 ms. The main performance-optimization methods are:
Feature pruning: as in COLD, unimportant features are filtered out first, which naturally lowers overall latency. This can be done inside the model (e.g., with an SE block), allowing personalization and real-time updates.
Quantization and fixed-point arithmetic: for example, reducing 32-bit floats to 8-bit integers improves both compute and storage performance (see the sketch after this list).
Network pruning: includes synapse pruning, neuron pruning, weight-matrix pruning, and other methods; not expanded here.
Model distillation: already covered above; not expanded here.
Neural architecture search (NAS): search for a lighter, better-performing model structure.
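As a minimal sketch of the quantization point above, PyTorch's built-in post-training dynamic quantization converts `nn.Linear` weights from 32-bit floats to 8-bit integers; the toy model here is a hypothetical stand-in for a coarse-ranking tower.

```python
import torch
import torch.nn as nn

# A toy stand-in for a coarse-ranking tower.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

# Post-training dynamic quantization: Linear weights go from float32
# to int8, shrinking the model and speeding up CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(model(x), quantized(x))  # scores should be close, not identical
```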
(3) SSB problem
The coarse-ranking candidate space is much larger than fine ranking's, so training only on exposed samples, as fine ranking does, leads to serious sample selection bias. One mitigation is to score unexposed samples with the fine-ranking model and use those scores in training, which alleviates the SSB problem (a sketch follows).
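A minimal sketch of that mitigation, assuming access to the fine-ranking model: unexposed candidates receive pseudo-labels from the fine-ranking scores and are mixed into the coarse model's training loss; the mixing weight `beta` and the toy linear models are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ssb_aware_loss(student, exposed_x, exposed_y,
                   unexposed_x, fine_model, beta=0.3):
    """Mix exposed click labels with fine-ranking pseudo-labels on
    unexposed candidates to shrink the train/serve distribution gap."""
    # Standard loss on exposed samples (clicked = 1, unclicked = 0).
    exposed_loss = F.binary_cross_entropy_with_logits(
        student(exposed_x).squeeze(-1), exposed_y)
    # Pseudo-labels: fine-ranking scores on candidates never shown.
    with torch.no_grad():
        pseudo = torch.sigmoid(fine_model(unexposed_x)).squeeze(-1)
    unexposed_loss = F.binary_cross_entropy_with_logits(
        student(unexposed_x).squeeze(-1), pseudo)
    return exposed_loss + beta * unexposed_loss

# Hypothetical toy models standing in for the coarse and fine rankers.
student, fine_model = nn.Linear(16, 1), nn.Linear(16, 1)
loss = ssb_aware_loss(student, torch.randn(64, 16),
                      torch.randint(0, 2, (64,)).float(),
                      torch.randn(256, 16), fine_model)
```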
About the Author
Xie Yangyi
Applied-algorithm researcher at Tencent.