- Data-driven decision-making is now common practice. In digital advertising, the way users interact with ads and platforms is a key source of that data.
- Advertisers and publishers are matched through auctions run on demand-side platforms (DSPs) and supply-side platforms (SSPs). The ad viewability rate is an important KPI.
Modeling process:
- Imported Python libraries and installed missing ones.
- Preprocessed the data: handled outliers and missing values, derived month and day variables, and converted categorical variables to numerical form using one-hot and hash encoding.
- Split data into training and testing sets and used XGBoost with hyperparameter optimization.
- Evaluated model performance using MSE and R². The model reached an R² of 0.74 and an MSE of 0.03. Used K-Fold Cross Validation with K = 4 to check that performance was consistent across folds.
- Conducted feature importance analysis using Permutation Importance and found that `ad_unit`, `SSP`, `browser`, and `creative_adsize` are most impactful, while `month`, `device_type`, `day`, and `creative_type` are least impactful. Other methods like Gain, Weight, and SHAP can be used for detailed analysis.
- Used Learning Curve Analysis to check for overfitting and underfitting and found no signs of either; test performance improved as the data size increased.
- Conclusion: Developed a model that predicts ad viewability rates using the XGBoost Regressor. Readers are encouraged to share thoughts and questions.