Predicting Ad Viewability with the XGBoost Regression Algorithm

  • Data-driven decision-making is now common practice. In digital advertising, how users interact with ads and platforms is a crucial signal.
  • Advertisers and publishers are matched through auctions run on demand-side platforms (DSPs) and supply-side platforms (SSPs). The ad viewability rate is an important KPI for both sides.
  • Modeling process:

    • Imported Python libraries and installed missing ones.
    • Preprocessed the data: handled outliers and missing values, derived month and day variables, and converted categorical variables to numerical form using one-hot and hash encoding.
    • Split data into training and testing sets and used XGBoost with hyperparameter optimization.
    • Evaluated model performance with MSE and R², obtaining an R² of 0.74 and an MSE of 0.03. Used K-Fold Cross Validation with K = 4 to check that performance was consistent across folds.
    • Conducted feature importance analysis using Permutation Importance and found that ad_unit, SSP, browser, and creative_adsize are most impactful, while month, device_type, day, and creative_type are least impactful. Other methods like Gain, Weight, and SHAP can be used for detailed analysis.
    • Used Learning Curve Analysis to check for overfitting and underfitting. Found no signs of either, and test performance improved as the training data size increased.
  • Conclusion: Developed a model to predict ad viewability rates using XGBoost Regressor. The author invites readers to share thoughts and questions.
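
The evaluation steps summarized above (MSE, R², 4-fold cross-validation, and permutation importance) can be illustrated with a minimal, library-free sketch. This is not the article's code: the data is synthetic, the helper names (`fit_line`, `k_fold_indices`, and so on) are hypothetical, and a one-feature least-squares line stands in for the XGBoost regressor so the example runs with the standard library alone. In practice, scikit-learn's `mean_squared_error`, `r2_score`, `KFold`, and `permutation_importance` cover the same ground.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all rows
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, folds[i]

def fit_line(X, y):
    """Stand-in model (NOT XGBoost): least-squares line on feature 0 only."""
    xs = [row[0] for row in X]
    mean_x, mean_y = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((x - mean_x) * (t - mean_y) for x, t in zip(xs, y))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda row: slope * row[0] + intercept

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Synthetic data (an assumption for illustration): the target depends on
# feature 0 plus noise, while feature 1 is irrelevant.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [0.8 * a + 0.1 + rng.gauss(0, 0.05) for a, _ in X]

# 4-fold cross-validation: fit on three folds, score on the held-out fold.
fold_r2, fold_mse, test_rows = [], [], []
for train_idx, test_idx in k_fold_indices(len(X), k=4):
    model = fit_line([X[i] for i in train_idx], [y[i] for i in train_idx])
    y_test = [y[i] for i in test_idx]
    y_hat = [model(X[i]) for i in test_idx]
    fold_r2.append(r2(y_test, y_hat))
    fold_mse.append(mse(y_test, y_hat))
    test_rows.extend(test_idx)

# Permutation importance of the model fitted on all rows.
importances = permutation_importance(fit_line(X, y), X, y)

print("R2 per fold: ", [round(s, 3) for s in fold_r2])
print("MSE per fold:", [round(s, 4) for s in fold_mse])
print("Permutation importances:", [round(v, 4) for v in importances])
```

Similar scores across the four folds are what "consistent performance" means in the summary. A feature whose shuffling barely changes the error, like feature 1 here, is the toy analogue of the low-impact variables the article lists.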