How Netflix Powers Audience Insights at Trillion-Row Scale

  • Scaling Muse: In a recent blog post, Netflix engineers described scaling Muse to handle trillion-row datasets. Muse helps creative and launch teams understand how artwork and video assets resonate with audiences. As it grew, it needed advanced filtering and audience-affinity analysis at massive scale.
  • Data Serving Layer Redesign: To meet these demands, Netflix redesigned the data serving layer, cutting query latencies by about 50% while maintaining accuracy and responsiveness.
  • From Spark to Druid: Muse started as a Spark-powered dashboard with a modest Apache Druid cluster. As data volumes grew and teams requested more features, the serving layer needed to become more capable.
  • Challenges and Solutions:

    • Audience Affinities: Adding many-to-many relationships for algorithmically inferred labels like "Character Drama fans" or "Pop Culture enthusiasts" to impression and playback data increased complexity.
    • Impressions and Qualified Plays: Counting distinct users for impressions and qualified plays is expensive at Netflix scale. Netflix adopted HyperLogLog sketches from Apache DataSketches, which provide estimates within about one percent error. The sketches are built both during Druid ingestion and in Spark ETL jobs.
    • Load on Druid: To reduce load on Druid, Netflix used its in-house library Hollow for in-memory key/value stores. Hollow feeds are built from Iceberg tables, updated by producer servers, and consumed by Spring Boot applications. This setup allows Muse to serve precomputed aggregates directly from memory, reducing query times and shielding Druid from high-concurrency requests.
    • Druid Tuning: Netflix tuned Druid by adjusting how data is split, resizing segments, and filtering out unused columns. It also used Druid's multi-value dimensions, which store several values in a single field, to handle audience affinities better. These changes cut query times roughly in half and made the system more consistent under heavy load.
  • Ensuring Accuracy: To ensure accuracy and trust, the legacy and new metric stacks were run in parallel and validated through automated Jupyter comparisons and in-app tools. The rollout was staged segment by segment with shadow testing and fine-grained feature flags for safe rollback.
  • Future Plans: Netflix plans to extend Muse to support 'Live' and Games, incorporate synopsis data, and refine metrics to distinguish between "effective" and "authentic" promotional assets.
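
The distinct-count approach described under "Impressions and Qualified Plays" relies on one property of HyperLogLog sketches: two sketches can be merged into a sketch of the union without revisiting the raw rows. The Python sketch below is a toy illustration of that idea only. It is not the Apache DataSketches API and not Netflix's code; the class `HyperLogLog`, the helper `build_sketch`, the `user-N` identifiers, and the choice of 4,096 registers (about 1.6% standard error, looser than the roughly 1% cited above) are all assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog sketch (illustrative; not the Apache DataSketches API)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                     # number of hash bits used as register index
        self.m = 1 << p                # number of registers (4096 when p = 12)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Derive a 64-bit hash from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # The union of two sketches is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def build_sketch(user_ids) -> HyperLogLog:
    """Hypothetical helper: sketch the distinct users seen in one batch."""
    sketch = HyperLogLog()
    for uid in user_ids:
        sketch.add(f"user-{uid}")
    return sketch


# Two daily batches whose user sets overlap; the true union is 50,000 users.
day1 = build_sketch(range(0, 30_000))
day2 = build_sketch(range(20_000, 50_000))
merged = day1.merge(day2)

# A sketch built in a single pass over all users, for comparison.
combined = build_sketch(range(0, 50_000))

print(f"estimated distinct users: {merged.estimate():.0f}")
```

Because merging is a register-wise maximum, the merged sketch is identical to one built in a single pass over all the data. That property is what would let sketches be precomputed per segment or per ETL partition and combined at query time, instead of recounting users across the full dataset.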