Transforming Personalized Content Recommendations at Warner Bros. Discovery with AWS Graviton
Insights from Machine Learning Engineering Leaders on Cost-Effective, Scalable Solutions for Global Audiences
By Nukul Sharma, Machine Learning Engineering Manager, and Karthik Dasani, Staff Machine Learning Engineer, Warner Bros. Discovery
In digital entertainment, exceptional content alone isn’t enough; viewers need sophisticated systems to help them discover what resonates with their tastes. At Warner Bros. Discovery (WBD), home to beloved brands like HBO, CNN, and DC Entertainment, we’ve taken on the monumental task of delivering personalized content to an audience of over 125 million users across more than 100 countries. In this post, we describe how we used Amazon SageMaker inference on AWS Graviton-based instances to enhance our machine learning (ML) capabilities and transform content recommendations for our viewers.
The Challenge: Scaling Personalization Without Compromise
Delivering localized recommendations on our HBO Max platform, which spans nine AWS Regions globally, posed a significant challenge. Speed is paramount in recommendation systems: users expect real-time, personalized suggestions. With viewing traffic surging up to 500% during major premieres, we had to meet aggressive scaling requirements and budget constraints while keeping latency under 100 ms.
Our sophisticated recommendation algorithms rely on extensive data science and user behavior analysis. However, the approach must remain cost-effective to ensure continuous delivery of high-quality recommendations that engage audiences.
Our Solution: Harnessing AWS Graviton for Cost-Efficient ML Inference
In searching for a viable solution, we turned to two powerful AWS tools: AWS Graviton processors and Amazon SageMaker. Combining these technologies allowed us to tackle performance and cost challenges head-on.
The Power of AWS Graviton
AWS Graviton processors are designed to deliver strong price-performance for cloud workloads, including ML inference. Features such as Neon SIMD vector processing and bfloat16 support made them a compelling choice for our latency-sensitive recommender systems.
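As a rough illustration of the bfloat16 capability, here is a minimal sketch of enabling oneDNN’s bfloat16 fast-math mode for TensorFlow inference on a Graviton instance. The environment variable names and the model path are assumptions, not our production configuration, and should be verified against the TensorFlow and oneDNN versions you deploy:

```python
# Minimal sketch (not production code): enable oneDNN bfloat16 fast-math for
# TensorFlow inference on Graviton. The variables must be set before
# TensorFlow is imported; verify the names against your deployed versions.
import os

os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"        # route ops through oneDNN
os.environ["DNNL_DEFAULT_FPMATH_MODE"] = "BF16"  # allow bfloat16 fast-math

import tensorflow as tf

# "recommender_savedmodel/" is a placeholder path for illustration only.
model = tf.saved_model.load("recommender_savedmodel/")
predict_fn = model.signatures["serving_default"]
```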
We started by testing Graviton with our XGBoost and TensorFlow models. An initial sandbox environment let us tune worker threads to maximize throughput on a single instance, revealing substantial performance gains over comparable x86-based instances. Shadow testing with production traffic then confirmed these benefits, showing that Graviton scaled linearly even under heavy CPU load.
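To give a sense of the single-instance tuning described above, the following sketch sweeps XGBoost worker-thread counts and measures prediction throughput. The synthetic data, model, batch size, and thread counts are placeholders rather than our production setup:

```python
# Minimal sketch: sweep XGBoost worker-thread counts on a single instance
# and measure prediction throughput for each setting.
import time
import numpy as np
import xgboost as xgb

# Train a small stand-in model on random data (placeholder for a real ranking model).
X = np.random.rand(100_000, 64).astype(np.float32)
y = np.random.randint(0, 2, size=100_000)
booster = xgb.train({"objective": "binary:logistic"},
                    xgb.DMatrix(X, label=y), num_boost_round=50)

batch = xgb.DMatrix(X[:10_000])  # one scoring batch
for nthread in (1, 2, 4, 8, 16):
    booster.set_param({"nthread": nthread})
    start = time.perf_counter()
    for _ in range(20):
        booster.predict(batch)
    elapsed = time.perf_counter() - start
    print(f"{nthread:>2} threads: {20 * 10_000 / elapsed:,.0f} rows/sec")
```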
Streamlining with SageMaker
Amazon SageMaker Inference Recommender streamlined our testing by automating evaluation across instance types and configurations. This let us quickly identify the optimal settings for each model and make data-driven decisions that accelerated deployment.
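A minimal sketch of launching an Inference Recommender job with boto3, comparing a Graviton instance type against an x86 baseline, might look like the following. The job name, ARNs, and instance types are placeholders, and field names should be checked against the current boto3 documentation:

```python
# Minimal sketch: request inference recommendations for a registered model
# package across a Graviton candidate and an x86 baseline.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_inference_recommendations_job(
    JobName="recsys-graviton-vs-x86",  # placeholder
    JobType="Default",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    InputConfig={
        # Placeholder ARN for a registered model package.
        "ModelPackageVersionArn": (
            "arn:aws:sagemaker:us-east-1:111122223333:model-package/recsys/1"
        ),
        "EndpointConfigurations": [
            {"InstanceType": "ml.c7g.4xlarge"},  # Graviton candidate
            {"InstanceType": "ml.c6i.4xlarge"},  # x86 baseline
        ],
    },
)
```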
SageMaker’s shadow testing capabilities let us assess new deployments safely, comparing their performance against existing systems without disrupting the user experience. This approach allowed us to fine-tune our setup and address potential issues before they reached production.
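As a sketch of how such a shadow comparison can be configured, the following creates a SageMaker endpoint configuration whose shadow variant runs on Graviton while the production variant stays on x86. Model names, instance types, and counts are placeholders:

```python
# Minimal sketch of a shadow test: production stays on x86 while a
# Graviton-backed shadow variant receives a mirrored copy of the traffic.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_endpoint_config(
    EndpointConfigName="recsys-shadow-test",  # placeholder
    ProductionVariants=[{
        "VariantName": "production-x86",
        "ModelName": "recsys-x86",            # placeholder, assumed to exist
        "InstanceType": "ml.c6i.4xlarge",
        "InitialInstanceCount": 4,
    }],
    # Shadow variants score the same requests, but their responses are only
    # logged for comparison and never returned to callers.
    ShadowProductionVariants=[{
        "VariantName": "shadow-graviton",
        "ModelName": "recsys-graviton",       # placeholder, assumed to exist
        "InstanceType": "ml.c7g.4xlarge",
        "InitialInstanceCount": 4,
    }],
)
```

Once an endpoint is updated to a configuration like this, the shadow variant’s latency and error metrics can be compared against production before shifting any real traffic.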
A Comprehensive Approach
Integrating Graviton processors with SageMaker has transformed our ML inference stack. Combined with other AWS managed services, such as Amazon S3 and Amazon DynamoDB, it allowed us to scale personalized content delivery effectively while maintaining cost efficiency.
Achievements and Results
Our transition to AWS Graviton-based instances yielded remarkable results across our recommendation systems:
- 60% Cost Savings: We saw average cost savings of 60% across our recommendation models, with catalog ranking models seeing reductions of up to 88%.
- Latency Improvements: Some models saw latency drop by as much as 60%; our XGBoost model, for example, cut its P99 latency by 60%.
- Enhanced User Experience: Thanks to these improvements, users benefit from more responsive recommendations that align closely with their interests.
The migration went smoothly thanks to close collaboration with our AWS account and service teams, taking about a month from initial benchmarking to full deployment.
Future Aspirations: Commitment to ML Excellence
Encouraged by these gains, we aim to run 100% of our recommendation systems on Graviton instances, further optimizing our infrastructure for cost and performance.
Conclusion: The Path Ahead
Our journey with AWS Graviton has not only refined how we deliver personalized recommendations but also shown how cloud innovation can drive operational efficiency. As we continue optimizing our ML stack, these improvements will strengthen WBD’s competitive edge in a rapidly evolving entertainment landscape and, ultimately, deepen our viewers’ engagement and loyalty.
For further insights and developments in our personalization strategy, stay tuned!
Acknowledgments
The success of this initiative would not have been possible without our collaborative partners at AWS. Special thanks to Sunita Nadampalli, Utsav Joshi, Karthik Rengasamy, Tito Panicker, Sapna Patel, and Gautham Panth for their invaluable contributions.
About the authors:
Nukul Sharma is a seasoned Machine Learning Engineering Manager at WBD, with extensive expertise in developing scalable ML solutions. Karthik Dasani serves as a Staff Machine Learning Engineer, specializing in recommendation systems and performance optimization. Together, they bring depth and experience to WBD’s innovative content personalization strategies.