Enhancing Salesforce Model Endpoints Using Amazon SageMaker AI Inference Tools


This post is a joint collaboration between Salesforce and AWS and is being cross-published on both the Salesforce Engineering Blog and the AWS Machine Learning Blog.

In the rapidly evolving realm of artificial intelligence, optimizing model deployment is paramount. The Salesforce AI Platform Model Serving team is at the forefront of this challenge, diligently working to provide robust services for hosting large language models (LLMs) and other AI workloads. This blog post delves into how Salesforce, in partnership with Amazon Web Services (AWS), has harnessed the capabilities of Amazon SageMaker AI to enhance GPU utilization and resource efficiency while achieving significant cost savings.

The Challenge: Balancing Performance and Cost

For organizations of all sizes, deploying machine learning models efficiently and cost-effectively poses numerous challenges. The Salesforce AI Platform team manages various proprietary LLMs, including CodeGen and XGen, utilizing SageMaker AI for optimized inference deployment. With models ranging from a few gigabytes to 30 GB, each with unique performance and infrastructure demands, the team faced two critical challenges:

  1. Underutilization of High-Performance GPUs: Their larger models, deployed on high-performance GPUs, often experienced low traffic patterns leading to resource waste.
  2. High Costs for Mid-Sized Models: Conversely, their medium-sized models required high-throughput processing. These models, however, were often over-provisioned, incurring unnecessary costs.

The stakes were high: balancing infrastructure costs against AI performance was essential for sustainable growth.

Solution: Leveraging Amazon SageMaker AI Inference Components

To tackle these challenges, Salesforce utilized the inference components of Amazon SageMaker AI, enabling them to deploy multiple foundation models (FMs) on the same endpoint. This approach not only improved resource utilization but also allowed for granular control over resource allocation.

Key Benefits of Inference Components

  • Optimized Resource Management: SageMaker AI efficiently allocates GPU resources, maximizing utilization and driving cost savings.
  • Independent Model Scaling: Each model can scale according to its specific resource needs, ensuring optimal performance without unnecessary expense.
  • Dynamic Instance Scaling: The system can automatically add or remove instances, maintaining availability while minimizing idle compute resources.
  • Flexible Resource Allocation: Organizations can scale down to zero copies for less critical models, freeing resources while keeping essential models ready for traffic.
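As a concrete sketch of the approach above: with inference components, each model on a shared endpoint declares its own compute requirements and copy count. The component and model names, GPU counts, and memory figures below are hypothetical (the post does not publish Salesforce's actual configuration); the dicts are shaped as requests for the SageMaker `CreateInferenceComponent` API.

```python
# One GPU endpoint, multiple models: each model becomes an inference
# component with its own slice of compute (all values illustrative).
ENDPOINT_NAME = "shared-llm-endpoint"  # hypothetical endpoint name

def make_component_request(component_name, model_name, gpus, memory_mb, copies):
    """Build a CreateInferenceComponent request dict for one model."""
    return {
        "InferenceComponentName": component_name,
        "EndpointName": ENDPOINT_NAME,
        "VariantName": "AllTraffic",
        "Specification": {
            "ModelName": model_name,
            "ComputeResourceRequirements": {
                "NumberOfAcceleratorDevicesRequired": gpus,
                "MinMemoryRequiredInMb": memory_mb,
            },
        },
        "RuntimeConfig": {"CopyCount": copies},
    }

# A large code model and a smaller variant packed onto the same endpoint.
codegen_large = make_component_request(
    "codegen-16b", "codegen-16b-model", gpus=4, memory_mb=65536, copies=1)
codegen_small = make_component_request(
    "codegen-2b", "codegen-2b-model", gpus=1, memory_mb=16384, copies=2)

# Each dict would be passed to the SageMaker control-plane client:
# sm = boto3.client("sagemaker")
# sm.create_inference_component(**codegen_large)
# sm.create_inference_component(**codegen_small)
```

Because resource requirements are declared per component, SageMaker AI can pack several models onto the same instances instead of reserving a dedicated endpoint per model.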

Configuring and Managing Inference Endpoints

Salesforce’s deployment process involves creating a SageMaker AI endpoint with defined configurations for instance types and initial counts. Using inference components, they can set specific resource requirements for each model, adjusting the number of instances dynamically based on traffic demands.
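The dynamic, per-model scaling described above is driven by Application Auto Scaling on the component's copy count. A minimal sketch, with hypothetical names and target values; the request dicts follow the `RegisterScalableTarget` and `PutScalingPolicy` APIs, and `MinCapacity=0` is what enables scale-to-zero for less critical models.

```python
# Hypothetical component name; actual Salesforce settings aren't published.
component_name = "codegen-2b"
resource_id = f"inference-component/{component_name}"

# Register the component's desired copy count as a scalable target.
# MinCapacity=0 lets a low-traffic model scale all the way to zero copies.
scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
    "MinCapacity": 0,
    "MaxCapacity": 4,
}

# Target-tracking policy: add or remove copies to hold invocations
# per copy near the target value (tune per model).
scaling_policy = {
    "PolicyName": f"{component_name}-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:inference-component:DesiredCopyCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 10.0,  # illustrative invocations-per-copy target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerInferenceComponentInvocationsPerCopy"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
}

# These would be passed to the Application Auto Scaling client:
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(**scalable_target)
# aas.put_scaling_policy(**scaling_policy)
```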

This intelligent setup maximized GPU utilization and reduced overhead, enabling seamless resource sharing among multiple models. The outcome? A substantial reduction in operational costs while maintaining high-performance standards across the board.

Real-World Application: CodeGen & Inference Components

Salesforce’s proprietary models, such as CodeGen, power applications that help developers write code more efficiently. By using inference components, the company hosted multiple model variants on a unified endpoint, optimizing both performance and cost.
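On a shared endpoint, callers select which model variant to hit by naming the inference component at invocation time. A sketch of such a request (endpoint, component name, and payload are hypothetical), shaped for the SageMaker Runtime `InvokeEndpoint` API:

```python
import json

# Hypothetical request: InferenceComponentName routes the call to one
# specific model variant hosted on the shared endpoint.
invoke_request = {
    "EndpointName": "shared-llm-endpoint",
    "InferenceComponentName": "codegen-2b",
    "ContentType": "application/json",
    "Body": json.dumps({
        "inputs": "def fibonacci(n):",
        "parameters": {"max_new_tokens": 64},
    }),
}

# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**invoke_request)
```

Switching variants is then a one-field change on the caller's side, with no new endpoint to stand up.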

Benefits Seen Post-Implementation

  • Optimized Resource Allocation: Efficient sharing of GPU resources across models eliminates unnecessary provisioning.
  • Cost Savings: The dynamic scaling capabilities have led to significant reductions in infrastructure costs.
  • Enhanced Performance: Smaller models benefited from high-performance GPUs, achieving low latency without an increase in operational expenses.

Conclusion: Future-Proofing AI Infrastructure

Through the strategic implementation of Amazon SageMaker AI inference components, Salesforce has redefined its AI infrastructure management, achieving impressive cost reduction and performance enhancement metrics. The ability to pack models intelligently and allocate resources dynamically has positioned Salesforce to thrive in a competitive landscape.

Looking ahead, Salesforce plans to utilize advanced capabilities such as SageMaker AI’s rolling updates for inference endpoints, streamlining model updates while minimizing operational overhead. This forward-thinking strategy not only enhances deployment efficiency but also paves the way for future AI innovations.
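A rolling update of this kind replaces copies of an inference component in batches rather than all at once. The sketch below builds a request dict shaped for the `UpdateInferenceComponent` API; the component name, new model version, and batch/wait values are hypothetical, and the exact deployment-config fields should be checked against the current SageMaker API reference.

```python
# Hypothetical rolling update: swap in a new model version one copy
# at a time, pausing between batches (all values illustrative).
rolling_update = {
    "InferenceComponentName": "codegen-2b",
    "Specification": {
        "ModelName": "codegen-2b-model-v2",  # hypothetical new version
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 16384,
        },
    },
    "DeploymentConfig": {
        "RollingUpdatePolicy": {
            # Replace one copy per batch, waiting 120s between batches.
            "MaximumBatchSize": {"Type": "COPY_COUNT", "Value": 1},
            "WaitIntervalInSeconds": 120,
        },
    },
}

# sm = boto3.client("sagemaker")
# sm.update_inference_component(**rolling_update)
```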

For further insights, check out our detailed articles on high-performance model deployment and getting started with Amazon SageMaker AI.


About the Authors

Rishu Aggarwal: Director of Engineering at Salesforce, focusing on LLM deployment and optimization.

Rielah De Jesus: Principal Solutions Architect at AWS, advocate for cloud migration and technical advisor for enterprise customers.

Pavithra Hariharasudhan: Senior Technical Account Manager at AWS, committed to operational excellence in cloud operations.

Ruchita Jadav: Senior Member of Technical Staff at Salesforce with a focus on scalable AI solutions and inference optimization.

Marc Karp: ML Architect at the Amazon SageMaker Service team, dedicated to designing and managing ML workloads effectively.
