Streamlining AI Deployments: Salesforce’s Journey with Amazon Bedrock Custom Model Import

Co-written by Salesforce’s AI Platform Team: Srikanta Prasad, Utkarsh Arora, Raghav Tanaji, Nitin Surya, Gokulakrishnan Gopalakrishnan, and Akhilesh Deepak Gotmare.

In the fast-paced world of artificial intelligence, businesses are constantly seeking ways to enhance their operational efficiency while minimizing overhead costs. At Salesforce, our AI platform team has embraced this challenge by fine-tuning large language models (LLMs) like Llama, Qwen, and Mistral for our agentic AI applications, such as Agentforce. However, deploying these sophisticated models resulted in a considerable operational burden.

The traditional process—optimizing instance families, serving engines, and managing configurations—was not only time-consuming but also expensive, especially with the need for peak GPU reservations. Recognizing the need for a more efficient approach, we turned to Amazon Bedrock Custom Model Import.

Simplifying AI Deployment

With Amazon Bedrock Custom Model Import, our team was able to import and deploy customized models using a unified API. This streamlined approach not only minimized infrastructure management but also allowed seamless integration with other Amazon Bedrock features. Our focus shifted from managing infrastructure to concentrating on model performance and business logic, which is crucial for delivering effective AI solutions.

This article details how we effectively integrated Amazon Bedrock Custom Model Import into our machine learning operations (MLOps) workflow, allowing us to reuse existing endpoints without application changes and benchmark scalability. We’ll also share key metrics on operational efficiency and cost optimization gains, providing actionable insights for simplifying your deployment strategy.

Integration Approach

Transitioning from Amazon SageMaker Inference to Amazon Bedrock Custom Model Import was planned meticulously to avoid disrupting production workloads. Our primary goals were to keep current API endpoints and model serving interfaces intact, ensure zero downtime, and avoid forcing changes on downstream applications.

By creating a bridge between our existing deployment workflows and Amazon Bedrock’s managed services, we enabled a gradual migration. This integration required a single additional step: after saving model artifacts to our Amazon S3 model store, we call the Amazon Bedrock Custom Model Import API. This lightweight operation (taking only 5-7 minutes, depending on model size) allowed Amazon Bedrock to pull models directly from S3, preserving our overall one-hour model release timeline.
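
To make that step concrete, the following is a minimal sketch of the import call using boto3. The job name, imported model name, S3 URI, and IAM role are hypothetical placeholders, not our production values.

    import time
    import boto3

    bedrock = boto3.client("bedrock", region_name="us-east-1")

    # Kick off the import; Amazon Bedrock pulls the artifacts directly from S3.
    response = bedrock.create_model_import_job(
        jobName="llama-ft-release-001",                       # hypothetical
        importedModelName="llama-ft-agentforce",              # hypothetical
        roleArn="arn:aws:iam::123456789012:role/BedrockImportRole",  # hypothetical
        modelDataSource={
            "s3DataSource": {"s3Uri": "s3://example-model-store/llama-ft/"}  # hypothetical
        },
    )
    job_arn = response["jobArn"]

    # Poll until the job finishes; in our experience this takes 5-7 minutes.
    while True:
        job = bedrock.get_model_import_job(jobIdentifier=job_arn)
        if job["status"] in ("Completed", "Failed"):
            break
        time.sleep(30)
    print(job["status"])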

The integration provided immediate performance benefits. Unlike SageMaker, which required model weights to be downloaded at container startup, Amazon Bedrock preloads the model, thus saving time and resources.
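
Once the import job completes, applications invoke the imported model through the standard Bedrock runtime API using the model's ARN. Here is a sketch with a placeholder ARN and a request schema that in practice varies by model family:

    import json
    import boto3

    runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

    # The modelId is the ARN returned for the imported model (placeholder here).
    response = runtime.invoke_model(
        modelId="arn:aws:bedrock:us-east-1:123456789012:imported-model/abc123",
        body=json.dumps({"prompt": "Summarize the customer case below: ...",
                         "max_tokens": 256}),
    )
    print(json.loads(response["body"].read()))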

Scalability Benchmarking

To rigorously assess the performance capabilities of Amazon Bedrock Custom Model Import, we conducted extensive load testing across various concurrency scenarios. The goal was to observe how Amazon Bedrock’s transparent auto-scaling behavior would perform under real-world conditions.

Our testing indicated that at low concurrency, Amazon Bedrock exhibited 44% lower latency than the ml.g6e.xlarge baseline (bf16 precision). As load increased 32x, P95 latency rose only from 7.2 to 10.44 seconds while throughput scaled near-linearly, showcasing the serverless architecture's effectiveness in handling production demands.

Concurrency (Count)    P95 Latency (Seconds)    Throughput (Requests per Minute)
1                      7.2                      11
4                      7.96                     41
16                     9.35                     133
32                     10.44                    232

The results reflect a significant enhancement in our overall scalability capabilities and highlight the effectiveness of our new deployment strategy.
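
For readers who want to reproduce a comparable measurement, here is a minimal sketch of the kind of harness that produces numbers like those above. The model ARN, payload, and request count are illustrative assumptions, not our actual test rig.

    import json
    import time
    import boto3
    from concurrent.futures import ThreadPoolExecutor

    runtime = boto3.client("bedrock-runtime")
    MODEL_ARN = "arn:aws:bedrock:us-east-1:123456789012:imported-model/abc123"  # hypothetical

    def one_request(_):
        # Time a single end-to-end invocation.
        start = time.perf_counter()
        runtime.invoke_model(
            modelId=MODEL_ARN,
            body=json.dumps({"prompt": "benchmark prompt", "max_tokens": 128}),
        )
        return time.perf_counter() - start

    def benchmark(concurrency, total_requests=100):
        wall_start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            latencies = sorted(pool.map(one_request, range(total_requests)))
        wall = time.perf_counter() - wall_start
        p95 = latencies[int(0.95 * len(latencies)) - 1]
        return p95, total_requests / wall * 60  # P95 (s), requests per minute

    for c in (1, 4, 16, 32):
        p95, rpm = benchmark(c)
        print(f"concurrency={c}: P95={p95:.2f}s, throughput={rpm:.0f} req/min")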

Results and Metrics

The implementation of Amazon Bedrock Custom Model Import yielded substantial improvements in both operational efficiency and cost optimization. We achieved a 30% reduction in the time it takes to iterate and deploy models to production, largely because the service removes the complex decisions around instance selection and model tuning.

In terms of cost, Amazon Bedrock allowed us to slash expenses by up to 40%. This was achieved by transitioning from reserved GPU capacities for peak workloads to a flexible pay-per-use model, which has been particularly beneficial in development and staging environments.

Lessons Learned

Salesforce’s experience with Amazon Bedrock Custom Model Import provided valuable insights for others contemplating a similar transition. While Amazon Bedrock supports multiple open-source model architectures, teams leveraging cutting-edge architectures must ensure compatibility before deployment.
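
One lightweight way to enforce that check is to gate the import on the checkpoint's declared architecture. Here is a sketch assuming Hugging Face-style artifacts; the supported list below is illustrative, so consult the current Amazon Bedrock documentation for the authoritative set.

    import json
    from pathlib import Path

    # Architectures assumed supported for this sketch (verify against the docs).
    SUPPORTED_ARCHITECTURES = {
        "LlamaForCausalLM",
        "MistralForCausalLM",
        "Qwen2ForCausalLM",
    }

    def check_compatibility(model_dir):
        # Hugging Face checkpoints declare their architecture in config.json.
        config = json.loads((Path(model_dir) / "config.json").read_text())
        unsupported = set(config.get("architectures", [])) - SUPPORTED_ARCHITECTURES
        if unsupported:
            raise ValueError(f"Unsupported architecture(s): {unsupported}")

    check_compatibility("./llama-ft-artifacts")  # hypothetical local path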

Additionally, we explored utilizing Amazon API Gateway and AWS Lambda functions for pre- and post-inference processing. However, we found these solutions less compatible with existing integrations and observed negative impacts from cold starts. Maintaining warm endpoints by keeping at least one model copy active proved essential for reducing cold start latency—especially for larger models.
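
One way to keep a copy active is a small scheduled job, for example an EventBridge-triggered Lambda, that sends a tiny request at a fixed interval. This is a sketch under assumed scale-down behavior; the ARN, prompt, and interval are placeholders to tune against what you observe.

    import json
    import boto3

    runtime = boto3.client("bedrock-runtime")

    def handler(event, context):
        # A minimal request keeps at least one model copy warm.
        runtime.invoke_model(
            modelId="arn:aws:bedrock:us-east-1:123456789012:imported-model/abc123",  # hypothetical
            body=json.dumps({"prompt": "ping", "max_tokens": 1}),
        )
        return {"status": "warm"}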

Conclusion

Salesforce’s journey with Amazon Bedrock Custom Model Import illustrates how organizations can significantly simplify their LLM deployment processes without sacrificing scalability or performance. We achieved 30% faster deployments and up to 40% lower costs while retaining backward compatibility through our hybrid architecture using SageMaker proxy containers.

By executing methodical load testing and beginning with non-critical workloads, we’ve demonstrated that serverless AI deployment is feasible for production—even under variable traffic patterns.

For teams managing LLMs at scale, this case study offers a clear blueprint. Ensure model architecture compatibility, consider cold start implications, and maintain existing interfaces. Amazon Bedrock Custom Model Import presents a proven approach for reducing overhead, accelerating deployment, and optimizing costs while still meeting performance demands.

To learn more about pricing for Amazon Bedrock, check out Optimizing Costs for Using Foundational Models, and for help deciding between Amazon Bedrock and SageMaker AI, see Amazon Bedrock or Amazon SageMaker AI?

About the Authors

  • Srikanta Prasad is a Senior Manager in Product Management at Salesforce specializing in generative AI solutions.
  • Utkarsh Arora is an Associate Member of Technical Staff with significant contributions in ML engineering.
  • Raghav Tanaji is a Lead Member of Technical Staff specializing in machine learning and statistical learning.
  • Akhilesh Deepak Gotmare is a Senior Research Staff Member focusing on deep learning and natural language processing.
  • Gokulakrishnan Gopalakrishnan is a Principal Software Engineer at Salesforce with extensive experience in scalable software systems.
  • Nitin Surya is a Lead Member of Technical Staff with backgrounds in software and ML engineering.

Together, we are excited to share these insights and help inform your AI deployment strategies!
