Streamlining AI Deployments: Salesforce’s Journey with Amazon Bedrock Custom Model Import
Co-written by Salesforce’s AI Platform Team: Srikanta Prasad, Utkarsh Arora, Raghav Tanaji, Nitin Surya, Gokulakrishnan Gopalakrishnan, and Akhilesh Deepak Gotmare.
In the fast-paced world of artificial intelligence, businesses are constantly seeking ways to enhance their operational efficiency while minimizing overhead costs. At Salesforce, our AI platform team has embraced this challenge by fine-tuning large language models (LLMs) like Llama, Qwen, and Mistral for our agentic AI applications, such as Agentforce. However, deploying these sophisticated models resulted in a considerable operational burden.
The traditional process of selecting instance families, tuning serving engines, and managing configurations was not only time-consuming but also expensive, especially given the need to reserve GPU capacity for peak load. Recognizing the need for a more efficient approach, we turned to Amazon Bedrock Custom Model Import.
Simplifying AI Deployment
With Amazon Bedrock Custom Model Import, our team was able to import and deploy customized models using a unified API. This streamlined approach not only minimized infrastructure management but also allowed seamless integration with other Amazon Bedrock features. Our focus shifted from managing infrastructure to concentrating on model performance and business logic, which is crucial for delivering effective AI solutions.
This article details how we effectively integrated Amazon Bedrock Custom Model Import into our machine learning operations (MLOps) workflow, allowing us to reuse existing endpoints without application changes and benchmark scalability. We’ll also share key metrics on operational efficiency and cost optimization gains, providing actionable insights for simplifying your deployment strategy.
Integration Approach
Transitioning from Amazon SageMaker inference to Amazon Bedrock Custom Model Import was planned carefully to avoid disrupting production workloads. Our primary goal was to keep the current API endpoints and model-serving interfaces unchanged, ensuring zero downtime and no required changes to downstream applications.
By creating a bridge between our existing deployment workflows and Amazon Bedrock’s managed services, we enabled a gradual migration. This integration required a single additional step: after saving model artifacts to our Amazon S3 model store, we call the Amazon Bedrock Custom Model Import API. This lightweight operation (taking only 5-7 minutes, depending on model size) allowed Amazon Bedrock to pull models directly from S3, preserving our overall one-hour model release timeline.
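To make that extra step concrete, here is a minimal sketch of how the import call and status polling could look with boto3. The job name, model name, role ARN, and S3 URI below are placeholders, not our production values.

```python
import time

import boto3

bedrock = boto3.client("bedrock")  # control-plane client used for model import

# Kick off the import job; Amazon Bedrock pulls the artifacts directly from S3.
job = bedrock.create_model_import_job(
    jobName="llama-ft-import-example",                            # placeholder
    importedModelName="llama-ft-agentforce-example",              # placeholder
    roleArn="arn:aws:iam::111122223333:role/BedrockImportRole",   # placeholder
    modelDataSource={
        "s3DataSource": {"s3Uri": "s3://example-model-store/llama-ft/"}  # placeholder
    },
)

# Poll until the job finishes; imports typically take a few minutes.
while True:
    status = bedrock.get_model_import_job(jobIdentifier=job["jobArn"])["status"]
    if status in ("Completed", "Failed"):
        print(f"Import finished with status: {status}")
        break
    time.sleep(30)
```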
The integration provided immediate performance benefits. Unlike SageMaker, which required model weights to be downloaded at container startup, Amazon Bedrock preloads the model, thus saving time and resources.
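The bridge between the old and new workflows can be pictured as a thin SageMaker-compatible proxy, consistent with the proxy-container architecture described in the conclusion. The sketch below is illustrative rather than our production container: it assumes a Flask app implementing SageMaker's /ping and /invocations contract, with the imported model's ARN supplied through a hypothetical IMPORTED_MODEL_ARN environment variable.

```python
import os

import boto3
from flask import Flask, Response, request

app = Flask(__name__)
runtime = boto3.client("bedrock-runtime")

# ARN of the imported Bedrock model, injected at deploy time (illustrative name).
MODEL_ARN = os.environ["IMPORTED_MODEL_ARN"]

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker container health check.
    return Response(status=200)

@app.route("/invocations", methods=["POST"])
def invocations():
    # Forward the unchanged request body to the imported model on Bedrock.
    result = runtime.invoke_model(
        modelId=MODEL_ARN,
        body=request.get_data(),
        contentType="application/json",
        accept="application/json",
    )
    return Response(result["body"].read(), mimetype="application/json")
```

Because the container still speaks SageMaker's interface, downstream callers keep invoking the same endpoint while inference itself moves to Bedrock's serverless backend.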
Scalability Benchmarking
To rigorously assess the performance capabilities of Amazon Bedrock Custom Model Import, we conducted extensive load testing across various concurrency scenarios. The goal was to observe how Amazon Bedrock’s transparent auto-scaling behavior would perform under real-world conditions.
Our testing showed that at low concurrency, Amazon Bedrock delivered 44% lower latency than our ml.g6e.xlarge baseline (bf16 precision). As load increased from 1 to 32 concurrent requests, throughput scaled from 11 to 232 requests per minute while P95 latency rose only from 7.2 to 10.44 seconds, showcasing the serverless architecture's effectiveness in handling production demands.
| Concurrency | P95 Latency (seconds) | Throughput (requests per minute) |
|---|---|---|
| 1 | 7.2 | 11 |
| 4 | 7.96 | 41 |
| 16 | 9.35 | 133 |
| 32 | 10.44 | 232 |
The results reflect a significant enhancement in our overall scalability capabilities and highlight the effectiveness of our new deployment strategy.
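For readers who want to run a similar benchmark, the sketch below shows one way to measure P95 latency and throughput at a given concurrency using a simple thread pool. The model ARN and request body are placeholders, and a production-grade load test would add warm-up runs and error handling.

```python
import concurrent.futures
import json
import time

import boto3

runtime = boto3.client("bedrock-runtime")
MODEL_ARN = "arn:aws:bedrock:us-east-1:111122223333:imported-model/example"  # placeholder
BODY = json.dumps({"prompt": "Summarize our Q3 pipeline.", "max_tokens": 256})  # placeholder

def one_request() -> float:
    """Send one inference request and return its latency in seconds."""
    start = time.perf_counter()
    runtime.invoke_model(modelId=MODEL_ARN, body=BODY,
                         contentType="application/json", accept="application/json")
    return time.perf_counter() - start

def benchmark(concurrency: int, total_requests: int = 200):
    """Run total_requests at the given concurrency; return (P95 seconds, requests/min)."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(lambda _: one_request(), range(total_requests)))
    wall_clock = time.perf_counter() - start
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    return p95, total_requests / wall_clock * 60

for c in (1, 4, 16, 32):
    p95, rpm = benchmark(c)
    print(f"concurrency={c}: p95={p95:.2f}s throughput={rpm:.0f} req/min")
```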
Results and Metrics
The implementation of Amazon Bedrock Custom Model Import yielded substantial improvements in both operational efficiency and cost optimization. We achieved a 30% reduction in the time it takes to iterate and deploy models to production, largely because complex decisions about instance selection and serving configuration were no longer needed.
In terms of cost, Amazon Bedrock allowed us to cut expenses by up to 40%. This came from transitioning from GPU capacity reserved for peak workloads to a flexible pay-per-use model, which has been particularly beneficial in development and staging environments.
Lessons Learned
Salesforce’s experience with Amazon Bedrock Custom Model Import provided valuable insights for others contemplating a similar transition. While Amazon Bedrock Custom Model Import supports multiple open-source model architectures, teams working with cutting-edge architectures should verify that theirs is supported before committing to a deployment.
Additionally, we explored utilizing Amazon API Gateway and AWS Lambda functions for pre- and post-inference processing. However, we found these solutions less compatible with existing integrations and observed negative impacts from cold starts. Maintaining warm endpoints by keeping at least one model copy active proved essential for reducing cold start latency—especially for larger models.
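One way to keep a model copy active is a scheduled lightweight invocation, for example from an EventBridge rule. The handler below is an illustrative sketch of that pattern, not a description of our production mechanism; because it runs off the request path, the Lambda cold-start concerns above do not affect user traffic.

```python
import json
import os

import boto3

runtime = boto3.client("bedrock-runtime")
MODEL_ARN = os.environ["IMPORTED_MODEL_ARN"]  # illustrative variable name

def keep_warm(event=None, context=None):
    """Scheduled handler that sends a tiny request so at least one
    model copy stays active, reducing cold-start latency."""
    body = json.dumps({"prompt": "ping", "max_tokens": 1})
    runtime.invoke_model(modelId=MODEL_ARN, body=body,
                         contentType="application/json", accept="application/json")
```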
Conclusion
Salesforce’s journey with Amazon Bedrock Custom Model Import illustrates how organizations can significantly simplify their LLM deployment processes without sacrificing scalability or performance. We achieved a 30% faster deployment time and a 40% reduction in costs while retaining backward compatibility through our hybrid architecture using SageMaker proxy containers.
By executing methodical load testing and beginning with non-critical workloads, we’ve demonstrated that serverless AI deployment is feasible for production—even under variable traffic patterns.
For teams managing LLMs at scale, this case study offers a clear blueprint. Ensure model architecture compatibility, consider cold start implications, and maintain existing interfaces. Amazon Bedrock Custom Model Import presents a proven approach for reducing overhead, accelerating deployment, and optimizing costs while still meeting performance demands.
To learn more about pricing for Amazon Bedrock, check out Optimizing Costs for Using Foundational Models, and for help deciding between Amazon Bedrock and SageMaker AI, see Amazon Bedrock or Amazon SageMaker AI?.
About the Authors
- Srikanta Prasad is a Senior Manager in Product Management at Salesforce specializing in generative AI solutions.
- Utkarsh Arora is an Associate Member of Technical Staff with significant contributions to ML engineering.
- Raghav Tanaji is a Lead Member of Technical Staff specializing in machine learning and statistical learning.
- Akhilesh Deepak Gotmare is a Senior Research Staff Member focusing on deep learning and natural language processing.
- Gokulakrishnan Gopalakrishnan is a Principal Software Engineer at Salesforce with extensive experience in scalable software systems.
- Nitin Surya is a Lead Member of Technical Staff with a background in software and ML engineering.
Together, we are excited to share these insights and help inform your AI deployment strategies!