Accelerate Your AI Workloads with Amazon SageMaker JumpStart: Now with Optimized Deployments
In the dynamic field of artificial intelligence, getting started quickly can make all the difference. Amazon SageMaker JumpStart presents a solution to this challenge by offering pretrained models across a wide array of problem types. This feature-rich platform is designed to assist businesses and developers as they embark on their AI journeys, providing access to solutions tailored to the most common use cases.
What is SageMaker JumpStart?
Amazon SageMaker JumpStart allows users to quickly transition from model selection to deployment. It lets customers deploy models directly to Amazon SageMaker AI managed inference endpoints or SageMaker HyperPod clusters. With its preset deployment options, users can launch their AI applications without delving into intricate configurations.
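Programmatically, the same quick path from model selection to endpoint can be sketched with the SageMaker Python SDK. This is a minimal sketch, not the only way to deploy: it assumes the `sagemaker` package is installed, AWS credentials and a SageMaker execution role are configured, and the example model ID (Llama 3.1 8B Instruct) is available in your Region.

```python
def deploy_model(model_id: str = "meta-textgeneration-llama-3-1-8b-instruct"):
    """Sketch: deploy a JumpStart model to a managed real-time endpoint.

    Assumes AWS credentials and a SageMaker execution role are configured;
    the default model ID is one example and may vary by Region.
    """
    # Deferred import so this module loads even without the SDK installed
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)
    # deploy() provisions the endpoint using the model's preset deployment
    # options; accept_eula is required for gated models such as Llama.
    return model.deploy(accept_eula=True)
```

Calling `deploy_model()` returns a predictor object whose `predict()` method sends inference requests to the newly created endpoint.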
Fast and Straightforward Deployments
Deploying models through SageMaker JumpStart is not only simple but also efficient. Customers can choose deployment options tailored to their anticipated number of concurrent users, gaining visibility into crucial performance metrics such as P50 latency, time to first token (TTFT), and throughput (tokens/second/user). While these configurations cater to general-purpose scenarios, many customers use SageMaker JumpStart for highly specific use cases such as content generation and Q&A interactions, and each use case may call for tailored settings to maximize performance.
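To make these metrics concrete, here is how they fall out of raw request timings. This is an illustrative calculation with made-up numbers, not JumpStart's own benchmarking methodology:

```python
import statistics

# End-to-end latencies (seconds) for a sample of requests
latencies = [0.82, 0.95, 1.10, 0.88, 1.30, 0.91, 1.05]

# P50 latency: the median end-to-end request latency
p50 = statistics.median(latencies)

# Time to first token (TTFT): delay between sending a request and the
# first streamed token arriving, averaged over sampled requests
ttft_samples = [0.21, 0.18, 0.25, 0.20]
ttft = statistics.fmean(ttft_samples)

# Throughput per user: tokens generated over wall-clock time, divided by
# the number of concurrent users sharing the endpoint
tokens_generated = 25_600
wall_clock_seconds = 64.0
concurrent_users = 8
tokens_per_second_per_user = tokens_generated / wall_clock_seconds / concurrent_users

print(f"P50: {p50:.2f}s, TTFT: {ttft:.2f}s, "
      f"throughput: {tokens_per_second_per_user:.1f} tok/s/user")
# → P50: 0.95s, TTFT: 0.21s, throughput: 50.0 tok/s/user
```

The per-user framing matters: an endpoint's aggregate throughput can look high while each individual user sees only a fraction of it under concurrency.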
Introducing Optimized Deployments
Recognizing the diverse needs of its users, Amazon has recently unveiled SageMaker JumpStart Optimized Deployments. This new feature enhances the customization capabilities of SageMaker JumpStart by providing pre-defined deployment configurations tailored for specific use cases.
Users will continue to enjoy the transparency of their deployments, but now they can benefit from configurations specifically optimized for different performance constraints. This means customers can enjoy greater efficiency, whether they prioritize cost, speed, or throughput in their deployments.
Prerequisites for Starting
To make the most of SageMaker JumpStart Optimized Deployments, customers should meet the following minimum requirements:
- [List specific prerequisites here, if applicable]
Once these elements are in place, users can dive straight into utilizing the optimized deployments feature.
Getting Started with Optimized Deployments
Ready to jump in? Here’s how you can do it:
- Open SageMaker Studio and navigate to the Models section.
- Choose one of the models that supports optimized deployments.
- Hit the Deploy button located in the top-right corner.
Upon doing so, a collapsible panel labeled Performance will appear, showcasing the options for optimized deployments.
Selecting Use Cases and Configurations
For text models, users are prompted to select a specific use case, ranging from generative writing to chat interactions. Image and video models will gain their own use cases as support expands.
Next, customers must choose from three optimization constraints:
- Cost Optimized
- Throughput Optimized
- Latency Optimized
For those seeking balanced performance, a Balanced option is also available; rather than optimizing for a single metric, it weighs cost, throughput, and latency together.
Once these selections are made, a pre-set deployment configuration is automatically defined for the endpoint, allowing customers to review and adjust additional settings like timeouts, endpoint naming, and security features. After finalizing all configurations, simply click Deploy in the bottom-right corner, and you’re set!
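The same select-a-configuration-then-deploy flow can be driven from code. The sketch below uses the SageMaker Python SDK's deployment-configuration methods (`list_deployment_configs` and `set_deployment_config`); the configuration name and instance type are placeholders, and the exact method signatures are worth verifying against your installed SDK version.

```python
def deploy_with_config(model_id: str,
                       config_name: str,
                       instance_type: str = "ml.g5.12xlarge"):
    """Sketch: pick a predefined deployment configuration, then deploy.

    Mirrors the Studio flow; requires AWS credentials and the SageMaker
    Python SDK. The instance type shown is a placeholder example.
    """
    # Deferred import so this module loads even without the SDK installed
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id)

    # Inspect the named configurations the model ships with before choosing
    for cfg in model.list_deployment_configs():
        print(cfg)

    # Selecting a configuration fills in the instance type and container
    # settings associated with that configuration
    model.set_deployment_config(config_name=config_name,
                                instance_type=instance_type)
    return model.deploy(accept_eula=True)
```

After deployment, any remaining settings such as timeouts, endpoint naming, and security options can still be managed through the SageMaker console or SDK as usual.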
Available Models for Optimized Deployments
SageMaker JumpStart has curated an impressive roster of models for optimized deployments, including:
Meta
- Llama-3.1-8B-Instruct
- Llama-2-7b-hf
- Llama-3.2-3B
- [More models]
Microsoft and Mistral AI
- Mistral-7B-Instruct-v0.2
- [Additional models]
Qwen
- Qwen3-8B
- [More models]
Google and Tiiuae
- gemma-7b
- [Additional models]
This is just the beginning; additional models will be introduced as SageMaker JumpStart continues to evolve.
Conclusion: Your AI Journey Starts Here
With SageMaker JumpStart’s optimized deployments, customers are empowered to experiment and find the right configurations for their specific applications. By diving into the available models in SageMaker Studio’s model hub, businesses can quickly and efficiently advance their AI workloads.
So why wait? Start exploring the power of SageMaker JumpStart optimized deployments today!
About the Authors
Dan Ferguson
Dan Ferguson is a Solutions Architect at AWS based in New York, USA. As a machine learning services expert, he is dedicated to assisting customers in integrating ML workflows efficiently and sustainably.
Malav Shastri
Malav Shastri is a Software Development Engineer at AWS, specializing in Amazon SageMaker JumpStart and Amazon Bedrock. His focus is on enabling customers to leverage cutting-edge open-source and proprietary models.
Pooja Karadgi
Pooja Karadgi leads product and strategic partnerships for Amazon SageMaker JumpStart. She is committed to accelerating customer AI adoption by simplifying the deployment process, making it easier to build production-ready generative AI applications.