Announcing the Launch of OpenAI’s GPT OSS Models on Amazon SageMaker JumpStart
Unleashing the Power of OpenAI’s GPT OSS Models on AWS
Today, we are thrilled to announce the availability of OpenAI’s groundbreaking open-weight GPT OSS models, gpt-oss-120b and gpt-oss-20b, integrated seamlessly into Amazon SageMaker JumpStart. With this launch, users can now deploy some of OpenAI’s most advanced reasoning models to innovate, experiment, and responsibly scale their generative AI applications on AWS.
In this post, we’ll guide you through the process of getting started with these models on SageMaker JumpStart.
Solution Overview
The OpenAI GPT OSS models excel in various tasks such as coding, scientific analysis, and mathematical reasoning. Both models provide:
- A 128K-token context window for long, multi-turn conversations
- Adjustable reasoning levels (low/medium/high) tailored to specific requirements
- Support for external tool integration and capability to orchestrate agentic workflows with platforms like Strands Agents
These models also feature robust chain-of-thought output capabilities, providing insight into the model’s reasoning process. With the OpenAI SDK, you can interact with your SageMaker endpoint by simply updating the endpoint settings, as sketched below. You can modify and customize these models to suit your business needs while maintaining enterprise-grade security and effortless scalability.
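As one hedged possibility, the following sketch points the OpenAI Python client at the endpoint. The OpenAI client does not sign AWS requests itself, so this sketch assumes a proxy (for example, an API Gateway route) in front of the endpoint that handles SigV4 signing; the base_url is hypothetical.

from openai import OpenAI

# Hypothetical proxy URL that forwards requests to the SageMaker endpoint
# and handles AWS SigV4 signing on your behalf
client = OpenAI(
    base_url="https://<your-proxy>/v1",
    api_key="unused-placeholder",  # authentication is handled by the proxy
)

response = client.responses.create(
    model="/opt/ml/model",
    input="Hello, how is it going?",
)
print(response.output_text)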
Amazon SageMaker JumpStart is a fully managed service offering state-of-the-art foundation models (FMs) tailored for a variety of applications including content creation, code generation, Q&A, summarization, classification, and information retrieval. With a comprehensive catalog of pre-trained models, JumpStart accelerates the development and deployment of machine learning applications.
You can now discover and deploy OpenAI’s models effortlessly in Amazon SageMaker Studio or programmatically using the Amazon SageMaker Python SDK. Benefit from comprehensive model performance insights and MLOps features with tools such as Amazon SageMaker Pipelines and Debugger. All models are deployed in a secure AWS environment under your VPC controls, bolstering data security for enterprise needs.
The GPT OSS models can be deployed in the US East (Ohio, N. Virginia) and Asia Pacific (Mumbai, Tokyo) AWS Regions.
This guide primarily uses the gpt-oss-120b model, but you can replicate the steps for gpt-oss-20b as well.
Prerequisites
Before diving into deployment, ensure you have the following:
- An AWS account for resource management.
- An AWS Identity and Access Management (IAM) role with permissions to access SageMaker. For details, see Identity and Access Management for Amazon SageMaker in the SageMaker documentation.
- Access to SageMaker Studio, a SageMaker notebook instance, or an IDE like PyCharm or Visual Studio Code (we recommend SageMaker Studio for effortless deployment).
- The appropriate instance types for the models; the recommended instance type is ml.p5.48xlarge. To verify your quota, follow these steps (or use the boto3 sketch after this list):
- Navigate to the Service Quotas console and select Amazon SageMaker.
- Confirm that you have sufficient quota for the required instance type in your target Region.
- If needed, request a quota increase and consult your AWS account team for assistance.
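As a quick alternative to the console, here is a minimal sketch that lists your SageMaker quotas with boto3. The quota-name filter is an assumption, so confirm the exact quota name in the Service Quotas console.

import boto3

# List SageMaker service quotas and print any that mention ml.p5.48xlarge;
# adjust region_name to your target Region
quotas = boto3.client("service-quotas", region_name="us-east-1")
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "ml.p5.48xlarge" in quota["QuotaName"]:
            print(quota["QuotaName"], quota["Value"])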
Deploying gpt-oss-120b through the SageMaker JumpStart UI
Follow these steps to deploy gpt-oss-120b:
- Open the SageMaker console and select Studio in the navigation pane.
- If you’re a first-time user, create a domain; otherwise, click Open Studio.
- In the Studio console, navigate to JumpStart in the left pane.
- On the JumpStart landing page, utilize the search box to find gpt-oss-120b.
- Click on the model card to review details, including license information and usage instructions. Check the configuration and model details.
- Click Deploy to initiate deployment.
- Enter an endpoint name (up to 50 alphanumeric characters).
- Specify the number of instances (between 1 and 100, default is 1).
- Select the instance type (for optimal performance, choose a GPU-based instance like ml.p5.48xlarge).
- Click Deploy to finalize and create the endpoint.
Upon successful deployment, your endpoint’s status will change to In Service, indicating it’s ready to handle inference requests.
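Because a UI deployment doesn’t give you a Python predictor object, you can invoke the endpoint directly with boto3. The endpoint name below is a placeholder for the name you chose, and the payload follows the same shape shown later in this post.

import json
import boto3

# Invoke the deployed endpoint through the SageMaker runtime
runtime = boto3.client("sagemaker-runtime")
payload = {
    "model": "/opt/ml/model",
    "input": [{"role": "user", "content": "Hello!"}],
    "max_output_tokens": 200,
}
response = runtime.invoke_endpoint(
    EndpointName="<your-endpoint-name>",
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))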
Deploying gpt-oss-120b with the SageMaker Python SDK
To deploy via the SDK, use the following code with the model_id set to openai-reasoning-gpt-oss-120b:
from sagemaker.jumpstart.model import JumpStartModel

# Deploy the model, accepting the end user license agreement (EULA)
accept_eula = True
model = JumpStartModel(model_id="openai-reasoning-gpt-oss-120b")
predictor = model.deploy(accept_eula=accept_eula)
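If you want to pin the instance type or endpoint name rather than rely on the defaults, both are standard deploy() arguments:

# Optionally specify the instance type and endpoint name at deploy time
predictor = model.deploy(
    accept_eula=True,
    instance_type="ml.p5.48xlarge",
    endpoint_name="gpt-oss-120b-endpoint",  # example name
)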
You can also deploy gpt-oss-20b by using its respective model ID.
Enable Web Search with EXA
By default, SageMaker JumpStart models operate in network isolation. However, the GPT OSS models support a built-in web search tool through EXA, a meaning-based web search API. To use this tool, acquire an API key from EXA and pass it as an environment variable during deployment:
# Deploy with an EXA API key; network isolation must be disabled so the
# endpoint can reach the external web search API
model = JumpStartModel(
    model_id="openai-reasoning-gpt-oss-120b",
    enable_network_isolation=False,
    env={"EXA_API_KEY": "<your_api_key>"},
)
predictor = model.deploy(accept_eula=True)
For deployment with default configurations (network isolation enabled):
model = JumpStartModel(model_id="openai-reasoning-gpt-oss-120b")
predictor = model.deploy(accept_eula=True)
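Once deployed with the EXA key (and network isolation disabled), you can request the web search tool in your payload. The tool schema below follows the OpenAI Responses API convention and is an assumption here; check the model card for the exact format your container version expects.

# Hedged sketch: request the built-in web search tool on an EXA-enabled endpoint
payload = {
    "model": "/opt/ml/model",
    "input": [{"role": "user", "content": "Summarize this week's AI news."}],
    "tools": [{"type": "web_search"}],  # assumed tool schema; verify on the model card
    "max_output_tokens": 512,
}
response = predictor.predict(payload)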
Run Inference with the SageMaker Predictor
Once your model is deployed, you can execute inference against the deployed endpoint through the SageMaker predictor:
payload = {
    "model": "/opt/ml/model",
    "input": [
        {"role": "system", "content": "You are a good AI assistant"},
        {"role": "user", "content": "Hello, how is it going?"},
    ],
    "max_output_tokens": 200,
    "stream": False,  # a boolean, not the string "false"
    "temperature": 0.7,
    "top_p": 1,
}

response = predictor.predict(payload)
print(response["output"][-1]["content"][0]["text"])
The model will generate responses that feel conversational and contextually relevant.
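To take advantage of the adjustable reasoning levels mentioned earlier, pass a reasoning effort in the same payload shape; higher effort trades latency for more deliberate reasoning:

# Request high reasoning effort (low/medium/high are supported)
payload = {
    "model": "/opt/ml/model",
    "input": [{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    "reasoning": {"effort": "high"},
    "max_output_tokens": 1024,
}
response = predictor.predict(payload)
print(response["output"][-1]["content"][0]["text"])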
Function Calling
The GPT OSS models use the harmony response format, which is designed for defining conversation structures and generating reasoning output. Here’s how to structure function calls using this format:
payload = {
    "model": "/opt/ml/model",
    "input": "System: You are ChatGPT... [rest of your conversation setup]",
    "instructions": "You are a helpful AI assistant...",
    "max_output_tokens": 2048,
    "stream": False,
    "temperature": 0.7,
    "reasoning": {"effort": "medium"},
    "tools": [
        {
            "type": "function",
            "name": "get_current_weather",
            "description": "Gets the current weather for a specific location...",
        }
    ],
}
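When the model decides to call your function, it returns a function-call item instead of a final answer. A hedged sketch of the round trip follows; the output item shape (type, arguments, call_id) follows the OpenAI Responses API convention and is an assumption for this container, and get_current_weather is a hypothetical local implementation.

import json

response = predictor.predict(payload)
for item in response["output"]:
    if item.get("type") == "function_call":
        # Run your own implementation of the requested function
        args = json.loads(item["arguments"])
        result = get_current_weather(**args)  # hypothetical local implementation
        # Return the tool result so the model can produce its final answer
        followup = dict(payload)
        followup["input"] = [
            item,
            {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(result),
            },
        ]
        final = predictor.predict(followup)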
Clean Up
To prevent additional charges after testing, it’s vital to delete any resources created during the process:
predictor.delete_model()
predictor.delete_endpoint()
Conclusion
In this post, we showcased how to deploy and utilize OpenAI’s GPT OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker JumpStart. These advanced reasoning models open new horizons for coding, scientific analysis, and mathematical tasks directly within your AWS environment, equipped with enterprise-level security and scalability.
Try out these innovative models today and feel free to share your experiences in the comments below!
About the Authors
This post was written by Pradyun Ramadorai, Malav Shastri, Varun Morishetty, and other contributors focused on making AI more accessible and powerful through innovative solutions on AWS.