Fine-Tuning GPT-OSS Models with Amazon SageMaker: Part 2 – Leveraging HyperPod Recipes for Enhanced Model Customization
Introduction
In this second installment of the GPT-OSS series, we delve into model customization using Amazon SageMaker AI. Building on the foundation laid in Part 1, where we fine-tuned GPT-OSS models utilizing Hugging Face libraries, this post explores the advanced techniques of fine-tuning models with SageMaker HyperPod recipes.
Solution Overview
This section provides a high-level view of utilizing SageMaker HyperPod recipes for streamlined model training and fine-tuning, emphasizing the efficiency and scalability of the architecture.
Prerequisites
Learn about the essential prerequisites to set up your environment for training GPT-OSS models using both SageMaker HyperPod and training jobs.
Data Tokenization
Discover how to tokenize multilingual datasets effectively, utilizing Hugging Face and SageMaker resources to prepare your data for training.
Fine-Tune the Model Using SageMaker HyperPod
Step-by-step instructions on setting up and executing fine-tuning tasks using SageMaker HyperPod, ensuring optimal training performance and resource management.
Fine-Tune Using SageMaker Training Jobs
A guide to employing SageMaker training jobs with recipes for seamless training experiences, focusing on automation and resource efficiency.
Run Inference
Everything you need to know about deploying your fine-tuned models for real-time inference using SageMaker endpoints and the vLLM framework.
Clean Up
Essential steps to take post-training to avoid unnecessary costs and manage your resources effectively.
Conclusion
A summary of the powerful capabilities that SageMaker and GPT-OSS offer for model training and deployment, along with recommendations for further exploration and use.
About the Authors
Meet the team behind this insightful piece, sharing their expertise and passion for building robust AI solutions at Amazon SageMaker.
Fine-Tuning GPT-OSS Models with Amazon SageMaker: Part 2
Welcome back to our GPT-OSS series! In Part 1, we showed you how to fine-tune GPT-OSS models using Hugging Face libraries in Amazon SageMaker, enabling you to leverage distributed multi-GPU and multi-node configurations to spin up high-performance clusters on demand. In this second part, we’ll dive deeper into fine-tuning GPT-OSS models using SageMaker HyperPod and Training Jobs, specifically focusing on leveraging recipes for efficiency.
Solution Overview
In this post, we’ll discuss how to employ SageMaker HyperPod recipes to quickly fine-tune models like Meta’s Llama, Mistral, and DeepSeek. These recipes offer pre-built configurations that simplify the distributed training process, ensuring that you maintain enterprise-grade performance and scalability. We’ll specifically focus on fine-tuning GPT-OSS on a multilingual reasoning dataset, HuggingFaceH4/Multilingual-Thinking, which allows the model to handle structured, chain-of-thought (CoT) reasoning across multiple languages.
Prerequisites
To get started, ensure you have the following prerequisites:
- Development Environment: Set up your local environment with AWS credentials or use Amazon SageMaker Studio.
- SageMaker HyperPod Configuration: If you plan to fine-tune on HyperPod, ensure you have:
  - At least one ml.p5.48xlarge instance (with 8 NVIDIA H100 GPUs) for training.
  - Sufficient SageMaker quota for that instance type. Request an increase on the Service Quotas console if necessary.
- Dataset Preparation: You’ll need to prepare the multilingual reasoning dataset for fine-tuning, which can be found in our Generative AI using Amazon SageMaker GitHub repository.
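The quota prerequisite above can also be checked from code rather than the console. The following is a minimal sketch, assuming boto3 is installed and your credentials may call the Service Quotas API; the helper names `find_instance_quotas` and `main` are hypothetical and not part of the original walkthrough.

```python
def find_instance_quotas(client, instance_type, service_code="sagemaker"):
    """Return {quota name: value} for quotas whose name mentions instance_type."""
    matches = {}
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode=service_code):
        for quota in page.get("Quotas", []):
            if instance_type in quota["QuotaName"]:
                matches[quota["QuotaName"]] = quota["Value"]
    return matches


def main():
    # boto3 is imported here so the helper above can be exercised without AWS access
    import boto3

    client = boto3.client("service-quotas")
    quotas = find_instance_quotas(client, "ml.p5.48xlarge")
    for name, value in sorted(quotas.items()):
        print(f"{value:>6.0f}  {name}")
```

Calling `main()` prints every SageMaker quota whose name contains ml.p5.48xlarge, so you can see at a glance whether the training job or cluster usage limits are above zero.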
Data Tokenization
We will be using the HuggingFaceH4/Multilingual-Thinking dataset, which contains examples translated into languages like French, Spanish, and German. To tokenize this dataset, follow this code snippet:
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the multilingual reasoning dataset
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

# Tokenization function: render each conversation with the chat template,
# then pad or truncate it to 4,096 tokens
def preprocess_function(example):
    return tokenizer.apply_chat_template(
        example["messages"],
        return_dict=True,
        padding="max_length",
        max_length=4096,
        truncation=True,
    )

# Process the dataset, dropping the original columns so only the tokenized fields remain
dataset = dataset.map(preprocess_function, remove_columns=dataset.column_names)

# Save the dataset
dataset.save_to_disk("/fsx/multilingual_4096")  # For HyperPod use
dataset.save_to_disk("multilingual_4096")  # For training jobs (upload this directory to Amazon S3)
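To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a small pure-Python sketch of what happens to a single sequence, plus a common follow-up step: building a `labels` column in which padded positions are ignored by the loss. Both helpers (`pad_or_truncate`, `add_labels`) are hypothetical illustrations. Right-side padding and the need for an explicit `labels` column are assumptions, so check the tokenizer's `padding_side` and the recipe you use.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def pad_or_truncate(token_ids, max_length, pad_token_id):
    """Clip one sequence to max_length, then right-pad it up to max_length."""
    clipped = list(token_ids[:max_length])
    n_pad = max_length - len(clipped)
    return {
        "input_ids": clipped + [pad_token_id] * n_pad,
        "attention_mask": [1] * len(clipped) + [0] * n_pad,
    }


def add_labels(example):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    example["labels"] = [
        token if mask == 1 else IGNORE_INDEX
        for token, mask in zip(example["input_ids"], example["attention_mask"])
    ]
    return example
```

If a labels column is needed, `dataset = dataset.map(add_labels)` can be applied after tokenization. Masking on `attention_mask` rather than on the pad token ID keeps real tokens intact even when the pad token doubles as another special token.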
Fine-Tuning with SageMaker HyperPod
Set Up Your Environment
To get started with fine-tuning using SageMaker HyperPod, set up your virtual environment like so:
python3 -m venv ${PWD}/venv
source venv/bin/activate
git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt
Submit Your Training Job
Update the necessary configuration in k8s.yaml and the launch script. Here’s an example for the launch script:
#!/bin/bash
HF_MODEL_NAME_OR_PATH="openai/gpt-oss-120b"
TRAIN_DIR="/fsx/multilingual_4096"
# Replace image_uri with the URI of your training container image
python3 main.py \
    recipes=fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora \
    container="image_uri" \
    recipes.run.name="hf-gpt-oss-120b-lora" \
    cluster=k8s \
    recipes.model.data.train_dir="$TRAIN_DIR"
After making these changes, submit your job:
chmod +x launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh
bash launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh
Monitor the job’s progress with:
kubectl get pods
# Replace <pod_name> with a pod name from the previous command
kubectl logs -f <pod_name>
Once training is complete, the final model will be located in your designated experiment folder.
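For unattended runs, the monitoring step can be scripted. The sketch below assumes `kubectl` is on your PATH and already configured for the HyperPod cluster; it reads the JSON form of `kubectl get pods` and reports each pod's phase. The function names are hypothetical.

```python
import json
import subprocess


def summarize_pods(pods_json):
    """Map pod name -> phase from the JSON printed by `kubectl get pods -o json`."""
    items = json.loads(pods_json).get("items", [])
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase", "Unknown")
        for item in items
    }


def current_pod_phases():
    """Run kubectl against the current namespace and return the parsed phases."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return summarize_pods(result.stdout)
```

Parsing the JSON output avoids the brittle column splitting that the human-readable table would require. A pod that reaches the `Succeeded` or `Failed` phase has finished running.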
Fine-Tuning with SageMaker Training Jobs
SageMaker training jobs take care of the infrastructure for you: they provision the compute, load your data, run your training script, and release the resources when the job finishes. Here’s how to launch one with the SageMaker Python SDK:
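import sagemaker
from sagemaker.pytorch import PyTorch

# Set up your SageMaker session, execution role, and output bucket
# (get_execution_role works inside SageMaker Studio; pass a role ARN when running locally)
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = sess.default_bucket()

# Configure the estimator with the GPT-OSS LoRA recipe
estimator = PyTorch(
    output_path=f"s3://{bucket}/output",
    entry_point="your_script.py",
    role=role,
    instance_type="ml.p5.48xlarge",
    training_recipe="fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora",
    sagemaker_session=sess,
)

# Start the training job; replace the S3 URIs with your tokenized train and validation data
estimator.fit({"train": "s3://path/to/train", "val": "s3://path/to/val"})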
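Each channel passed to `fit` is made available inside the training container under `/opt/ml/input/data/<channel_name>`, so the recipe needs to read its data from those paths. One way to express this is through the estimator's `recipe_overrides` argument. The sketch below is illustrative: the `model.data.train_dir` key mirrors the override used in the HyperPod launch script, while `val_dir` and the helper names are assumptions to verify against the recipe you use.

```python
SM_INPUT_DIR = "/opt/ml/input/data"  # where SageMaker mounts each input channel


def channel_path(channel_name):
    """Return the in-container directory for a named input channel."""
    return f"{SM_INPUT_DIR}/{channel_name}"


def build_recipe_overrides(train_channel="train", val_channel="val"):
    """Point the recipe's data directories at the mounted channels.

    The val_dir key is an assumption; train_dir follows the launch script.
    """
    return {
        "model": {
            "data": {
                "train_dir": channel_path(train_channel),
                "val_dir": channel_path(val_channel),
            }
        }
    }
```

The resulting dictionary can be passed as `recipe_overrides=build_recipe_overrides()` when constructing the `PyTorch` estimator, with channel names matching the keys given to `fit`.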
Model Deployment and Inference
Once your model is trained, you can deploy it directly to SageMaker endpoints using the vLLM framework, providing optimized, low-latency inference. Prepare your deployment container and define the model environment.
Here’s a snippet to deploy the model:
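import sagemaker
from sagemaker.serializers import JSONSerializer
# inference_image: URI of the vLLM-based serving container; role: your SageMaker execution role
lmi_model = sagemaker.Model(
    image_uri=inference_image,
    env={"OPTION_MODEL": "/opt/ml/model"},
    role=role,
    predictor_cls=sagemaker.Predictor,  # so that deploy() returns a predictor
)
pretrained_predictor = lmi_model.deploy(
    initial_instance_count=1, instance_type="ml.p5.48xlarge", serializer=JSONSerializer()
)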
To invoke the model:
payload = {
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "parameters": {"max_new_tokens": 64, "temperature": 0.2},
}
output = pretrained_predictor.predict(payload)
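Applications that do not use the SageMaker Python SDK can send the same request through the SageMaker Runtime API. The following sketch builds the payload shown above and calls `invoke_endpoint` on a boto3 `sagemaker-runtime` client. The helper names are hypothetical, and the assumption that the container replies with a JSON body should be verified against your serving image.

```python
import json


def build_payload(prompt, max_new_tokens=64, temperature=0.2):
    """Build the chat-style request body used in this post."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }


def invoke_chat(runtime_client, endpoint_name, prompt, **params):
    """Send one prompt to the endpoint and decode the (assumed) JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_payload(prompt, **params)),
    )
    return json.loads(response["Body"].read())
```

A typical call would be `invoke_chat(boto3.client("sagemaker-runtime"), endpoint_name, "Hello, who are you?")`, where `endpoint_name` is the name of the endpoint you deployed.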
Clean-up
To avoid incurring additional costs, clean up your resources after use. Delete the SageMaker endpoint and any HyperPod clusters you’ve created.
pretrained_predictor.delete_endpoint()
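Deleting the endpoint stops the instance charges, but the model resource created at deployment remains registered. A more thorough clean-up could look like the sketch below, which assumes a SageMaker Python SDK `Predictor` (it exposes `delete_model` and `delete_endpoint`). The `clean_up` helper is hypothetical, and the model is removed first on the assumption that the predictor resolves model names through the endpoint's configuration.

```python
def clean_up(predictor):
    """Delete the model(s) behind an endpoint, then the endpoint itself.

    Returns a list of (step name, exception) pairs for any step that failed.
    """
    errors = []
    for step in (predictor.delete_model, predictor.delete_endpoint):
        try:
            step()
        except Exception as exc:  # keep going so one failure does not leave the rest running
            errors.append((step.__name__, exc))
    return errors
```

Calling `clean_up(pretrained_predictor)` and checking that the returned list is empty confirms that both resources were removed. HyperPod clusters are not covered by this helper and still need to be deleted separately.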
Conclusion
In this post, we explored the fine-tuning of GPT-OSS models using SageMaker HyperPod recipes. The streamlined architecture leverages AWS’s scalable infrastructure, allowing organizations to efficiently optimize and serve custom large language models. By taking advantage of these capabilities, you can easily deploy high-performance AI applications with minimal setup time.
For further development, visit the Amazon SageMaker HyperPod recipes GitHub repository for extensive documentation and examples.
About the Authors
This post was brought to you by a team of dedicated AWS professionals who specialize in AI and machine learning. Their insights and expertise facilitate organizations in harnessing the power of generative AI efficiently and effectively.
We invite you to join us in the next part of our series, where we’ll delve into advanced training techniques and optimizations for large-scale models. Happy coding!