Fine-Tuning GPT-OSS Models with Amazon SageMaker: Part 2 – Leveraging HyperPod Recipes for Enhanced Model Customization

Introduction

In this second installment of the GPT-OSS series, we delve into model customization using Amazon SageMaker AI. Building on the foundation laid in Part 1, where we fine-tuned GPT-OSS models utilizing Hugging Face libraries, this post explores the advanced techniques of fine-tuning models with SageMaker HyperPod recipes.

Solution Overview

This section provides a high-level view of utilizing SageMaker HyperPod recipes for streamlined model training and fine-tuning, emphasizing the efficiency and scalability of the architecture.

Prerequisites

Learn about the essential prerequisites to set up your environment for training GPT-OSS models using both SageMaker HyperPod and training jobs.

Data Tokenization

Discover how to tokenize multilingual datasets effectively, utilizing Hugging Face and SageMaker resources to prepare your data for training.

Fine-Tune the Model Using SageMaker HyperPod

Step-by-step instructions on setting up and executing fine-tuning tasks using SageMaker HyperPod, ensuring optimal training performance and resource management.

Fine-Tune Using SageMaker Training Jobs

A guide to employing SageMaker training jobs with recipes for seamless training experiences, focusing on automation and resource efficiency.

Run Inference

Everything you need to know about deploying your fine-tuned models for real-time inference using SageMaker endpoints and the vLLM framework.

Clean Up

Essential steps to take post-training to avoid unnecessary costs and manage your resources effectively.

Conclusion

A summary of the powerful capabilities that SageMaker and GPT-OSS offer for model training and deployment, along with recommendations for further exploration and use.

About the Authors

Meet the team behind this insightful piece, sharing their expertise and passion for building robust AI solutions at Amazon SageMaker.

Fine-Tuning GPT-OSS Models with Amazon SageMaker: Part 2

Welcome back to our GPT-OSS series! In Part 1, we showed you how to fine-tune GPT-OSS models using Hugging Face libraries in Amazon SageMaker, enabling you to leverage distributed multi-GPU and multi-node configurations to spin up high-performance clusters on demand. In this second part, we’ll dive deeper into fine-tuning GPT-OSS models using SageMaker HyperPod and Training Jobs, specifically focusing on leveraging recipes for efficiency.

Solution Overview

SageMaker HyperPod recipes provide pre-built configurations for fine-tuning popular models such as Meta’s Llama, Mistral, and DeepSeek; in this post, we use them to fine-tune GPT-OSS. The recipes simplify distributed training setup while maintaining enterprise-grade performance and scalability. We fine-tune GPT-OSS on a multilingual reasoning dataset, HuggingFaceH4/Multilingual-Thinking, so that the model can handle structured, chain-of-thought (CoT) reasoning across multiple languages.

Prerequisites

To get started, ensure you have the following prerequisites:

Development Environment: Set up your local environment with AWS credentials, or use Amazon SageMaker Studio.
SageMaker HyperPod Configuration: If you plan to fine-tune on HyperPod, you need:
- At least one ml.p5.48xlarge instance (with 8 NVIDIA H100 GPUs) for training.
- Sufficient SageMaker quota for that instance type. If necessary, request an increase on the Service Quotas console.
Dataset Preparation: Prepare the multilingual reasoning dataset for fine-tuning, which can be found in our Generative AI using Amazon SageMaker GitHub repository.

Data Tokenization

We will be using the HuggingFaceH4/Multilingual-Thinking dataset, which contains examples translated into languages like French, Spanish, and German. To tokenize this dataset, follow this code snippet:

from datasets import load_dataset
from transformers import AutoTokenizer

# Load the multilingual reasoning dataset and the GPT-OSS tokenizer
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

# Render each conversation with the chat template, then pad or truncate to 4,096 tokens
def preprocess_function(example):
    return tokenizer.apply_chat_template(
        example["messages"],
        return_dict=True,
        padding="max_length",
        max_length=4096,
        truncation=True,
    )

# Tokenize and drop the original columns so that only the tokenized fields remain
dataset = dataset.map(preprocess_function, remove_columns=dataset.column_names)

# Save the processed dataset
dataset.save_to_disk("/fsx/multilingual_4096")  # For HyperPod use (shared file system path)
dataset.save_to_disk("multilingual_4096")       # For training jobs (local copy to upload to Amazon S3)
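Causal language-model fine-tuning usually excludes padded positions from the loss. The sketch below shows one way to do that, assuming each tokenized example contains input_ids and attention_mask (the fields apply_chat_template returns with return_dict=True) and that the trainer reads a labels column. Whether the recipe expects labels to be precomputed is an assumption, and add_labels is a hypothetical helper name.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def add_labels(example):
    """Copy input_ids to labels, masking padded positions with IGNORE_INDEX."""
    labels = [
        token if mask == 1 else IGNORE_INDEX
        for token, mask in zip(example["input_ids"], example["attention_mask"])
    ]
    # Return a new dict so the original example is left unchanged
    return {**example, "labels": labels}
```

If labels are needed, this could be applied with dataset = dataset.map(add_labels) before calling save_to_disk. Masking by attention_mask rather than by pad token ID avoids hiding real tokens when the pad token doubles as another special token.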

Fine-Tuning with SageMaker HyperPod

Set Up Your Environment

To get started with fine-tuning using SageMaker HyperPod, set up your virtual environment like so:

python3 -m venv ${PWD}/venv
source venv/bin/activate
git clone --recursive https://github.com/aws/sagemaker-hyperpod-recipes.git
cd sagemaker-hyperpod-recipes
pip3 install -r requirements.txt

Submit Your Training Job

Update the necessary configuration in k8s.yaml and the launch script. Here’s an example for the launch script:

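#!/bin/bash
HF_MODEL_NAME_OR_PATH="openai/gpt-oss-120b"  # Hugging Face model name or local path
TRAIN_DIR="/fsx/multilingual_4096"           # Tokenized dataset saved in the previous step

# Replace "image_uri" with the URI of your training container image
python3 main.py \
    recipes=fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora \
    container="image_uri" \
    recipes.run.name="hf-gpt-oss-120b-lora" \
    cluster=k8s \
    recipes.model.data.train_dir="$TRAIN_DIR" \
    recipes.model.hf_model_name_or_path="$HF_MODEL_NAME_OR_PATH"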

After making these changes, submit your job:

chmod +x launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh
bash launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh
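The arguments passed to main.py in the launch script are key=value overrides on top of the recipe's defaults. As a rough illustration of how such a command line is composed, the sketch below assembles the same arguments from a dictionary. build_launch_command is a hypothetical helper, not part of the recipes repository, and "image_uri" remains a placeholder as in the script.

```python
import shlex


def build_launch_command(recipe, overrides, entry="main.py"):
    """Return the argv list for launching a recipe with key=value overrides."""
    args = ["python3", entry, f"recipes={recipe}"]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


overrides = {
    "container": "image_uri",  # placeholder, as in the launch script
    "recipes.run.name": "hf-gpt-oss-120b-lora",
    "cluster": "k8s",
    "recipes.model.data.train_dir": "/fsx/multilingual_4096",
}
command = build_launch_command(
    "fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora", overrides
)

# shlex.join quotes each argument so the line can be pasted into a shell
print(shlex.join(command))
```

Keeping the overrides in one dictionary makes it easier to see which recipe defaults a given run changes.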

Monitor the job’s progress with:

kubectl get pods
kubectl logs -f pod_name
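To check job status from a script rather than by eye, the pod list can also be requested as JSON with kubectl get pods -o json. The sketch below parses that output into a name-to-phase mapping. The helper names are hypothetical, and it relies only on the standard metadata.name and status.phase fields of a Kubernetes pod.

```python
import json


def pod_phases(pods_json):
    """Map pod name -> phase from the output of `kubectl get pods -o json`."""
    items = json.loads(pods_json).get("items", [])
    return {item["metadata"]["name"]: item["status"]["phase"] for item in items}


def unfinished(phases):
    """Return the names of pods that have not reached the Succeeded phase."""
    return sorted(name for name, phase in phases.items() if phase != "Succeeded")
```

The JSON text could be obtained with subprocess.run(["kubectl", "get", "pods", "-o", "json"], capture_output=True, text=True, check=True).stdout and passed to pod_phases.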

Once training is complete, the final model will be located in your designated experiment folder.

Fine-Tuning with SageMaker Training Jobs

SageMaker training jobs are straightforward to use: the service spins up the compute, loads your data, and runs the training script for you. Here’s how to launch one with the SageMaker Python SDK:

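import sagemaker
from sagemaker.pytorch import PyTorch

# Set up the SageMaker session, execution role, and output bucket
session = sagemaker.Session()
role = sagemaker.get_execution_role()  # In SageMaker Studio; otherwise pass an IAM role ARN
bucket = session.default_bucket()

# The recipe supplies the training entry point, so no custom script is passed
estimator = PyTorch(
    output_path=f"s3://{bucket}/output",
    role=role,
    instance_count=1,
    instance_type="ml.p5.48xlarge",
    training_recipe="fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora",
    sagemaker_session=session,
)

# Start the training job; replace the placeholder S3 URIs with your dataset locations
estimator.fit({"train": "s3://path/to/train", "val": "s3://path/to/val"})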
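Recipe settings can also be adjusted from the SDK through the estimator's recipe_overrides argument. The sketch below builds such a dictionary from dotted keys, assuming recipe_overrides takes a nested dictionary that mirrors the recipe's structure. nest_overrides is a hypothetical helper, and the val_dir key is an assumption modeled on the train_dir override used in the HyperPod launch script.

```python
def nest_overrides(flat):
    """Turn {"a.b.c": v} style keys into the nested dict {"a": {"b": {"c": v}}}."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


recipe_overrides = nest_overrides({
    "run.name": "hf-gpt-oss-120b-lora",
    # SageMaker mounts each input channel under /opt/ml/input/data/<channel>
    "model.data.train_dir": "/opt/ml/input/data/train",
    "model.data.val_dir": "/opt/ml/input/data/val",  # assumed key, mirrors train_dir
})
```

The result would be passed as PyTorch(..., recipe_overrides=recipe_overrides). The channel paths correspond to the train and val keys given to estimator.fit.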

Model Deployment and Inference

Once your model is trained, you can deploy it directly to SageMaker endpoints using the vLLM framework, providing optimized, low-latency inference. Prepare your deployment container and define the model environment.

Here’s a snippet to deploy the model:

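import sagemaker
from sagemaker.serializers import JSONSerializer

lmi_model = sagemaker.Model(
    image_uri=inference_image,
    env={"OPTION_MODEL": "/opt/ml/model"},
    role=role,
    predictor_cls=sagemaker.Predictor,  # makes deploy() return a predictor
)

pretrained_predictor = lmi_model.deploy(
    initial_instance_count=1, instance_type="ml.p5.48xlarge",
    serializer=JSONSerializer(),
)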

To invoke the model:

payload = {
    "messages": [{"role": "user", "content": "Hello, who are you?"}],
    "parameters": {"max_new_tokens": 64, "temperature": 0.2}
}

output = pretrained_predictor.predict(payload)
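The shape of the response depends on the serving container. As a sketch, assuming the endpoint returns an OpenAI-style chat completion body for a messages payload (an assumption, not something the steps above confirm), the assistant's text could be extracted as follows. extract_reply is a hypothetical helper.

```python
import json


def extract_reply(raw):
    """Return the assistant text from an OpenAI-style chat completion response."""
    # Predictors without a JSON deserializer return the raw response bytes
    if isinstance(raw, (bytes, bytearray)):
        raw = raw.decode("utf-8")
    data = json.loads(raw) if isinstance(raw, str) else raw
    return data["choices"][0]["message"]["content"]
```

Calling extract_reply(output) would then give the generated text. If your container uses a different response schema, adjust the key path accordingly.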

Clean-up

To avoid incurring additional costs, clean up your resources after use. Delete the SageMaker endpoint and any HyperPod clusters you’ve created.

pretrained_predictor.delete_endpoint()

Conclusion

In this post, we explored the fine-tuning of GPT-OSS models using SageMaker HyperPod recipes. The streamlined architecture leverages AWS’s scalable infrastructure, allowing organizations to efficiently optimize and serve custom large language models. By taking advantage of these capabilities, you can easily deploy high-performance AI applications with minimal setup time.

For further development, visit the Amazon SageMaker HyperPod recipes GitHub repository for extensive documentation and examples.

About the Authors

This post was brought to you by a team of dedicated AWS professionals who specialize in AI and machine learning. Their insights and expertise facilitate organizations in harnessing the power of generative AI efficiently and effectively.

We invite you to join us in the next part of our series, where we’ll delve into advanced training techniques and optimizations for large-scale models. Happy coding!
