Streamlining Fine-Tuning and Deployment of Open Source LLMs with Oumi and Amazon Bedrock


Fine-Tuning Large Language Models with Oumi and Amazon Bedrock

Co-authored by David Stewart and Matthew Persons from Oumi.

Moving from experimentation to production can be cumbersome, especially when you fine-tune open-source large language models (LLMs). Managing training configurations, handling artifacts, and scaling deployments all add friction to that transition. This post describes a workflow that simplifies the journey: it uses Oumi to fine-tune a Llama model on Amazon EC2, stores the artifacts in Amazon S3, and deploys the model with Amazon Bedrock Custom Model Import.

Benefits of Oumi and Amazon Bedrock

Oumi is an open-source platform that streamlines the foundation model lifecycle, from data preparation to evaluation. The key advantages of using Oumi with Amazon Bedrock include:

Key Benefits

Recipe-driven Training: Define your configuration once and reuse it across experiments, which reduces boilerplate and improves reproducibility.
Flexible Fine-tuning: Choose between full fine-tuning and parameter-efficient methods such as LoRA, depending on your project constraints.
Integrated Evaluation: Evaluate model checkpoints against predefined benchmarks without extra tooling.
Data Synthesis: Generate task-specific datasets when your production data is limited.

Amazon Bedrock complements this process with managed, serverless inference. After fine-tuning, you import the model in three steps: upload the artifacts to Amazon S3, create an import job, and invoke the model. You never manage the inference infrastructure yourself.

Figure 1: Oumi manages data, training, and evaluation on EC2. Amazon Bedrock provides managed inference via Custom Model Import.

Solution Overview

This workflow consists of three main stages:

Fine-tune with Oumi on EC2: Launch a GPU instance, install Oumi, and run training with your configuration. For larger models, Oumi also supports distributed training strategies such as Fully Sharded Data Parallel (FSDP) and DeepSpeed.
Store artifacts in Amazon S3: Upload your model weights, checkpoints, and logs to S3 for durable storage.
Deploy to Amazon Bedrock: Create a Custom Model Import job that points to your S3 artifacts; Amazon Bedrock provisions the inference infrastructure automatically.

This architecture addresses the friction points described earlier (managing training configurations, handling artifacts, and scaling deployments) that arise when you move a fine-tuned model into production.

Technical Implementation

Let’s walk through a hands-on example using the meta-llama/Llama-3.2-1B-Instruct model. We chose this model because it fits comfortably on an Amazon EC2 g6.12xlarge instance, but the same methodology applies to other open-source models whose architectures Custom Model Import supports.

Prerequisites

To follow this walkthrough, set up the required AWS resources with the scripts in the companion repository.

First, clone the repository:

```
git clone https://github.com/aws-samples/sample-oumi-fine-tuning-bedrock-cmi.git
cd sample-oumi-fine-tuning-bedrock-cmi
```

Run the setup script:
```
./scripts/setup-aws-env.sh [--dry-run]
```

The script prompts for your AWS Region, S3 bucket name, EC2 key pair name, and security group ID.
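If you want to confirm these values before running the script, the following sketch checks that the bucket, key pair, and security group exist in the chosen Region. It assumes AWS CLI v2 with configured credentials; check_prereqs is a hypothetical helper, not part of the companion repository.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the values the setup script asks for.
# Assumes AWS CLI v2 is installed and credentials are configured.
check_prereqs() {
  local region=$1 bucket=$2 key_name=$3 sg_id=$4
  local failed=0

  # head-bucket exits non-zero if the bucket is missing or not accessible.
  if ! aws s3api head-bucket --bucket "$bucket" --region "$region" >/dev/null 2>&1; then
    echo "S3 bucket not accessible: $bucket" >&2
    failed=1
  fi

  # describe-key-pairs fails if the key pair does not exist in this Region.
  if ! aws ec2 describe-key-pairs --key-names "$key_name" --region "$region" >/dev/null 2>&1; then
    echo "EC2 key pair not found: $key_name" >&2
    failed=1
  fi

  # describe-security-groups fails for an unknown group ID.
  if ! aws ec2 describe-security-groups --group-ids "$sg_id" --region "$region" >/dev/null 2>&1; then
    echo "Security group not found: $sg_id" >&2
    failed=1
  fi

  # 0 only if all three checks passed.
  return "$failed"
}
```

For example, run `check_prereqs us-east-1 my-bucket my-key sg-0123456789abcdef0` and continue only if it returns 0. Reporting every failure, rather than stopping at the first, lets you fix all inputs in one pass.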

Once your instance is up, SSH into it and continue with the next steps.
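One way to connect is sketched below: it looks up the instance’s public DNS name with the AWS CLI and opens an SSH session. It assumes an Amazon Linux AMI (default user ec2-user, consistent with the yum commands in Step 1), an instance with a public DNS name, and a private key saved as ~/.ssh/<key-name>.pem; ssh_to_instance is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the instance's public DNS name, then SSH in.
# Assumes Amazon Linux (user ec2-user) and a key file at ~/.ssh/<key-name>.pem.
ssh_to_instance() {
  local instance_id=$1 key_name=$2 host
  host=$(aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text) || return 1

  # The CLI prints an empty string or "None" when there is no public DNS name.
  if [[ -z "$host" || "$host" == "None" ]]; then
    echo "No public DNS name for $instance_id" >&2
    return 1
  fi

  ssh -i "$HOME/.ssh/${key_name}.pem" "ec2-user@${host}"
}
```

For example: `ssh_to_instance "$INSTANCE_ID" my-key`.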

Step 1: Set Up the EC2 Environment

Update your EC2 instance and install dependencies:

```
sudo yum update -y
sudo yum install python3 python3-pip git -y
```

Clone the repository on the instance and change to the project directory:

```
git clone https://github.com/aws-samples/sample-oumi-fine-tuning-bedrock-cmi.git
cd sample-oumi-fine-tuning-bedrock-cmi
```

Configure your environment variables:

```
export AWS_REGION=your-region
export S3_BUCKET=your-bucket-name
export S3_PREFIX=your-s3-prefix
aws configure set default.region "$AWS_REGION"
```
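Because later commands build S3 paths from these variables, it can help to fail fast if any of them is empty. The helper below is a hypothetical sketch using bash indirect expansion; it is not part of the companion repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: return 1 and list the names of any required variables
# that are unset or empty, so later commands never see paths like "s3:///".
require_env() {
  local name missing=()
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "Missing required environment variables: ${missing[*]}" >&2
    return 1
  fi
}
```

Run `require_env AWS_REGION S3_BUCKET S3_PREFIX` before the S3 and Amazon Bedrock steps.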

Run the environment setup script, then activate the virtual environment it creates:

```
./scripts/setup-environment.sh
source .venv/bin/activate
```

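Authenticate with Hugging Face to download the model weights. Llama 3.2 models are gated, so request access on the model’s Hugging Face page before you authenticate.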
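A minimal sketch of non-interactive authentication follows. It assumes huggingface_hub is installed in the active virtual environment and that HF_TOKEN holds an access token for an account that has been granted access to meta-llama/Llama-3.2-1B-Instruct; huggingface_hub reads HF_TOKEN from the environment. hf_auth is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export HF_TOKEN and print the authenticated account.
hf_auth() {
  if [[ -z "${HF_TOKEN:-}" ]]; then
    echo "Set HF_TOKEN to a Hugging Face access token first." >&2
    return 1
  fi
  # huggingface_hub reads HF_TOKEN from the environment, so exporting it makes
  # the token visible to child processes such as Oumi's model downloads.
  export HF_TOKEN
  # Sanity check: newer huggingface_hub releases ship the `hf` CLI;
  # older ones provide only `huggingface-cli`.
  if command -v hf >/dev/null 2>&1; then
    hf auth whoami
  else
    huggingface-cli whoami
  fi
}
```

Because the export happens inside the current shell, call `hf_auth` directly (not in a subshell) before starting training.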

Step 2: Configure Training

By default, the dataset is tatsu-lab/alpaca, which Oumi downloads automatically. To use a different dataset, update the dataset_name parameter in configs/oumi-config.yaml.

To generate synthetic training data, set model_name in configs/synthesis-config.yaml and run:

```
oumi synth -c configs/synthesis-config.yaml
```

Step 3: Fine-Tune the Model

Fine-tune using Oumi’s training recipe:

```
./scripts/fine-tune.sh --config configs/oumi-config.yaml --output-dir models/final [--dry-run]
```

Monitor the job with nvidia-smi or Amazon CloudWatch. For long-running jobs, consider enabling EC2 automatic instance recovery.
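For a quick view from the terminal, nvidia-smi’s CSV query mode prints per-GPU utilization and memory. The sketch below takes a fixed number of samples instead of looping forever; gpu_samples and its default interval and sample count are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print `count` GPU snapshots, `interval` seconds apart.
gpu_samples() {
  local interval=${1:-10} count=${2:-6} i
  for (( i = 0; i < count; i++ )); do
    # One CSV line per GPU: index, utilization, memory used, memory total.
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
      --format=csv,noheader || return 1
    # Skip the sleep after the last sample.
    if (( i < count - 1 )); then
      sleep "$interval"
    fi
  done
}
```

On a g6.12xlarge, each sample prints four lines, one per NVIDIA L4 GPU. Low utilization on some GPUs can indicate that training is not using all of them.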

Step 4: Evaluate the Model (Optional)

Evaluate your fine-tuned model using standard benchmarks:

```
oumi evaluate -c configs/evaluation-config.yaml
```

Step 5: Deploy to Amazon Bedrock

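Upload the model artifacts to Amazon S3 and import the model into Amazon Bedrock. BEDROCK_ROLE_ARN is the ARN of an IAM service role that Amazon Bedrock can assume to read the artifacts from your bucket: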

```
./scripts/upload-to-s3.sh --bucket "$S3_BUCKET" --source models/final --prefix "$S3_PREFIX"
./scripts/import-to-bedrock.sh --model-name my-fine-tuned-llama \
  --s3-uri "s3://$S3_BUCKET/$S3_PREFIX" --role-arn "$BEDROCK_ROLE_ARN" --wait
```
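If you prefer calling the AWS CLI directly, the following sketch creates a Custom Model Import job and polls it until its status is Completed or Failed. Custom Model Import reads Hugging Face format artifacts (config.json, tokenizer files, and .safetensors weights) from the S3 prefix, and role_arn must name a service role that Amazon Bedrock can assume. start_import_job and wait_for_import are hypothetical helpers, not the companion repository’s implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes AWS CLI v2 with Amazon Bedrock permissions.

# Create the import job and print its ARN.
start_import_job() {
  local job_name=$1 model_name=$2 s3_uri=$3 role_arn=$4
  aws bedrock create-model-import-job \
    --job-name "$job_name" \
    --imported-model-name "$model_name" \
    --role-arn "$role_arn" \
    --model-data-source "{\"s3DataSource\":{\"s3Uri\":\"${s3_uri}\"}}" \
    --query jobArn --output text
}

# Poll until the job finishes; on success print the imported model ARN,
# which is the model ID used for invocation.
wait_for_import() {
  local job_arn=$1 interval=${2:-60} status
  while true; do
    status=$(aws bedrock get-model-import-job \
      --job-identifier "$job_arn" --query status --output text) || return 1
    case "$status" in
      Completed)
        aws bedrock get-model-import-job --job-identifier "$job_arn" \
          --query importedModelArn --output text
        return 0 ;;
      Failed)
        echo "Import job failed: $job_arn" >&2
        return 1 ;;
      *)
        # Still in progress; wait before polling again.
        sleep "$interval" ;;
    esac
  done
}
```

Usage: `job=$(start_import_job my-import-job my-fine-tuned-llama "s3://$S3_BUCKET/$S3_PREFIX/" "$BEDROCK_ROLE_ARN")`, then `MODEL_ARN=$(wait_for_import "$job")`.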

After the import job completes, set MODEL_ARN to the ARN of the imported model and invoke it:

```
./scripts/invoke-model.sh --model-id "$MODEL_ARN" --prompt "Translate this text to French: What is the capital of France?"
```
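To call the imported model without the helper script, you can use `aws bedrock-runtime invoke-model` directly. This sketch assumes the imported Llama model accepts the Meta Llama request format (prompt, max_gen_len, temperature); confirm the request schema for your model in the Amazon Bedrock documentation. It uses python3 (installed in Step 1) to build the JSON body so that quotes and newlines in the prompt are escaped correctly. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assumes AWS CLI v2 and python3 on PATH.

# Build a Llama-style request body. json.dumps handles all escaping.
build_llama_body() {
  local prompt=$1 max_gen_len=${2:-256} temperature=${3:-0.2}
  python3 -c 'import json, sys; print(json.dumps({"prompt": sys.argv[1], "max_gen_len": int(sys.argv[2]), "temperature": float(sys.argv[3])}))' \
    "$prompt" "$max_gen_len" "$temperature"
}

# Invoke the model, write the raw response to out_file, and print it.
invoke_imported_model() {
  local model_id=$1 prompt=$2 out_file=${3:-response.json} body
  body=$(build_llama_body "$prompt") || return 1
  # raw-in-base64-out lets AWS CLI v2 accept the JSON body as plain text.
  aws bedrock-runtime invoke-model \
    --model-id "$model_id" \
    --content-type application/json \
    --cli-binary-format raw-in-base64-out \
    --body "$body" \
    "$out_file" >/dev/null || return 1
  cat "$out_file"
}
```

Usage: `invoke_imported_model "$MODEL_ARN" "Translate this text to French: What is the capital of France?"`.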

Step 6: Clean Up

To avoid ongoing costs, remove the resources created during this walkthrough. INSTANCE_ID is the ID of the EC2 instance you launched:

```
aws ec2 terminate-instances --instance-ids "$INSTANCE_ID"
aws s3 rm "s3://$S3_BUCKET/$S3_PREFIX/" --recursive
aws bedrock delete-imported-model --model-identifier "$MODEL_ARN"
```
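To confirm that cleanup finished, the sketch below waits for the instance to reach the terminated state, checks that no objects remain under the prefix, and checks that the imported model can no longer be retrieved. verify_cleanup is a hypothetical helper and assumes AWS CLI v2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the three cleanup commands took effect.
verify_cleanup() {
  local instance_id=$1 bucket=$2 prefix=$3 model_arn=$4 remaining

  # Block until EC2 reports the instance as terminated; the waiter polls
  # for you and exits non-zero if it gives up.
  aws ec2 wait instance-terminated --instance-ids "$instance_id" || return 1

  # KeyCount should be 0 once the recursive delete has finished.
  remaining=$(aws s3api list-objects-v2 --bucket "$bucket" --prefix "$prefix/" \
    --query 'KeyCount' --output text) || return 1
  if [[ "$remaining" != "0" ]]; then
    echo "Objects remain under s3://$bucket/$prefix/ (KeyCount: $remaining)" >&2
    return 1
  fi

  # get-imported-model fails once the model has been deleted.
  if aws bedrock get-imported-model --model-identifier "$model_arn" >/dev/null 2>&1; then
    echo "Imported model still exists: $model_arn" >&2
    return 1
  fi

  echo "Cleanup verified"
}
```

Usage: `verify_cleanup "$INSTANCE_ID" "$S3_BUCKET" "$S3_PREFIX" "$MODEL_ARN"`.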

Conclusion

In this post, we showed how to fine-tune the Llama-3.2-1B-Instruct model with Oumi on Amazon EC2 and deploy it with Amazon Bedrock Custom Model Import. This approach gives you full control over the fine-tuning process while relying on managed inference.

You can kickstart your own fine-tuning pipeline by checking out the companion repository. Happy building!

Acknowledgements

Special thanks to Pronoy Chopra and Jon Turdiev for their invaluable contributions.

About the Authors:

Bashir Mohammed is a Senior Lead GenAI Solutions Architect at AWS, specializing in architectural deployment for production-scale applications.

Bala Krishnamoorthy is a Senior GenAI Data Scientist at Amazon Bedrock GTM, helping startups leverage AI technology effectively.

Greg Fina is a Principal Startup Solutions Architect specializing in Generative AI, focusing on application modernization.

David Stewart leads Field Engineering at Oumi, enhancing generative AI applications through custom solutions.

Matthew Persons is a cofounder at Oumi, dedicated to developing open generative AI systems for practical uses.
