Unlocking Powerful Customization: Reinforcement Fine-Tuning on Amazon Bedrock
A Comprehensive Guide to Implementing RFT for Enhanced Model Performance
In December 2025, we launched Reinforcement Fine-Tuning (RFT) on Amazon Bedrock with initial support for Nova models. In February 2026, we extended that support to open-weight models such as OpenAI's GPT-OSS 20B and Qwen 3 32B. RFT changes how large language models (LLMs) are customized: instead of requiring an extensive labeled dataset, the model learns from feedback on multiple candidate responses generated from a small set of prompts.
In this post, we walk through the RFT workflow on Amazon Bedrock using OpenAI-compatible APIs. We cover everything from authentication to deploying a Lambda-based reward function, kicking off a training job, and performing on-demand inference with your fine-tuned model. Our example uses the GSM8K math dataset, targeting OpenAI's gpt-oss-20b model hosted on Bedrock.
How Reinforcement Fine-Tuning Works
RFT represents a paradigm shift in customizing LLMs. Unlike traditional supervised fine-tuning (SFT), which relies on static input-output pairs, RFT employs an iterative feedback loop: the model generates responses, receives evaluations, and progressively improves its decision-making.
The Core Concept: Learning from Feedback
Reinforcement learning teaches an agent (in this case, an LLM) to refine its decision-making via feedback on its actions. Imagine training a chess player: instead of showing each possible move, you let them play and highlight winning moves over time. Likewise, for LLMs, RFT assigns scores (rewards) to multiple candidate responses based on how well they meet the given criteria.
Key Components of RFT
Key components of RFT include:
- Agent/Actor (Policy) Model: This is the foundation model, like Amazon Nova or OpenAI’s GPT-OSS 20B.
- Input States: The context, which may include prompts and conversation history.
- Output Actions: The model’s responses.
- Reward Function: This critical component evaluates how good a response is for a given state, assigning numerical scores that guide learning.
RFT's key innovation is that the model learns from the responses it generates during training rather than relying solely on pre-existing examples. This makes the approach particularly effective for tasks whose outputs can be automatically verified, such as mathematical reasoning.
How Amazon Bedrock RFT Works
Amazon Bedrock RFT is designed to facilitate reinforcement fine-tuning at the enterprise level. It automates the entire RFT pipeline, allowing teams to concentrate on problem-solving rather than on underlying infrastructure. The system handles batching, response generation, reward computation, and policy optimization seamlessly.
During the entire workflow, Amazon CloudWatch and the Bedrock console provide real-time metrics and insights into training progression and model performance.
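Conceptually, each step the managed pipeline performs can be sketched as follows. This is illustration only, not Bedrock internals: `policy_model`, its `generate` and `update` methods, and `reward_fn` are hypothetical stand-ins.

```python
def rft_training_step(policy_model, prompts, reward_fn, num_samples=4):
    """Conceptual sketch of one RFT step: sample several candidate
    responses per prompt, score each with the reward function, and
    update the policy toward higher-reward responses. Amazon Bedrock
    handles all of this internally."""
    batch_responses, batch_rewards = [], []
    for prompt in prompts:
        # Generate multiple candidate responses for the same prompt
        responses = [policy_model.generate(prompt) for _ in range(num_samples)]
        # Score each candidate with the reward function
        rewards = [reward_fn(prompt, r) for r in responses]
        batch_responses.append(responses)
        batch_rewards.append(rewards)
    # Policy-gradient-style update toward higher-reward responses
    policy_model.update(prompts, batch_responses, batch_rewards)
    return batch_rewards
```

The sampling of multiple responses per prompt is what lets the model compare its own outputs against each other, rather than against a single gold answer.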
Setup Steps
Before diving in, ensure you have:
- An AWS account with Bedrock access in a supported region.
- An Amazon Bedrock API key.
- Proper IAM roles for Lambda execution and Bedrock fine-tuning.
- Python with the necessary libraries installed (openai, boto3, and aws-bedrock-token-generator).
Step 1: Configure the OpenAI Client
You can use the standard OpenAI SDK by pointing it at your Bedrock endpoint.
from openai import OpenAI
from aws_bedrock_token_generator import provide_token

AWS_REGION = "us-west-2"
MANTLE_ENDPOINT = f"https://bedrock-mantle.{AWS_REGION}.api.aws"

client = OpenAI(
    base_url=f"{MANTLE_ENDPOINT}/v1",
    api_key=provide_token(region=AWS_REGION),
)
Step 2: Prepare and Upload Training Data
Craft your dataset in JSONL format, specifying message roles and optional reference answers. For GSM8K, create samples containing math word problems.
{
  "messages": [
    {"role": "user", "content": "A chat between a curious User and a helpful Bot..."}
  ],
  "reference_answer": {
    "answer": "72"
  },
  "data_source": "gsm8k_nova"
}
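Records like the one above can be written to a JSONL file locally and then uploaded through the OpenAI-compatible Files API. A minimal sketch, in which the filename `gsm8k_train.jsonl` and the sample record contents are illustrative:

```python
import json

# Illustrative GSM8K-style records; in practice you would build these
# from the full dataset.
records = [
    {
        "messages": [
            {"role": "user", "content": "A chat between a curious User and a helpful Bot..."}
        ],
        "reference_answer": {"answer": "72"},
        "data_source": "gsm8k_nova",
    },
]

# JSONL: one JSON object per line.
with open("gsm8k_train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Upload via the OpenAI-compatible Files API, using the client from Step 1:
# training_file = client.files.create(
#     file=open("gsm8k_train.jsonl", "rb"), purpose="fine-tune"
# )
# training_file_id = training_file.id
```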
Step 3: Deploy a Lambda Reward Function
The reward function is central to RFT: it scores each model-generated response. Deploy a Lambda function that compares generated outputs against the reference answers and returns numerical scores.
def lambda_handler(event, context):
    # Compute a score for each generated response (scoring logic goes here)
    scores = []
    return scores
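For GSM8K, a simple exact-match reward works well: extract the final number from each response and compare it to the reference answer. The sketch below assumes an event shape with `responses` and `reference_answer` fields; the actual payload schema your reward function receives is defined by Bedrock RFT, so consult the documentation before deploying.

```python
import re

def score_response(response_text, reference_answer):
    """Return 1.0 if the last number in the response matches the
    reference answer, else 0.0 (exact-match reward for GSM8K)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response_text.replace(",", ""))
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == str(reference_answer) else 0.0

def lambda_handler(event, context):
    # NOTE: the event field names used here ("responses",
    # "reference_answer") are assumptions for illustration; check the
    # Bedrock RFT docs for the actual reward-function payload schema.
    reference = event["reference_answer"]["answer"]
    scores = [score_response(r, reference) for r in event["responses"]]
    return {"scores": scores}
```

Binary exact-match rewards are a natural fit for verifiable tasks like math; fuzzier tasks typically need graded scoring instead.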
Step 4: Create the Fine-Tuning Job
Initiate the fine-tuning job with a single API call.
job_response = client.fine_tuning.jobs.create(
    model="openai.gpt-oss-20b",
    training_file=training_file_id,
    extra_body={...}
)
Step 5: Monitor Training
You can track progress and events during training using the fine-tuning job APIs; the reported metrics show how the model's rewards evolve over the course of training.
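One way to monitor a job is to poll its status and recent events with the standard OpenAI fine-tuning APIs. A minimal sketch; the set of terminal states and polling interval below are assumptions based on the OpenAI fine-tuning API rather than Bedrock-specific guarantees:

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id, poll_seconds=60):
    """Poll a fine-tuning job until it reaches a terminal state,
    printing its status and recent events along the way."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"status: {job.status}")
        events = client.fine_tuning.jobs.list_events(
            fine_tuning_job_id=job_id, limit=5
        )
        for ev in events.data:
            print(f"  {ev.message}")
        if job.status in TERMINAL_STATES:
            return job
        time.sleep(poll_seconds)

# Usage, with client and job_response from the earlier steps:
# job = wait_for_job(client, job_response.id)
# fine_tuned_model = job.fine_tuned_model
```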
Step 6: Run On-Demand Inference
Once training is complete, invoke your fine-tuned model through the same Chat Completions API.
response = client.chat.completions.create(
    model=fine_tuned_model,
    messages=[{"role": "user", "content": "What is the speed of a train traveling 120 miles in 2 hours?"}]
)
print(response.choices[0].message.content)
Conclusion
Reinforcement Fine-Tuning in Amazon Bedrock streamlines the customization of LLMs by integrating OpenAI SDK compatibility, Lambda-based reward functions, and on-demand inference capabilities. This framework empowers businesses to efficiently tailor AI models for their unique needs.
For hands-on examples, check out our GitHub repository containing the complete end-to-end code for both GPT-OSS 20B and Qwen 3 32B models: GitHub Repository.
For more insights, refer to the Amazon Bedrock RFT documentation.
About the Authors
Shreyas Subramanian: A Principal Data Scientist leading AI innovation.
Nick McCarthy: Senior Generative AI Specialist Solutions Architect assisting clients in customizing AI models.
Shreeya Sharma: Senior Technical Product Manager focusing on generative AI products.
Shalendra Chhabra: Product Leader at Bedrock with extensive product management experience.