A Technical Guide to Reinforcement Fine-Tuning on Amazon Bedrock Using OpenAI-Compatible APIs


In December 2025, we launched Reinforcement Fine-Tuning (RFT) on Amazon Bedrock with initial support for Nova models. In February 2026, we extended that support to open-weight models such as OpenAI’s GPT-OSS 20B and Qwen 3 32B. RFT changes how large language models (LLMs) are customized: instead of requiring an extensive labeled dataset, the model learns from feedback on multiple candidate responses to a minimal set of prompts.

In this post, we walk through the RFT workflow on Amazon Bedrock using OpenAI-compatible APIs: authenticating the SDK, deploying a Lambda-based reward function, launching a training job, and running on-demand inference with the fine-tuned model. The example uses the GSM8K math dataset and targets OpenAI’s gpt-oss-20b model hosted on Bedrock.

How Reinforcement Fine-Tuning Works

RFT represents a shift in how LLMs are customized. Unlike traditional supervised fine-tuning (SFT), which relies on static input-output pairs, RFT employs an iterative feedback loop: the model generates responses, receives evaluations, and progressively improves its decision-making.

The Core Concept: Learning from Feedback

Reinforcement learning teaches an agent (in this case, an LLM) to refine its decision-making via feedback on its actions. Imagine training a chess player: instead of showing each possible move, you let them play and highlight winning moves over time. Likewise, for LLMs, RFT grants scores (rewards) to multiple responses based on how well they align with given criteria.

Key Components of RFT

Key components of RFT include:

  • Agent/Actor (Policy) Model: This is the foundation model, like Amazon Nova or OpenAI’s GPT-OSS 20B.
  • Input States: The context, which may include prompts and conversation history.
  • Output Actions: The model’s responses.
  • Reward Function: This critical component evaluates how good a response is for a given state, assigning a numerical score that guides learning.

RFT’s key innovation is that the model learns from its own generated responses during training, rather than relying solely on pre-existing examples. Because rewards can be computed automatically, this makes RFT particularly effective for tasks with verifiable answers, such as mathematical reasoning.
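As a toy sketch of this feedback loop (the one-line reward function and candidate responses below are illustrative only, not the Bedrock implementation):

```python
# Toy sketch of the RFT feedback loop: score several candidate responses
# to one prompt with a reward function, then prefer the best-scoring one.
# In real RFT, these scores drive updates to the policy model's weights.

def reward(response: str, reference: str) -> float:
    # 1.0 if the response ends with the reference answer, else 0.0
    return 1.0 if response.strip().endswith(reference) else 0.0

reference_answer = "72"
candidates = ["8 * 9 = 72", "8 * 9 = 81", "The answer is 72"]

scores = [reward(c, reference_answer) for c in candidates]
best = candidates[scores.index(max(scores))]
print(scores)  # [1.0, 0.0, 1.0]
```

A real reward function would typically be more robust (parsing the final answer rather than checking suffixes), as in the Lambda example later in this post.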

How Amazon Bedrock RFT Works

Amazon Bedrock RFT is designed to facilitate reinforcement fine-tuning at the enterprise level. It automates the entire RFT pipeline, allowing teams to concentrate on problem-solving rather than on underlying infrastructure. The system handles batching, response generation, reward computation, and policy optimization seamlessly.

During the entire workflow, Amazon CloudWatch and the Bedrock console provide real-time metrics and insights into training progression and model performance.

Prerequisites

Before diving in, ensure you have:

  • An AWS account with Bedrock access in a supported region.
  • An Amazon Bedrock API key.
  • Proper IAM roles for Lambda execution and Bedrock fine-tuning.
  • Python with necessary libraries installed (openai, boto3, and aws-bedrock-token-generator).

Step 1: Configure the OpenAI Client

Point the standard OpenAI SDK at the Bedrock OpenAI-compatible endpoint. The aws-bedrock-token-generator package exchanges your AWS credentials for a short-lived bearer token that the endpoint accepts as an API key.

from openai import OpenAI
from aws_bedrock_token_generator import provide_token

AWS_REGION = "us-west-2"
MANTLE_ENDPOINT = f"https://bedrock-mantle.{AWS_REGION}.api.aws"

# provide_token() derives a short-lived bearer token from your
# AWS credentials for the OpenAI-compatible endpoint.
client = OpenAI(
    base_url=f"{MANTLE_ENDPOINT}/v1",
    api_key=provide_token(region=AWS_REGION),
)

Step 2: Prepare and Upload Training Data

Prepare your dataset in JSONL format, one JSON object per line, specifying the message roles and an optional reference answer for the reward function to score against. For GSM8K, each sample contains a math word problem.

{
  "messages": [
    {"role": "user", "content": "A chat between a curious User and a helpful Bot..."}
  ],
  "reference_answer": {
    "answer": "72"
  },
  "data_source": "gsm8k_nova"
}
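Records like this can be generated with the json module and written one object per line. The sample below uses the first GSM8K problem; the commented-out upload call is a sketch that assumes the client configured in Step 1:

```python
import json

# GSM8K-style samples: a word problem plus its final numeric answer.
samples = [
    {
        "question": "Natalia sold clips to 48 of her friends in April, and then "
                    "she sold half as many clips in May. How many clips did "
                    "Natalia sell altogether in April and May?",
        "answer": "72",
    },
]

# Write one JSON object per line in the schema shown above.
with open("gsm8k_train.jsonl", "w") as f:
    for s in samples:
        record = {
            "messages": [{"role": "user", "content": s["question"]}],
            "reference_answer": {"answer": s["answer"]},
            "data_source": "gsm8k_nova",
        }
        f.write(json.dumps(record) + "\n")

# Upload with the client from Step 1, e.g.:
# training_file = client.files.create(
#     file=open("gsm8k_train.jsonl", "rb"), purpose="fine-tune"
# )
# training_file_id = training_file.id
```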

Step 3: Deploy a Lambda Reward Function

The reward function is central to RFT: it scores each model-generated response. Deploy a Lambda function that compares generated outputs against the reference answers and returns numerical scores.

def lambda_handler(event, context):
    # Compute one numerical reward per generated response,
    # e.g. by comparing each to its reference answer.
    scores = []  # scoring logic goes here
    return scores
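The exact payload Bedrock passes to the reward Lambda is specified in the RFT documentation. As a sketch, assuming the event carries each generated completion alongside the reference answer from the Step 2 records (a hypothetical schema), a GSM8K scorer might extract the final number and compare it:

```python
import re

def extract_final_number(text):
    # GSM8K solutions conventionally end with "#### <answer>"; fall back
    # to the last integer in the text when that marker is absent.
    m = re.search(r"####\s*(-?[\d,]+)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = re.findall(r"-?\d[\d,]*", text)
    return nums[-1].replace(",", "") if nums else None

def lambda_handler(event, context):
    # Assumed (hypothetical) payload shape: a list of generated
    # completions, each paired with its training record's reference answer.
    scores = []
    for item in event["responses"]:
        predicted = extract_final_number(item["completion"])
        expected = item["reference_answer"]["answer"]
        scores.append(1.0 if predicted == expected else 0.0)
    return {"scores": scores}
```

Exact-match scoring works well here because GSM8K answers are verifiable integers; fuzzier tasks would need a more graded reward.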

Step 4: Create the Fine-Tuning Job

Initiate the fine-tuning job with a single API call.

job_response = client.fine_tuning.jobs.create(
    model="openai.gpt-oss-20b",
    training_file=training_file_id,
    extra_body={...},  # Bedrock-specific RFT settings; see the RFT documentation
)

Step 5: Monitor Training

Track the job’s progress and training events through the fine-tuning APIs. The reported metrics show how the model’s performance evolves over the course of training.
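With the OpenAI SDK surface, a simple polling helper might look like the following (the terminal status names are an assumption; check the job object the service actually returns):

```python
import time

def wait_for_job(client, job_id, poll_seconds=60):
    # Poll a fine-tuning job until it reaches a terminal state.
    # client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id)
    # can additionally be used to inspect individual training events.
    terminal = {"succeeded", "failed", "cancelled"}  # assumed status names
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print("status:", job.status)
        if job.status in terminal:
            return job
        time.sleep(poll_seconds)

# Usage with the client from Step 1:
# job = wait_for_job(client, job_response.id)
# print(job.fine_tuned_model)
```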

Step 6: Run On-Demand Inference

Once training is complete, invoke your fine-tuned model with a standard chat-completions call.

response = client.chat.completions.create(
    model=fine_tuned_model,
    messages=[{"role": "user", "content": "What is the speed of a train traveling 120 miles in 2 hours?"}]
)
print(response.choices[0].message.content)

Conclusion

Reinforcement Fine-Tuning on Amazon Bedrock streamlines LLM customization by combining OpenAI SDK compatibility, Lambda-based reward functions, and on-demand inference, making it practical for teams to tailor models to their own tasks.

For hands-on examples, check out our GitHub repository, which contains the complete end-to-end code for both the GPT-OSS 20B and Qwen 3 32B models.

For more insights, refer to the Amazon Bedrock RFT documentation.


About the Authors

Shreyas Subramanian: A Principal Data Scientist leading AI innovation.

Nick McCarthy: Senior Generative AI Specialist Solutions Architect assisting clients in customizing AI models.

Shreeya Sharma: Senior Technical Product Manager focusing on generative AI products.

Shalendra Chhabra: Product Leader at Bedrock with extensive product management experience.

Latest

Why the Leatherman Micra Remains the Top Choice for Keychain Tools

Leatherman Micra S26: A Trusted EDC Companion Reinvented with...

Unveiling Amazon Polly Bidirectional Streaming: Real-Time Speech Synthesis for Conversational AI Solutions

Announcing Amazon Polly's New Bidirectional Streaming API: Revolutionizing Real-Time...

OpenAI Expands ChatGPT Advertising Reach to Additional Markets

OpenAI Expands Advertising Pilot for ChatGPT to New Markets...

Living Sensors and Robotics Unite to Monitor Aquatic Biodiversity | CORDIS News

Revolutionizing Aquatic Biodiversity Monitoring with Biohybrid Robots Harnessing Living Sensors...

Don't miss

Haiper steps out of stealth mode, secures $13.8 million seed funding for video-generative AI

Haiper Emerges from Stealth Mode with $13.8 Million Seed...

Running Your ML Notebook on Databricks: A Step-by-Step Guide

A Step-by-Step Guide to Hosting Machine Learning Notebooks in...

VOXI UK Launches First AI Chatbot to Support Customers

VOXI Launches AI Chatbot to Revolutionize Customer Services in...

Investing in digital infrastructure key to realizing generative AI’s potential for driving economic growth | articles

Challenges Hindering the Widescale Deployment of Generative AI: Legal,...

Enhancing LLM Fine-Tuning with Unstructured Data through SageMaker Unified Studio and...

Integrating Amazon SageMaker Unified Studio with S3 for Enhanced Visual Question Answering Using Llama 3.2 Overview of the Integration Last year, AWS announced an integration between...

Run Generative AI Inference Using Amazon Bedrock in Asia Pacific (New...

Amazon Bedrock Now Available in New Zealand: A New Era for Cross-Region Inference Unlocking Foundation Model Access for Kiwi Customers Exciting News for New Zealand: Amazon...

Scaling Video Insights with Amazon Bedrock’s Multimodal Models

Unlocking Video Insights: Harnessing the Power of Amazon Bedrock for Advanced Understanding The Evolution of Video Analysis Three Approaches to Video Understanding Frame-Based Workflow: Precision at Scale Shot-Based...