A Technical Guide to Reinforcement Fine-Tuning on Amazon Bedrock Using OpenAI-Compatible APIs



In December 2025, we unveiled Reinforcement Fine-Tuning (RFT) on Amazon Bedrock, initially supporting Nova models. By February 2026, we expanded this support to open-weight models such as OpenAI’s GPT-OSS 20B and Qwen 3 32B. RFT changes how large language models (LLMs) are customized: instead of learning from an extensive labeled dataset, the model learns from feedback on multiple candidate responses to a small set of prompts.

In this post, we walk through the RFT workflow on Amazon Bedrock using OpenAI-compatible APIs: authenticating, preparing data, deploying a Lambda-based reward function, launching a training job, and running on-demand inference with your fine-tuned model. The example uses the GSM8K math dataset and targets OpenAI’s gpt-oss-20B model hosted on Bedrock.

How Reinforcement Fine-Tuning Works

RFT represents a paradigm shift in customizing LLMs. Unlike traditional supervised fine-tuning (SFT), which relies on static input-output pairs, RFT uses an iterative feedback loop: the model generates responses, receives evaluations, and progressively improves its decision-making.

The Core Concept: Learning from Feedback

Reinforcement learning teaches an agent (here, an LLM) to refine its decision-making through feedback on its actions. Imagine training a chess player: rather than prescribing every move, you let them play and reward winning moves over time. Likewise, RFT assigns scores (rewards) to multiple candidate responses based on how well they satisfy given criteria.

Key Components of RFT

Key components of RFT include:

  • Agent/Actor (Policy) Model: This is the foundation model, like Amazon Nova or OpenAI’s GPT-OSS 20B.
  • Input States: The context, which may include prompts and conversation history.
  • Output Actions: The model’s responses.
  • Reward Function: This critical component evaluates how good a response is for a given state, assigning numerical scores that guide learning.

RFT’s key innovation is that the model learns from its own generated responses during training rather than relying solely on pre-existing examples. This makes it particularly effective for tasks with verifiable answers, such as mathematical reasoning.
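To make the reward idea concrete, here is a hypothetical scorer (not part of the Bedrock API) that rates several candidate responses against a GSM8K-style reference answer. A real grader would typically extract and compare the final number rather than check substring containment:

```python
def score_response(response: str, reference_answer: str) -> float:
    """Toy reward: 1.0 if the reference answer appears in the response, else 0.0."""
    return 1.0 if reference_answer in response else 0.0

# Several candidate responses to the same prompt, scored independently --
# this per-response feedback signal is what drives RFT.
candidates = [
    "Let's check: 48 clips in April and half as many in May gives 48 + 24 = 72.",
    "I believe the answer is 68.",
]
rewards = [score_response(c, "72") for c in candidates]
print(rewards)  # [1.0, 0.0]
```

During training, these per-response rewards tell the optimizer which of the model's own generations to reinforce.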

How Amazon Bedrock RFT Works

Amazon Bedrock RFT is designed to facilitate reinforcement fine-tuning at the enterprise level. It automates the entire RFT pipeline, allowing teams to concentrate on problem-solving rather than on underlying infrastructure. The system handles batching, response generation, reward computation, and policy optimization seamlessly.

During the entire workflow, Amazon CloudWatch and the Bedrock console provide real-time metrics and insights into training progression and model performance.

Prerequisites

Before diving in, ensure you have:

  • An AWS account with Bedrock access in a supported region.
  • An Amazon Bedrock API key.
  • Proper IAM roles for Lambda execution and Bedrock fine-tuning.
  • Python with necessary libraries installed (openai, boto3, and aws-bedrock-token-generator).

Step 1: Configure the OpenAI Client

Point the standard OpenAI SDK at your Bedrock OpenAI-compatible endpoint and authenticate with a Bedrock bearer token.

from openai import OpenAI
from aws_bedrock_token_generator import provide_token

AWS_REGION = "us-west-2"
MANTLE_ENDPOINT = f"https://bedrock-mantle.{AWS_REGION}.api.aws"

client = OpenAI(
    base_url=f"{MANTLE_ENDPOINT}/v1",
    # provide_token derives a short-lived bearer token from your AWS credentials
    api_key=provide_token(region=AWS_REGION),
)

Step 2: Prepare and Upload Training Data

Craft your dataset in JSONL format, specifying message roles and optional reference answers. For GSM8K, create samples containing math word problems.

{
  "messages": [
    {"role": "user", "content": "A chat between a curious User and a helpful Bot..."}
  ],
  "reference_answer": {
    "answer": "72"
  },
  "data_source": "gsm8k_nova"
}
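As a minimal sketch, the records can be built and written out as follows, assuming the OpenAI-compatible Files API on the Bedrock endpoint accepts JSONL uploads with purpose="fine-tune" (the question text below is a truncated placeholder, and `build_sample` is an illustrative helper, not part of any SDK):

```python
import json

def build_sample(question: str, answer: str) -> dict:
    """Format one math word problem as an RFT training record (shape shown above)."""
    return {
        "messages": [{"role": "user", "content": question}],
        "reference_answer": {"answer": answer},
        "data_source": "gsm8k_nova",
    }

samples = [build_sample("A chat between a curious User and a helpful Bot...", "72")]
with open("gsm8k_train.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")

# Upload with the OpenAI-compatible Files API; keep the returned id for the
# fine-tuning job in Step 4.
# training_file_id = client.files.create(
#     file=open("gsm8k_train.jsonl", "rb"), purpose="fine-tune"
# ).id
```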

Step 3: Deploy a Lambda Reward Function

The reward function is central to RFT: it scores every model-generated response. Deploy a Lambda function that compares generated outputs against the reference answers. The exact event payload is defined by the Bedrock RFT service; the sketch below assumes it carries the completions and the sample’s reference answer.

def lambda_handler(event, context):
    # Assumed payload shape: the generated completions plus the reference answer.
    reference = event["reference_answer"]["answer"]
    completions = event["completions"]
    # Return one score per completion; here, containment of the reference answer.
    return {"scores": [1.0 if reference in c else 0.0 for c in completions]}

Step 4: Create the Fine-Tuning Job

Initiate the fine-tuning job with a single API call. The extra_body field carries the Bedrock-specific RFT configuration.

job_response = client.fine_tuning.jobs.create(
    model="openai.gpt-oss-20b",
    training_file=training_file_id,
    extra_body={...},  # Bedrock-specific RFT configuration
)

Step 5: Monitor Training

You can track progress and events during training using the provided APIs. The metrics will provide valuable insights into the model’s learning progression.
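Using the same OpenAI SDK surface, a simple polling loop might look like this (`wait_for_job` and `poll_seconds` are illustrative names, not part of the API):

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def is_terminal(status: str) -> bool:
    """A fine-tuning job is done once it reaches any terminal state."""
    return status in TERMINAL_STATES

def wait_for_job(client, job_id: str, poll_seconds: int = 60) -> str:
    """Poll the job via the OpenAI SDK until it finishes; return the final status."""
    while not is_terminal(status := client.fine_tuning.jobs.retrieve(job_id).status):
        time.sleep(poll_seconds)
    return status

# Recent training events (loss/reward metrics also surface in CloudWatch):
# for event in client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id).data:
#     print(event.message)
```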

Step 6: Run On-Demand Inference

Once training is complete, invoke your fine-tuned model for on-demand predictions.

response = client.chat.completions.create(
    model=fine_tuned_model,  # the model id returned by the completed job
    messages=[{"role": "user", "content": "What is the speed of a train traveling 120 miles in 2 hours?"}]
)
print(response.choices[0].message.content)

Conclusion

Reinforcement Fine-Tuning in Amazon Bedrock streamlines the customization of LLMs by integrating OpenAI SDK compatibility, Lambda-based reward functions, and on-demand inference capabilities. This framework empowers businesses to efficiently tailor AI models for their unique needs.

For hands-on examples, see our GitHub repository, which contains the complete end-to-end code for both GPT-OSS 20B and Qwen 3 32B.

For more insights, refer to the Amazon Bedrock RFT documentation.


About the Authors

Shreyas Subramanian: A Principal Data Scientist leading AI innovation.

Nick McCarthy: Senior Generative AI Specialist Solutions Architect assisting clients in customizing AI models.

Shreeya Sharma: Senior Technical Product Manager focusing on generative AI products.

Shalendra Chhabra: Product Leader at Bedrock with extensive product management experience.
