Overcoming Context Window Limitations in Document Analysis Using Recursive Language Models

Unlocking Insights Beyond Context Boundaries: A Guide to Recursive Language Models

Introduction to the Challenge of Context Windows

Understanding Recursive Language Models (RLM)

Architectural Framework for RLM Implementation

Step-by-Step Guide to Implementing RLM

Pre-Conditions for Successful RLM Implementation

Evaluation: Effectiveness of RLM Compared to Traditional Methods

Scaling RLM for Code Repository Analysis

Real-World Application: RLM in Action

Key Trade-offs and Considerations for RLM Adoption

Conclusion: Enhancing Document Analysis with RLM

References

About the Authors

Unlocking Document Analysis with Recursive Language Models

When documents stretch across millions of characters, language models (LMs) run into context window limits, and even the largest context windows fall short. The model either rejects the input or returns incomplete answers because it lacks context. So how do you analyze documents that exceed these limits?

In this post, we’ll explore how to implement Recursive Language Models (RLMs) using Amazon Bedrock AgentCore Code Interpreter and the Strands Agents SDK. By the end, you will know how to:

Process documents whose length is not bounded by the model’s context window.
Use the Bedrock AgentCore Code Interpreter as a persistent working memory for iterative document analysis.
Orchestrate sub-large language model (sub-LLM) calls within a sandboxed Python environment to analyze specific sections of documents.

Why Context Windows Aren’t Enough

Imagine you’re analyzing financial data across two annual reports from a single company, each report between 300 and 500 pages long. Now add analyst reports, SEC filings, and supplementary materials, and you’re looking at millions of characters. When this material is fed directly into a model, there are two potential pitfalls:

The input might exceed the model’s context window limit, resulting in an error.
The input fits, but the model struggles to focus on critical information located in the middle of long inputs, a phenomenon known as the “lost in the middle” problem.

These issues highlight the fact that context window size is a hard limit that cannot be circumvented solely through prompt engineering. A new approach that decouples document size from the model’s context window is essential.
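To make the scale concrete, here is a rough back-of-the-envelope check. It is only a sketch: the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and the corpus size and window size below are hypothetical numbers chosen for illustration.

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. chars_per_token=4 is a rule of thumb, not a tokenizer."""
    return int(num_chars / chars_per_token)


def fits_in_context(num_chars: int, context_window_tokens: int) -> bool:
    """True if the estimated token count fits in the given window."""
    return estimate_tokens(num_chars) <= context_window_tokens


corpus_chars = 3_000_000   # hypothetical: a few reports plus filings
window_tokens = 200_000    # hypothetical context window size

print(estimate_tokens(corpus_chars))                 # 750000
print(fits_in_context(corpus_chars, window_tokens))  # False
```

Under these assumed numbers the corpus is several times larger than the window, so no amount of prompt rewording makes it fit.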

RLMs: Treating Context as an Environment

Recursive Language Models, introduced by Zhang et al. (arXiv:2512.24601), rethink how a model interacts with its context. Instead of feeding the entire document into the model’s context, RLMs treat the input as an external environment that the model can interact with programmatically.

How RLMs Work

Orchestration: The root LLM generates code to explore the document environment.
Delegation: It delegates semantic analysis to sub-LLMs for specific chunks.
Accumulation: Results are stored in a working memory, refining the analysis step-by-step.

This structure allows the root LLM to manage the analysis without ever needing the full document in its context window.
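The three steps above can be sketched in plain Python. This is an illustration of the kind of code a root model might generate, not the code from the paper or this post: the fixed-size chunking, the prompt wording, and the `fake_llm_query` stand-in are all assumptions, and a real run would pass in a function that calls an actual sub-LLM.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Cut the document into fixed-size character windows."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def analyze(document: str, question: str,
            llm_query: Callable[[str], str], chunk_size: int = 20_000) -> str:
    findings: List[str] = []  # working memory that accumulates results
    # Delegation: each chunk goes to a sub-LLM call on its own.
    for index, chunk in enumerate(split_into_chunks(document, chunk_size)):
        prompt = (f"Question: {question}\n\nExcerpt {index}:\n{chunk}\n\n"
                  "List relevant facts, or reply NONE.")
        reply = llm_query(prompt).strip()
        if reply and reply != "NONE":
            findings.append(f"[chunk {index}] {reply}")
    # Accumulation: only the short findings are combined, never the full text.
    summary_prompt = f"Question: {question}\n\nFindings:\n" + "\n".join(findings)
    return llm_query(summary_prompt)


def fake_llm_query(prompt: str) -> str:
    """Hypothetical stand-in for a sub-LLM so the sketch runs offline."""
    if "Findings:" in prompt:
        return f"summary of {prompt.count('[chunk')} findings"
    return "revenue rose" if "revenue rose" in prompt else "NONE"


doc = "x" * 50 + "revenue rose 5%" + "y" * 50
print(analyze(doc, "What happened to sales?", fake_llm_query, chunk_size=40))
```

The caller only ever holds one chunk or the list of findings at a time, which is what keeps the full document out of any single prompt.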

Architecture Overview

The implementation of RLM using Amazon Bedrock AgentCore Code Interpreter involves three main components:

Root LLM Agent: Built with the Strands Agents SDK, it receives user queries and decides what code to execute.
Amazon Bedrock AgentCore Code Interpreter: Operates in public network mode, keeping the full document as a Python variable.
Sub-LLM Calls: The root LLM can call sub-LLMs directly from within the Code Interpreter, allowing the results to remain in Python variables.

Diagram of RLM Architecture

(Illustrative figure concept)

The architecture relies on the persistent session state of the Code Interpreter, which lets intermediate results and extracted data accumulate across code executions.
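The idea of persistent session state can be illustrated locally. The `LocalSession` class below is a hypothetical stand-in that runs snippets against one shared namespace with `exec`; it is not the AgentCore API and offers no sandboxing, but it shows how variables created by one execution remain available to the next.

```python
class LocalSession:
    """Local stand-in for a persistent session (not the AgentCore API)."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, code: str) -> None:
        # Every snippet shares the same namespace, so state carries over.
        exec(code, self.namespace)


session = LocalSession()
session.run("context = 'Revenue 2023: 10\\nRevenue 2024: 12'")
session.run("lines = [l for l in context.splitlines() if 'Revenue' in l]")
session.run("values = [int(l.split(': ')[1]) for l in lines]")
session.run("growth = values[1] - values[0]")

print(session.namespace["growth"])  # 2
```

Each call builds on variables left behind by the previous one, which is the property the root LLM depends on when it explores a document over many small steps.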

Implementation Steps

To get started, ensure you meet a few prerequisites:

An AWS account with access to Amazon Bedrock foundation models.
Python 3.10 or later.
The AWS Command Line Interface (AWS CLI), configured with credentials.
IAM permissions for the Amazon Bedrock and AgentCore Code Interpreter actions used in the steps that follow.

Step 1: Initiate a Code Interpreter Session

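import boto3
import json

# Placeholders (assumptions): supply your own Code Interpreter ID and input file.
code_interpreter_id = "YOUR_CODE_INTERPRETER_ID"
with open("reports.txt", "r", encoding="utf-8") as f:
    document = f.read()

client = boto3.client("bedrock-agentcore", region_name="us-east-1")

# Start a session; sandbox state persists until the session times out.
response = client.start_code_interpreter_session(
    codeInterpreterIdentifier=code_interpreter_id,
    name="rlm-session",
    sessionTimeoutSeconds=3600,
)
session_id = response["sessionId"]

# Write the full document into the sandbox file system.
client.invoke_code_interpreter(
    codeInterpreterIdentifier=code_interpreter_id,
    sessionId=session_id,
    name="writeFiles",
    arguments={"content": [{"path": "_context.txt", "text": document}]},
)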

Step 2: Define the llm_query Helper Within the Sandbox

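import json
import boto3

# This code runs inside the Code Interpreter sandbox, not on the host.
# Placeholder (assumption): set the Bedrock model ID used for sub-LLM calls.
sub_model_id = "YOUR_SUB_MODEL_ID"
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Load the document written in Step 1 into a persistent Python variable.
with open("_context.txt", "r") as f:
    context = f.read()

def llm_query(prompt: str) -> str:
    """Send one prompt to the sub-LLM and return its text reply."""
    response = bedrock_client.invoke_model(
        modelId=sub_model_id,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 4096,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]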

Step 3: Create a Strands Agent and Run Your Query

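from strands import Agent

# rlm_system_prompt and execute_python are assumed to be defined earlier:
# the prompt tells the root model about the `context` variable and the
# `llm_query` helper, and execute_python is a tool that runs code in the
# Code Interpreter session from Step 1.
agent = Agent(
    model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
    system_prompt=rlm_system_prompt,
    tools=[execute_python],
)

answer = agent("What are the key revenue trends across these reports?")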

Through this agent, the model iteratively writes and executes code to explore the loaded document.
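The post does not show how `execute_python` is built, so the following is one possible shape for it rather than the authors' implementation. It assumes the `executeCode` request name with `language` and `code` arguments, and that the response carries a `stream` of events whose `result.content` items hold text output. The `FakeClient` class is a hypothetical stub so the sketch runs without AWS access; in real use you would pass the boto3 `bedrock-agentcore` client and the session ID from Step 1.

```python
from typing import Any, Callable


def make_execute_python(client: Any, code_interpreter_id: str,
                        session_id: str) -> Callable[[str], str]:
    """Build a function that runs code in an existing session."""

    def execute_python(code: str) -> str:
        response = client.invoke_code_interpreter(
            codeInterpreterIdentifier=code_interpreter_id,
            sessionId=session_id,
            name="executeCode",  # assumed request name
            arguments={"language": "python", "code": code},
        )
        texts = []
        # Assumed response shape: events with result.content text items.
        for event in response.get("stream", []):
            for item in event.get("result", {}).get("content", []):
                if item.get("type") == "text":
                    texts.append(item.get("text", ""))
        return "\n".join(texts)

    return execute_python


class FakeClient:
    """Hypothetical stub that mimics the assumed response shape."""

    def invoke_code_interpreter(self, **kwargs: Any) -> dict:
        self.last_request = kwargs
        return {"stream": [{"result": {"content": [{"type": "text", "text": "4"}]}}]}


fake = FakeClient()
run = make_execute_python(fake, "example-interpreter", "example-session")
print(run("print(2 + 2)"))  # 4 (from the stub, not a real sandbox)
```

To hand this to the agent, the returned function would typically be wrapped with the Strands `tool` decorator and given a docstring describing its purpose, since the root model relies on that description to decide when to call it.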

Evaluation of RLM Effectiveness

In our evaluations on Financial Multi-Document QA tasks, RLM reached a 100% success rate and reduced input-limit errors compared with passing the documents directly to the model.

Model	Approach	Success Rate	Accuracy
Claude Haiku 4.5 + Haiku 4.5	RLM	100%	66.7%

The RLM architecture improves success rates and, by breaking complex tasks into smaller sub-queries, also improves accuracy.

Conclusion

Recursive Language Models present a robust solution for processing large documents that exceed standard model context limits. By leveraging Amazon Bedrock AgentCore Code Interpreter alongside the Strands Agents SDK, RLM can effectively analyze and reason over extensive input data.

This approach is beneficial not just for financial analyses or document reviews, but across multiple domains including healthcare, legal review, and programming tasks.

If you implement this methodology in your projects, share your experiences! Your insights could enrich the conversation surrounding advanced document analysis strategies. Let’s innovate together!
