Overcoming Context Window Limitations in Document Analysis Using Recursive Language Models
Unlocking Insights Beyond Context Boundaries: A Guide to Recursive Language Models
Unlocking Document Analysis with Recursive Language Models
When documents stretch across millions of characters, traditional language models (LMs) run into context window limits. Even the largest context windows fall short, so the input is either rejected outright or answered incompletely because the model never sees enough of it. How, then, do you analyze documents that exceed those limits?
In this post, we’ll explore how to implement Recursive Language Models (RLMs) using Amazon Bedrock AgentCore Code Interpreter and the Strands Agents SDK. By the end, you will know how to:
- Process documents whose length is not bounded by the model’s context window.
- Use the Bedrock AgentCore Code Interpreter as persistent working memory for iterative document analysis.
- Orchestrate calls to sub-LLMs (secondary large language models) from a sandboxed Python environment to analyze specific sections of a document.
Why Context Windows Aren’t Enough
Imagine you’re analyzing financial data across two annual reports from a single company, each between 300 and 500 pages long. Add analyst reports, SEC filings, and supplementary materials, and you’re looking at millions of characters. Feeding all of that directly into a model has two potential pitfalls:
- The input might exceed the model’s context window limit, resulting in an error.
- The input fits, but the model struggles to focus on critical information located in the middle of long inputs, a phenomenon known as the “lost in the middle” problem.
These issues show that context window size is a hard limit that prompt engineering alone cannot circumvent. What is needed is an approach that decouples document size from the model’s context window.
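As a rough illustration of the first pitfall, the sketch below estimates whether a body of text fits in a context window. The ratio of 4 characters per token and the 200,000-token window are assumptions chosen for illustration; real tokenizers and model limits vary.

```python
# Rough heuristic only: the chars-per-token ratio is an assumption,
# not a property of any particular tokenizer.
ASSUMED_CHARS_PER_TOKEN = 4
ASSUMED_CONTEXT_WINDOW_TOKENS = 200_000  # hypothetical model limit


def estimate_tokens(text: str, chars_per_token: int = ASSUMED_CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` using a fixed chars-per-token ratio."""
    # Ceiling division so a trailing partial token still counts as one.
    return -(-len(text) // chars_per_token)


def fits_in_context(text: str, window_tokens: int = ASSUMED_CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count is within the window."""
    return estimate_tokens(text) <= window_tokens


corpus = "x" * 3_000_000  # stand-in for a multi-million-character corpus
print(estimate_tokens(corpus), fits_in_context(corpus))
```

Under these assumptions, a three-million-character corpus is several times larger than the window, so no amount of prompt rewording makes it fit.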
RLMs: Treating Context as an Environment
Recursive Language Models, introduced by Zhang et al. (arXiv:2512.24601), rethink how a model interacts with its context. Instead of feeding the entire document into the model’s context, an RLM treats the input as an external environment that the model can interact with programmatically.
How RLMs Work
- Orchestration: The root LLM generates code to explore the document environment.
- Delegation: It delegates semantic analysis to sub-LLMs for specific chunks.
- Accumulation: Results are stored in a working memory, refining the analysis step-by-step.
This structure allows the root LLM to manage the analysis without ever needing the full document in its context window.
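The three roles can be sketched as a small loop. This is a toy version under stated assumptions: in an actual RLM the root LLM writes the exploration code itself, whereas here the strategy is fixed-size chunking, and `fake_sub_llm` is a hypothetical stand-in for a real model call.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split `text` into consecutive chunks of at most `chunk_size` characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def analyze_document(
    document: str,
    question: str,
    sub_llm: Callable[[str], str],
    chunk_size: int = 1000,
) -> str:
    """Orchestrate, delegate, accumulate: a toy version of the RLM loop."""
    notes: List[str] = []  # working memory that lives outside any model context
    # Orchestration: decide how to walk the document (fixed-size chunks here).
    for index, chunk in enumerate(split_into_chunks(document, chunk_size)):
        # Delegation: each sub-LLM call sees only one chunk plus the question.
        finding = sub_llm(f"Question: {question}\n\nSection {index}:\n{chunk}")
        # Accumulation: keep the short finding, not the chunk itself.
        notes.append(f"[section {index}] {finding}")
    # Final synthesis runs over the compact notes instead of the full document.
    return sub_llm(f"Question: {question}\n\nNotes:\n" + "\n".join(notes))


def fake_sub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: reports the prompt length."""
    return f"summary of {len(prompt)} chars"


print(analyze_document("lorem ipsum " * 500, "What is this about?", fake_sub_llm))
```

No single call in this loop receives more than one chunk plus the question, which is the property that lets the document grow without growing any prompt.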
Architecture Overview
Implementing an RLM with Amazon Bedrock AgentCore Code Interpreter involves three main components:
- Root LLM Agent: Built with the Strands Agents SDK, it receives user queries and decides what code to execute.
- Amazon Bedrock AgentCore Code Interpreter: Operates in public network mode and holds the full document as a Python variable.
- Sub-LLM Calls: Code running inside the Code Interpreter calls sub-LLMs directly, so the results remain in Python variables.
Figure: RLM architecture (illustrative).
The architecture relies on the Code Interpreter’s persistent session state, which lets intermediate results and extracted data accumulate across code executions.
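To make the idea of persistent session state concrete, here is a minimal local sketch. `ToySession` is a hypothetical stand-in that keeps variables alive between executions; it is not the AgentCore Code Interpreter and offers none of its sandboxing.

```python
import contextlib
import io


class ToySession:
    """Local stand-in for a stateful sandbox; NOT the AgentCore Code Interpreter."""

    def __init__(self) -> None:
        self.namespace: dict = {}  # variables persist here between run() calls

    def run(self, code: str) -> str:
        """Execute `code` against the shared namespace and return captured stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


session = ToySession()
session.run("context = 'Revenue grew 12% in 2023. ' * 1000")  # load once
session.run("findings = []")                                   # working memory
session.run("findings.append(context[:24])")                   # later step reuses both
print(session.run("print(len(context), findings)"))
```

Each `run` call sees the variables left behind by earlier calls, which is what allows an analysis to build up findings step by step without reloading the document.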
Implementation Steps
To get started, make sure you meet the following prerequisites:
- An AWS account with access to Amazon Bedrock foundation models.
- Python 3.10 or later.
- The AWS Command Line Interface (AWS CLI), configured with credentials.
- IAM permissions for the required Amazon Bedrock operations.
Step 1: Initiate a Code Interpreter Session
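
    import boto3
    import json

    # Assumed to be defined by you: `code_interpreter_id` (the identifier of your
    # Code Interpreter resource) and `document` (the full text to analyze).
    client = boto3.client("bedrock-agentcore", region_name="us-east-1")

    # Start a session; its state persists across calls until the timeout.
    response = client.start_code_interpreter_session(
        codeInterpreterIdentifier=code_interpreter_id,
        name="rlm-session",
        sessionTimeoutSeconds=3600,
    )
    session_id = response["sessionId"]

    # Write the document into the sandbox so it never enters the root LLM's context.
    client.invoke_code_interpreter(
        codeInterpreterIdentifier=code_interpreter_id,
        sessionId=session_id,
        name="writeFiles",
        arguments={"content": [{"path": "_context.txt", "text": document}]},
    )
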
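Once the session exists, later steps need a way to run code in it and read back the output. The helper below is a sketch under assumptions: it assumes an `executeCode` tool that takes `language` and `code` arguments, and a response whose `stream` yields events with a `result` holding a `content` list of text items. The name `run_in_session` is hypothetical; check the response shape against the current API reference before relying on it.

```python
from typing import Any


def run_in_session(client: Any, code_interpreter_id: str, session_id: str, code: str) -> str:
    """Run Python `code` in an existing session and return the concatenated text output.

    Assumes the response exposes a "stream" of events whose "result" entries
    carry a "content" list of {"type": "text", "text": ...} items.
    """
    response = client.invoke_code_interpreter(
        codeInterpreterIdentifier=code_interpreter_id,
        sessionId=session_id,
        name="executeCode",
        arguments={"language": "python", "code": code},
    )
    parts = []
    for event in response["stream"]:
        # Events without a "result" key (or results without content) are skipped.
        for item in event.get("result", {}).get("content", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return "".join(parts)
```

For example, `run_in_session(client, code_interpreter_id, session_id, "print(len(open('_context.txt').read()))")` would report the document length without sending the document itself back to the caller.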
Step 2: Define the llm_query Helper Within the Sandbox
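
    # This code runs inside the Code Interpreter sandbox.
    import json
    import boto3

    # Assumed to be set by you: `sub_model_id`, the model ID used for sub-LLM calls.
    bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # Load the document written in Step 1 into a Python variable.
    with open('_context.txt', 'r') as f:
        context = f.read()

    def llm_query(prompt: str) -> str:
        """Send one prompt to the sub-LLM and return its text reply."""
        response = bedrock_client.invoke_model(
            modelId=sub_model_id,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "messages": [{"role": "user", "content": prompt}]
            })
        )
        result = json.loads(response['body'].read())
        return result['content'][0]['text']
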
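With `context` and `llm_query` in place, the root LLM can write code that narrows the document before delegating. The sketch below shows one strategy it might generate: search for a keyword, build a window of text around each hit, merge overlapping windows, and query only those excerpts. The names `find_windows` and `query_windows` and the default radius are hypothetical, chosen for illustration.

```python
import re
from typing import Callable, List, Tuple


def find_windows(text: str, pattern: str, radius: int = 500) -> List[Tuple[int, int]]:
    """Return merged (start, end) spans covering `radius` chars around each match."""
    spans: List[Tuple[int, int]] = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        if spans and start <= spans[-1][1]:
            # Overlaps or touches the previous window: extend it instead of duplicating text.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans


def query_windows(
    text: str,
    pattern: str,
    question: str,
    llm_query: Callable[[str], str],
    radius: int = 500,
) -> List[str]:
    """Ask `llm_query` about each window and collect the answers."""
    return [
        llm_query(f"{question}\n\nExcerpt:\n{text[start:end]}")
        for start, end in find_windows(text, pattern, radius)
    ]
```

Inside the sandbox, a call such as `query_windows(context, r"revenue", "What revenue figure is reported here?", llm_query)` would send only the relevant excerpts to the sub-LLM and leave the answers in a Python list.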
Step 3: Create a Strands Agent and Run Your Query
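
    from strands import Agent

    # Assumed to be defined by you: `rlm_system_prompt` (instructions telling the
    # model to explore `context` with code) and `execute_python` (a tool that runs
    # code in the Code Interpreter session from Step 1).
    agent = Agent(
        model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        system_prompt=rlm_system_prompt,
        tools=[execute_python],
    )

    answer = agent("What are the key revenue trends across these reports?")
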
Through this agent, the model iteratively writes and executes code to explore the loaded document.
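The contents of `rlm_system_prompt` are not shown in this post. The following is a hypothetical sketch of what such a prompt might contain, written for illustration only. The wording is an assumption rather than the prompt used in the evaluation, and it relies on the `context` variable and `llm_query` helper from Step 2 and a tool named `execute_python`.

```python
# Hypothetical prompt text for illustration; tune the wording for your own model and task.
rlm_system_prompt = """\
You are analyzing a document that is too large to read directly.
The full text is stored in the Python variable `context` inside a sandbox.
Use the execute_python tool to inspect it with code (len, slicing, regex search).
Never print the whole document; print only short excerpts or counts.
For semantic questions about a section, call llm_query(prompt) on that section.
Store intermediate findings in Python variables and reuse them across steps.
When you have enough evidence, answer the user's question concisely.
"""
```

The instruction to print only short excerpts matters most here, since anything printed by the tool flows back into the root LLM's context.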
Evaluation of RLM Effectiveness
In our evaluation on Financial Multi-Document QA tasks, the RLM approach significantly outperformed traditional methods, reaching a 100% success rate while reducing input-limit errors.
| Model (root + sub-LLM) | Approach | Success Rate | Accuracy |
|---|---|---|---|
| Claude Haiku 4.5 + Haiku 4.5 | RLM | 100% | 66.7% |
Beyond improving the success rate, the RLM architecture also raises accuracy by breaking complex tasks into smaller pieces.
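This post does not spell out how the two metrics are computed. One plausible reading, stated here as an assumption, is that success rate is the share of queries that return any answer (rather than failing, for example on an input-limit error) and accuracy is the share of all queries answered correctly. The records below are made-up placeholders, not evaluation data, and a real evaluation would likely grade answers with something more forgiving than exact string match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryOutcome:
    answer: Optional[str]  # None when the call failed, e.g. input rejected as too long
    expected: str


def success_rate(outcomes: List[QueryOutcome]) -> float:
    """Share of queries that produced any answer."""
    return sum(o.answer is not None for o in outcomes) / len(outcomes)


def accuracy(outcomes: List[QueryOutcome]) -> float:
    """Share of all queries whose answer matches the expected one."""
    return sum(o.answer == o.expected for o in outcomes) / len(outcomes)


# Placeholder records for illustration only; these are not evaluation results.
placeholder = [
    QueryOutcome(answer="A", expected="A"),
    QueryOutcome(answer="B", expected="C"),
    QueryOutcome(answer=None, expected="D"),
    QueryOutcome(answer="E", expected="E"),
]
print(f"success rate: {success_rate(placeholder):.1%}, accuracy: {accuracy(placeholder):.1%}")
```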
Conclusion
Recursive Language Models present a robust solution for processing large documents that exceed standard model context limits. By leveraging Amazon Bedrock AgentCore Code Interpreter alongside the Strands Agents SDK, RLM can effectively analyze and reason over extensive input data.
This approach is beneficial not just for financial analyses or document reviews, but across multiple domains including healthcare, legal review, and programming tasks.
If you implement this methodology in your projects, share your experiences! Your insights could enrich the conversation surrounding advanced document analysis strategies. Let’s innovate together!