Unlock the Power of Your AI Agents with ToolSimulator: A Comprehensive Guide
Revolutionize Your Testing Process with Strands Evals Toolkit
In This Guide You Will Learn To:
- Set Up ToolSimulator and Register Tools for Simulation
- Configure Stateful Tool Simulations for Multi-Turn Agent Workflows
- Enforce Response Schemas with Pydantic Models
- Integrate ToolSimulator into a Complete Strands Evals Evaluation Pipeline
- Apply Best Practices for Simulation-Based Agent Evaluation
Prerequisites
Why Tool Testing Challenges Your Development Workflow
Why Static Mocks Fall Short
What Makes ToolSimulator Different
How ToolSimulator Works
Getting Started with ToolSimulator
Advanced ToolSimulator Usage
Integration with Strands Evaluation Pipelines
Best Practices for Simulation-Based Evaluation
Conclusion
Next Steps
About The Authors
As AI agents become increasingly sophisticated, the need for robust testing mechanisms is more critical than ever. Enter ToolSimulator, a powerful, LLM-powered simulation framework integrated within Strands Evals. ToolSimulator enables safe and thorough testing of AI agents that rely on external tools, all without the risks associated with live API calls. In this blog post, you’ll learn how to leverage ToolSimulator to enhance your development workflow.
Setting Up ToolSimulator
Before diving into the advanced capabilities of ToolSimulator, ensure you have the necessary prerequisites:
Prerequisites
- Python 3.10 or later installed in your environment
- Strands Evals SDK: install it with `pip install strands-evals`
- Basic familiarity with Python, including decorators and type hints
- Understanding of AI agents and tool-calling concepts (API calls, function schemas)
- Pydantic knowledge is helpful for advanced schema examples, but not required
- No AWS account is needed for local use
Why Tool Testing Challenges Your Development Workflow
Modern AI agents interact with many external systems, which slows down testing and introduces risk. Three challenges stand out when testing against live APIs:
- External Dependencies: API rate limits and downtime can hinder testing speed when running multiple test cases.
- Risky Test Isolation: Real tool interactions can lead to unintended side effects, such as sending actual emails or modifying databases.
- Privacy and Security Concerns: Testing with live APIs may expose sensitive data, creating compliance risks.
While static mocks may seem like a solution, they fail in dynamic, multi-turn workflows where responses depend on preceding interactions.
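To make that limitation concrete, here is a minimal plain-Python sketch (with a hypothetical `mock_search_flights` helper, not part of any library) of a static mock that returns the same canned payload no matter what has happened earlier in the conversation:

```python
# A static mock: always returns the same canned payload.
def mock_search_flights(origin, destination, date):
    return {"flights": [{"id": "F100", "price": 250}], "status": "success"}

# Turn 1: the agent searches for flights -- the mock looks plausible.
first = mock_search_flights("SEA", "JFK", "2025-03-15")

# Turn 2: suppose the agent has since booked flight F100 and re-checks
# availability. A real backend would no longer list F100, but the static
# mock repeats the stale response, so the multi-turn flow is untestable.
second = mock_search_flights("SEA", "JFK", "2025-03-15")
assert first == second  # identical output, no state evolution
```

The mock never evolves, which is exactly the gap a stateful simulator is meant to close.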
What Makes ToolSimulator Different
ToolSimulator addresses these challenges through three key features:
- Adaptive Response Generation: ToolSimulator generates context-appropriate responses based on actual agent requests, rather than relying on static templates.
- Stateful Workflow Support: It allows for multi-turn workflows by maintaining consistent state across tool calls.
- Schema Enforcement: Responses are validated against Pydantic schemas to catch malformed outputs before they reach the agent.
Getting Started with ToolSimulator
Here’s a step-by-step guide for setting up and running your first simulation using ToolSimulator.
Step 1: Decorate and Register
First, create a ToolSimulator instance and register your tool using the `@tool_simulator.tool()` decorator.
```python
from strands_evals.simulation.tool_simulator import ToolSimulator

tool_simulator = ToolSimulator()

@tool_simulator.tool()
def search_flights(origin: str, destination: str, date: str) -> dict:
    """Search for available flights."""
    pass  # Implementation is not called during simulation
```
Step 2: Steer (Optional Configuration)
Customize simulation behavior with optional parameters:
- `share_state_id`: Link tools that share the same backend state.
- `initial_state_description`: Seed initial context for better responses.
- `output_schema`: Define the expected response structure using a Pydantic model.
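To build intuition for what `share_state_id` does, here is a simplified stand-in (illustrative only, not the ToolSimulator internals): tools registered under the same state id read and write one shared dictionary, so one tool's effects are visible to another.

```python
# Simplified stand-in for shared-state registration -- illustrative
# only, not the actual ToolSimulator implementation.
class MiniSimulator:
    def __init__(self):
        self._states = {}  # share_state_id -> shared state dict

    def tool(self, share_state_id=None):
        def decorator(fn):
            # Tools with the same share_state_id get the same dict.
            state = self._states.setdefault(share_state_id, {})
            def wrapper(*args, **kwargs):
                return fn(state, *args, **kwargs)
            wrapper.__name__ = fn.__name__
            return wrapper
        return decorator

sim = MiniSimulator()

@sim.tool(share_state_id="flight_booking")
def create_booking(state, booking_id):
    state[booking_id] = "confirmed"
    return {"booking_id": booking_id, "status": "confirmed"}

@sim.tool(share_state_id="flight_booking")
def get_booking_status(state, booking_id):
    # Sees the booking created by the other tool, because both were
    # registered under the same share_state_id.
    return {"booking_id": booking_id, "status": state.get(booking_id, "not_found")}

create_booking("BK123")
result = get_booking_status("BK123")  # status reflects the earlier call
```

In the real framework the shared state is maintained by the LLM-backed simulator rather than a plain dictionary, but the linkage semantics are the same.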
Step 3: Mock
When your agent calls a registered tool, ToolSimulator intercepts the call and dynamically generates a response that aligns with the agent’s request.
```python
from strands import Agent  # Strands Agents SDK

# Create an agent with the simulated tool
agent = Agent(
    system_prompt="You are a flight search assistant.",
    tools=[tool_simulator.get_tool("search_flights")],
)

response = agent("Find me flights from Seattle to New York on March 15.")
print(response)
```
Advanced ToolSimulator Usage
Running Independent Instances
Creating multiple ToolSimulator instances allows you to run parallel tests without shared state conflicts:
```python
simulator_a = ToolSimulator()
simulator_b = ToolSimulator()
```
Configuring Shared State
To handle stateful tools effectively, link them via share_state_id to maintain consistent context.
```python
@tool_simulator.tool(share_state_id="flight_booking")
def get_booking_status(booking_id: str) -> dict:
    """Retrieve current booking status."""
    pass
```
Enforcing Custom Response Schemas
For strict API adherence, define your response schema using Pydantic. This prevents malformed responses from reaching your agent.
```python
from pydantic import BaseModel

class FlightSearchResponse(BaseModel):
    flights: list[dict]
    origin: str
    destination: str
    status: str = "success"

@tool_simulator.tool(output_schema=FlightSearchResponse)
def search_flights(origin: str, destination: str, date: str) -> dict:
    """Search for flights."""
    pass
```
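As a usage sketch (assuming pydantic is installed; the model is repeated so the snippet runs standalone), schema enforcement means a malformed simulated payload raises a validation error before it can reach the agent, while a well-formed one parses cleanly:

```python
from pydantic import BaseModel, ValidationError

# Repeated here so the snippet is self-contained.
class FlightSearchResponse(BaseModel):
    flights: list[dict]
    origin: str
    destination: str
    status: str = "success"

# A well-formed simulated payload validates cleanly.
ok = FlightSearchResponse(
    flights=[{"id": "F100", "price": 250}],
    origin="SEA",
    destination="JFK",
)

# A malformed payload (wrong type, missing required fields) is
# rejected before it can reach the agent.
try:
    FlightSearchResponse(flights="not-a-list")
    malformed_caught = False
except ValidationError:
    malformed_caught = True
```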
Integration with Strands Evaluation Pipelines
ToolSimulator integrates cleanly with Strands Evals for end-to-end evaluation. Here is the skeleton of a complete pipeline:
```python
# Setup telemetry and ToolSimulator
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
tool_simulator = ToolSimulator()

# Register tools and define the evaluation task (as demonstrated above)
# Define test cases, run the experiment, and display the report
```
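The commented steps follow a generic shape. As a plain-Python sketch of what "define test cases, run the experiment, and display the report" amounts to (this is the general pattern, not the Strands Evals API; `fake_agent` is a stand-in):

```python
# Generic evaluation-loop shape -- illustrative only, not the
# Strands Evals API. The "agent" here is a stand-in function.
def fake_agent(prompt: str) -> str:
    return "Found 3 flights from SEA to JFK." if "flight" in prompt else "Sorry?"

# Each test case pairs a prompt with a simple expectation.
test_cases = [
    {"prompt": "Find me a flight to JFK.", "expect": "flights"},
    {"prompt": "What's the weather?", "expect": "Sorry"},
]

# Run each case and record pass/fail for the report.
report = []
for case in test_cases:
    output = fake_agent(case["prompt"])
    report.append({"prompt": case["prompt"], "passed": case["expect"] in output})

pass_rate = sum(r["passed"] for r in report) / len(report)
```

In the real pipeline, the experiment runner and report come from Strands Evals, and the agent under test calls the simulated tools registered earlier.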
Best Practices for Simulation-Based Evaluation
To maximize ToolSimulator’s benefits:
- Use default configurations for broad coverage, overriding only when necessary.
- Provide rich initial state descriptions for realistic simulations.
- Employ `share_state_id` for multi-turn workflows.
- Validate sequences of interactions, not just final outputs.
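For the last point, one lightweight pattern (plain Python, not a Strands Evals API) is to record the sequence of tool calls and assert on the trace, rather than only on the agent's final answer:

```python
# Record every tool invocation so tests can assert on the sequence,
# not just the final output. Plain-Python pattern, not a library API.
class CallRecorder:
    def __init__(self):
        self.trace = []  # list of (tool_name, kwargs) in call order

    def wrap(self, name, fn):
        def wrapper(**kwargs):
            self.trace.append((name, kwargs))
            return fn(**kwargs)
        return wrapper

recorder = CallRecorder()

# Stand-in tools returning canned payloads for the illustration.
search = recorder.wrap("search_flights",
                       lambda **kw: {"flights": [{"id": "F100"}]})
book = recorder.wrap("book_flight",
                     lambda **kw: {"status": "confirmed"})

# Simulated agent run: the agent should search before booking.
search(origin="SEA", destination="JFK", date="2025-03-15")
book(flight_id="F100")

# Assert on the interaction sequence, not just the last response.
assert [name for name, _ in recorder.trace] == ["search_flights", "book_flight"]
```

Sequence-level assertions catch ordering bugs (for example, an agent that books before searching) that final-output checks alone would miss.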
Conclusion
ToolSimulator revolutionizes AI agent testing by eliminating risky live API calls and enabling safe, scalable simulations. This framework equips developers with the tools to catch bugs early and launch production-ready agents with confidence.
Next Steps
Start using ToolSimulator today by installing it:

```
pip install strands-evals
```
Explore the Strands Evals documentation for further insights, experiment with examples, and engage with the community for shared learning. Your feedback is invaluable—connect with us on GitHub or the community forums.
About The Authors
Darren Wang, Xuan Qi, Smeet Dhakecha, and Vinayak Arannil are esteemed researchers and engineers at Amazon Web Services dedicated to advancing AI agent technologies. Their collective expertise shapes the future of AI testing and simulation, helping developers build reliable solutions that transform industries.