ToolSimulator: Scalable Testing Solutions for AI Agents


In This Guide You Will Learn To:

  • Set Up ToolSimulator and Register Tools for Simulation
  • Configure Stateful Tool Simulations for Multi-Turn Agent Workflows
  • Enforce Response Schemas with Pydantic Models
  • Integrate ToolSimulator into a Complete Strands Evals Evaluation Pipeline
  • Apply Best Practices for Simulation-Based Agent Evaluation

As AI agents become increasingly sophisticated, the need for robust testing mechanisms is more critical than ever. Enter ToolSimulator, a powerful, LLM-powered simulation framework integrated within Strands Evals. ToolSimulator enables safe and thorough testing of AI agents that rely on external tools, all without the risks associated with live API calls. In this blog post, you’ll learn how to leverage ToolSimulator to enhance your development workflow.

Setting Up ToolSimulator

Before diving into the advanced capabilities of ToolSimulator, ensure you have the necessary prerequisites:

Prerequisites

  • Python 3.10 or later installed in your environment
  • Strands Evals SDK: Install it using pip install strands-evals
  • Basic familiarity with Python, including decorators and type hints
  • Understanding of AI agents and tool-calling concepts (API calls, function schemas)
  • Pydantic knowledge is helpful for advanced schema examples, but not required
  • No AWS account is needed for local use

Why Tool Testing Challenges Your Development Workflow

Modern AI agents interact with many external systems, which can slow down testing and introduce risk. Testing against live APIs raises three notable challenges:

  1. External Dependencies: API rate limits and downtime can hinder testing speed when running multiple test cases.
  2. Risky Test Isolation: Real tool interactions can lead to unintended side effects, such as sending actual emails or modifying databases.
  3. Privacy and Security Concerns: Testing with live APIs may expose sensitive data, creating compliance risks.

While static mocks may seem like a solution, they fail in dynamic, multi-turn workflows where responses depend on preceding interactions.
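To see why, consider a minimal, purely illustrative sketch (plain Python, no Strands code involved): a static mock returns the same canned payload regardless of what happened on earlier turns, so a book-then-verify workflow silently hands the agent stale data.

```python
# Hypothetical sketch: a static mock cannot reflect earlier turns.
canned = {"bookings": []}  # fixed response, frozen at test-authoring time

def static_get_bookings(user_id: str) -> dict:
    """Static mock: always returns the same payload, no matter the history."""
    return canned

def book_flight(user_id: str, flight_id: str) -> dict:
    """In a real system this call would change backend state."""
    return {"status": "confirmed", "flight_id": flight_id}

# Turn 1: the agent books a flight.
book_flight("u1", "DL123")
# Turn 2: the agent checks bookings -- the static mock has no memory of turn 1.
print(static_get_bookings("u1"))  # still an empty list, not the booked flight
```

A context-aware simulator would instead generate a turn-2 response consistent with the booking made on turn 1.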

What Makes ToolSimulator Different

ToolSimulator addresses these challenges through three key features:

  • Adaptive Response Generation: ToolSimulator generates context-appropriate responses based on actual agent requests, rather than relying on static templates.
  • Stateful Workflow Support: It allows for multi-turn workflows by maintaining consistent state across tool calls.
  • Schema Enforcement: Responses are validated against Pydantic schemas to catch malformed outputs before they reach the agent.

Getting Started with ToolSimulator

Here’s a step-by-step guide for setting up and running your first simulation using ToolSimulator.

Step 1: Decorate and Register

First, create a ToolSimulator instance and register your tool using the @tool_simulator.tool() decorator.

from strands_evals.simulation.tool_simulator import ToolSimulator

tool_simulator = ToolSimulator()

@tool_simulator.tool()
def search_flights(origin: str, destination: str, date: str) -> dict:
    """Search for available flights."""
    pass  # Implementation is not called during simulation

Step 2: Steer (Optional Configuration)

Customize simulation behavior with optional parameters:

  • share_state_id: Link tools sharing the same backend.
  • initial_state_description: Seed initial context for better responses.
  • output_schema: Define expected response structure using Pydantic models.
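To make Step 2 concrete, the sketch below combines all three parameters on a single registration. It assumes the decorator accepts them as keyword arguments exactly as named above; `create_booking` and `BookingResponse` are illustrative names introduced here, not part of the Strands Evals API.

```python
from pydantic import BaseModel

from strands_evals.simulation.tool_simulator import ToolSimulator

class BookingResponse(BaseModel):
    booking_id: str
    status: str

tool_simulator = ToolSimulator()

@tool_simulator.tool(
    share_state_id="flight_booking",  # tools with this id share one simulated backend
    initial_state_description="One existing booking: SEA to JFK on March 15.",
    output_schema=BookingResponse,    # simulated responses are validated against this model
)
def create_booking(flight_id: str) -> dict:
    """Create a flight booking."""
    pass  # body is never executed during simulation
```

All three parameters are optional; the sections below look at shared state and schema enforcement individually.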

Step 3: Mock

When your agent calls a registered tool, ToolSimulator intercepts the call and dynamically generates a response that aligns with the agent’s request.

from strands import Agent  # Strands Agents SDK

# Create an agent with the simulated tool
agent = Agent(
    system_prompt="You are a flight search assistant.",
    tools=[tool_simulator.get_tool("search_flights")],
)

response = agent("Find me flights from Seattle to New York on March 15.")
print(response)

Advanced ToolSimulator Usage

Running Independent Instances

Creating multiple ToolSimulator instances allows you to run parallel tests without shared state conflicts:

simulator_a = ToolSimulator()
simulator_b = ToolSimulator()

Configuring Shared State

To handle stateful tools effectively, link them via share_state_id to maintain consistent context.

@tool_simulator.tool(share_state_id="flight_booking")
def get_booking_status(booking_id: str) -> dict:
    """Retrieve current booking status."""
    pass

Enforcing Custom Response Schemas

For strict API adherence, define your response schema using Pydantic. This prevents malformed responses from reaching your agent.

from pydantic import BaseModel

class FlightSearchResponse(BaseModel):
    flights: list[dict]
    origin: str
    destination: str
    status: str = "success"

@tool_simulator.tool(output_schema=FlightSearchResponse)
def search_flights(origin: str, destination: str, date: str) -> dict:
    """Search for flights."""
    pass

Integration with Strands Evaluation Pipelines

ToolSimulator easily integrates within Strands Evals for comprehensive evaluation. Here’s a complete pipeline example:

# Setup telemetry and ToolSimulator
telemetry = StrandsEvalsTelemetry().setup_in_memory_exporter()
tool_simulator = ToolSimulator()

# Register tools and define evaluation task (as demonstrated above)
# Define test cases, run the experiment, and display the report

Best Practices for Simulation-Based Evaluation

To maximize ToolSimulator’s benefits:

  • Use default configurations for broad coverage, overriding only when necessary.
  • Provide rich initial state descriptions for realistic simulations.
  • Employ share_state_id for multi-turn workflows.
  • Validate sequences of interactions, not just final outputs.
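The last point deserves an illustration. The plain-Python sketch below checks that expected tool names occur as an ordered subsequence of a run's recorded calls, rather than only inspecting the final answer; `recorded_calls` is a hypothetical stand-in for whatever trace your telemetry exporter captured.

```python
# Hypothetical recorded trace of one agent run (stand-in for telemetry output).
recorded_calls = [
    {"tool": "search_flights", "args": {"origin": "SEA", "destination": "JFK"}},
    {"tool": "create_booking", "args": {"flight_id": "DL123"}},
    {"tool": "get_booking_status", "args": {"booking_id": "BK-1"}},
]

def assert_call_order(calls: list[dict], expected: list[str]) -> None:
    """Check that expected tool names appear in order; other calls may interleave."""
    it = iter(call["tool"] for call in calls)
    missing = [name for name in expected if name not in it]
    if missing:
        raise AssertionError(f"Calls out of order or missing: {missing}")

# Passes: booking happened after the search, as the workflow requires.
assert_call_order(recorded_calls, ["search_flights", "create_booking"])
```

Sequence checks like this catch agents that reach a plausible final answer by calling tools in an invalid order.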

Conclusion

ToolSimulator streamlines AI agent testing by replacing risky live API calls with safe, scalable simulations, helping developers catch bugs early and ship production-ready agents with confidence.

Next Steps

Start using ToolSimulator today by installing it:

pip install strands-evals

Explore the Strands Evals documentation for further insights, experiment with examples, and engage with the community for shared learning. Your feedback is invaluable—connect with us on GitHub or the community forums.


About The Authors

Darren Wang, Xuan Qi, Smeet Dhakecha, and Vinayak Arannil are researchers and engineers at Amazon Web Services working to advance AI agent testing and simulation, helping developers build reliable agent-based solutions.
