Enhancing AI Agent Observability: Integrating Arize AI with Amazon Bedrock Agents
This article explores the collaboration between Arize AI and Amazon Bedrock Agents to address observability challenges in AI development, emphasizing the capabilities and benefits of using Arize Phoenix for enhanced monitoring and evaluation.
Unlocking the Power of AI Agents: A Deep Dive into Arize AI and Amazon Bedrock Integration
This post is cowritten with John Gilhuly from Arize AI.
In recent years, the rise of AI has transformed how businesses operate, offering unprecedented opportunities for automation and efficiency. One of the most exciting advancements in this realm is the introduction of Amazon Bedrock Agents—powerful tools that enable developers to build and configure autonomous agents tailored for their applications.
What Are Amazon Bedrock Agents?
Amazon Bedrock Agents serve as intelligent intermediaries that assist end-users in performing actions based on organizational data and user input. They facilitate complex interactions between foundation models (FMs), various data sources, software applications, and user conversations.
Agents can automate tasks for customers and answer their questions, for example by processing insurance claims or making travel reservations, while Amazon Bedrock handles the heavy lifting. Developers do not need to manage infrastructure, provision capacity, or write custom code. Because Amazon Bedrock manages prompt engineering, memory, monitoring, encryption, user permissions, and API invocation, developers can focus on delivering high-quality applications.
The Challenge: Observability in AI Agents
As AI agents become central to application decision-making, monitoring their performance becomes crucial. Traditional software systems operate on predetermined paths, but AI agents utilize complex, often opaque reasoning processes. This “black box” nature complicates the task of ensuring reliability and optimal performance.
Observability addresses this problem. It shows how your agents perform, interact, and accomplish tasks, with the goal of tracing every operation from the high-level request down to individual API calls.
Introducing Arize AI and Amazon Bedrock Agents Integration
Today, we’re thrilled to announce a robust integration between Arize AI and Amazon Bedrock Agents, tackling one of the most pressing challenges in AI development: observability.
Key Benefits of the Integration
- Comprehensive Traceability: Track each step of your agent’s execution, from user queries to knowledge retrieval and action execution.
- Systematic Evaluation Framework: Employ consistent methodologies to measure and glean insights into agent performance.
- Data-Driven Optimization: Run structured experiments, allowing you to compare various agent configurations and pinpoint the most effective settings.
Available Versions
- Arize AX: An enterprise solution for advanced monitoring capabilities.
- Arize Phoenix: An open-source service that democratizes access to tracing and evaluation for developers.
This post will focus on implementing the Arize Phoenix system for tracing and evaluation, which can seamlessly run on local machines, Jupyter notebooks, containerized environments, or the cloud.
Solution Overview
Tracing is crucial in understanding the paths requests take through an application. By utilizing tracing, developers can gain visibility into the operational health of their applications, making it easier to debug difficult-to-reproduce behaviors.
Instrumentation
For effective trace generation, your application must be instrumented. While manual instrumentation is possible, Arize Phoenix provides a set of plugins for automatic instrumentation, making the entire process straightforward.
Getting Started
To demonstrate how this integration works, you can automatically instrument interactions with Amazon Bedrock or Amazon Bedrock agents. The following high-level overview outlines the setup:
- Prerequisites: Ensure you have the necessary libraries installed.
- Environment Configuration: Set up environment variables for Phoenix.
- Session and Agent Setup: Connect to your Amazon Bedrock session using Boto3 and configure your agent.
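The environment configuration and instrumentation steps above might be sketched as follows. This is a minimal illustration rather than the post's exact setup. It assumes the `arize-phoenix` and `openinference-instrumentation-bedrock` packages are installed and a Phoenix instance is listening locally on its default port. The helper names `configure_phoenix_env` and `enable_bedrock_tracing` are hypothetical, and the API key is only needed for a hosted Phoenix instance.

```python
import os


def configure_phoenix_env(endpoint="http://localhost:6006", api_key=None):
    """Point Phoenix tracing at a collector via environment variables.

    The default endpoint is an assumption (a locally running Phoenix).
    """
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = endpoint
    if api_key:
        # Hosted Phoenix instances expect the key in the client headers.
        os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={api_key}"


def enable_bedrock_tracing(project_name):
    """Register a tracer provider and auto-instrument Boto3 Bedrock clients.

    Third-party imports are deferred so this module loads even when the
    tracing packages are not installed yet.
    """
    from openinference.instrumentation.bedrock import BedrockInstrumentor
    from phoenix.otel import register

    tracer_provider = register(project_name=project_name)
    BedrockInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Calling `configure_phoenix_env()` and then `enable_bedrock_tracing("my-agent-project")` (a placeholder project name) before creating the Boto3 client is the intended order, since the instrumentor is expected to wrap clients created after it runs.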
import boto3
session = boto3.Session()
bedrock_agent_runtime = session.client(service_name="bedrock-agent-runtime")
Capturing Agent Output with Tracing Enabled
Creating a function that runs your agent while capturing trace outputs is essential.
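from openinference.instrumentation import using_metadata

# metadata (a dict of trace tags) and attributes (the invoke_agent arguments) are assumed to be defined earlier.
@using_metadata(metadata)
def run(input_text):
    response = bedrock_agent_runtime.invoke_agent(**attributes)
    # Stream the response
Test your agent using sample queries, and Phoenix will automatically collect detailed traces.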
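The streaming step that the snippet above leaves out could look roughly like this. It is a sketch under assumptions: the `invoke_agent` response exposes a `completion` event stream whose events carry either a `chunk` (with the answer in `bytes`) or a `trace` entry. The helper names `collect_agent_output` and `run_agent`, and all ID arguments, are hypothetical placeholders.

```python
def collect_agent_output(events):
    """Split an invoke_agent event stream into answer text and trace events."""
    text_parts = []
    traces = []
    for event in events:
        if "chunk" in event:
            # Answer text arrives as UTF-8 encoded byte chunks.
            text_parts.append(event["chunk"].get("bytes", b"").decode("utf-8"))
        elif "trace" in event:
            # Trace events describe the agent's intermediate steps.
            traces.append(event["trace"])
    return "".join(text_parts), traces


def run_agent(client, agent_id, agent_alias_id, session_id, input_text):
    """Invoke the agent with tracing enabled and collect its streamed output."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id,
        inputText=input_text,
        enableTrace=True,  # ask Bedrock to include trace events in the stream
    )
    return collect_agent_output(response["completion"])
```

Keeping the stream parsing in its own function makes it easy to exercise with hand-built events, without calling Amazon Bedrock.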
Viewing Captured Traces
After running your agent, navigate to the Phoenix dashboard for a clear visualization of each agent invocation. You’ll gain insights into:
- Full conversation context
- Knowledge base queries and results
- Decision-making steps of the agent
Evaluating Agent Performance
Evaluating AI agents presents unique challenges, especially in function calling accuracy. The integration offers built-in LLM evaluations and code-based experiment testing, allowing you to measure every component of the agent.
Use the evaluation templates provided by Phoenix to check how well the agent uses its available tools.
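from phoenix.evals import TOOL_CALLING_PROMPT_TEMPLATE, llm_classify

# Assumes trace_df (exported spans), eval_model (the judge LLM), and rails (allowed labels) are defined earlier.
response_classifications = llm_classify(
    data=trace_df,
    template=TOOL_CALLING_PROMPT_TEMPLATE,
    model=eval_model,
    rails=rails,
)
Log the evaluation results to Phoenix to gain insights into how effectively your agent utilizes its tools.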
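Logging the results could look something like the sketch below. It makes several assumptions: the classification output is a pandas DataFrame indexed by span ID with a `label` column, the positive label is `"correct"`, and the installed Phoenix version still provides `px.Client().log_evaluations` and `SpanEvaluations`. The helper names and the evaluation name are hypothetical.

```python
def labels_to_scores(labels, positive_label="correct"):
    """Map classification labels to 1 (positive) or 0 (anything else)."""
    return [1 if label == positive_label else 0 for label in labels]


def log_tool_calling_evals(classifications, eval_name="Tool Calling Eval"):
    """Attach numeric scores and send the evaluations to Phoenix.

    Assumes `classifications` is a DataFrame indexed by span ID, as
    returned by an evaluation run over exported spans.
    """
    # Deferred imports: only needed when actually logging to Phoenix.
    import phoenix as px
    from phoenix.trace import SpanEvaluations

    scored = classifications.copy()
    scored["score"] = labels_to_scores(scored["label"])
    px.Client().log_evaluations(
        SpanEvaluations(eval_name=eval_name, dataframe=scored)
    )
    return scored
```

A numeric score alongside each label lets the Phoenix dashboard aggregate results, for example as the share of tool calls judged correct.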
Conclusion
As AI agents proliferate within enterprise applications, observability remains a cornerstone for ensuring reliability and performance. The integration between Arize AI and Amazon Bedrock Agents equips developers with the necessary tools to create, monitor, and refine AI applications effectively.
We’re excited to see how this integration will empower developers to unlock new possibilities in AI. Stay tuned for further updates on enhancing this integration and its capabilities.
About the Authors
Ishan Singh: A Senior Generative AI Data Scientist at AWS, specializing in building responsible generative AI solutions. Outside of work, he enjoys volleyball and exploring local bike trails.
John Gilhuly: Head of Developer Relations at Arize AI, focused on AI agent observability. With an MBA from Stanford, he has led various go-to-market activities in tech.
For further details, consult the Phoenix documentation and explore how you can leverage this integration for your own applications.