Navigating the Landscape of Agentic AI: Harnessing Arize AX and Strands Agents for Reliable Workflows
This post is co-written with Rich Young from Arize AI.
In the rapidly evolving realm of artificial intelligence, a new wave of applications is emerging—agentic AI. Unlike traditional AI workloads, agentic AI applications are fundamentally nondeterministic, meaning they can yield varying results even with identical inputs. This unpredictability is primarily due to the probabilistic nature of large language models (LLMs) that underpin these systems. As AI application designers explore agentic workflows, they face pressing questions about the right corrective actions, optimal decision paths, and appropriate tools and parameters.
To successfully deploy such agentic workloads, organizations require an observability system capable of ensuring dependable, trustworthy outcomes. Enter the Arize AX service, which aids in tracing and evaluating AI agent tasks initiated through Strands Agents, thus validating the correctness and reliability of agentic workflows.
Challenges with Generative AI Applications
Transitioning from a promising AI demonstration to a dependable production system can be fraught with unforeseen challenges. Many organizations underestimate these obstacles, which can significantly hamper the effectiveness of their projects. Based on extensive industry research and real-world deployments, teams encounter several crucial hurdles:
- Unpredictable Behavior at Scale: AI agents that excel in controlled testing environments may falter in production when faced with unexpected inputs, such as new language variations or niche jargon that leads to irrelevant outputs.
- Hidden Failure Modes: Agents may produce seemingly plausible outputs that are actually incorrect, potentially leading to misguided decisions based on miscalculations or incorrect conclusions.
- Nondeterministic Paths: Agents can take less efficient or incorrect paths in decision-making, such as taking unnecessarily long routes to resolve queries, which can degrade the user experience.
- Tool Integration Complexity: Errors in API calls, such as incompatible order ID formats, can result in silent failures that block essential functions like refunds.
- Cost and Performance Variability: Inefficiencies may lead to runaway costs, such as an agent making excessive LLM calls that increase response times from seconds to minutes.
These challenges render traditional monitoring and testing strategies inadequate for AI systems. A more nuanced, comprehensive approach is necessary for success.
Introducing Arize AX: Comprehensive Observability and Evaluation
The Arize AX service is designed to bridge these gaps, providing a robust framework for monitoring, evaluating, and debugging AI applications throughout their development and production lifecycles. Built on Arize's Phoenix foundation, AX adds enterprise features such as the Alyx AI assistant, automatic prompt optimization, and role-based access control (RBAC), enabling effective management of AI agents.
Key capabilities of Arize AX include:
- Tracing: Offers thorough visibility into LLM operations, tracking model calls, retrieval steps, and metadata.
- Evaluation: Automated quality monitoring with LLM-as-a-judge evaluations, permitting custom evaluations and clear success metrics.
- Datasets: Maintains versioned, representative datasets, allowing for regression tests and edge case analysis.
- Experiments: Facilitates controlled tests to assess the impact of changes to prompts or models.
- Playground: An interactive environment for replaying traces and testing prompt variations.
- Prompt Management: Enables versioning and tracking of prompts, akin to code.
- Monitoring and Alerting: Real-time dashboards and alerts for performance metrics.
- Agent Visualization: Analyzes and refines decision paths to improve efficiency and effectiveness.
Together, these components form a comprehensive observability strategy, treating LLM applications as mission-critical systems requiring continuous oversight and improvement.
The Synergy of Arize AX and Strands Agents
Strands Agents is an open-source SDK designed to simplify the development and operation of AI agents by minimizing overhead. The lightweight, model-driven framework streamlines agent workflows by unifying prompts, tools, LLM interactions, and integration protocols behind a simple interface.
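As a quick illustration of how little scaffolding this requires, here is a minimal sketch (assuming AWS credentials and Amazon Bedrock model access are already configured; when no model is specified, the SDK falls back to a default Bedrock model):
from strands import Agent
from strands_tools import current_time  # prebuilt tool from the strands-agents-tools package

# A minimal agent: the SDK wires the system prompt, tool schema, and model call together.
agent = Agent(tools=[current_time])
print(agent("What time is it in UTC?"))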
In this section, we outline how to build an agent using the Strands Agent SDK, equipping it with Arize AX for efficient trace-based evaluation and optimization.
High-Level Workflow for Building an Agent:
- Install and Configure the Dependencies
- Instrument the Agent for Observability
- Build the Agent with the Strands SDK
- Test the Agent and Generate Traces
- Analyze Traces in Arize AI
- Evaluate the Agent’s Behavior
- Optimize the Agent
- Continually Monitor the Agent
Prerequisites
To get started, you’ll need:
- An AWS account with Amazon Bedrock access.
- An Arize account with your Space ID and API Key (easily obtainable at arize.com).
Install the necessary dependencies:
pip install strands-agents strands-agents-tools opentelemetry-sdk opentelemetry-exporter-otlp arize-otel
Solution Walkthrough: Utilizing Arize AX with Strands Agents
1. Install and Configure Dependencies
To install and configure your dependencies, use the following code snippet:
from opentelemetry import trace
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Custom span processor (provided with this integration's sample code) that maps
# Strands' native spans to the OpenInference format Arize expects.
from strands_to_openinference_mapping import StrandsToOpenInferenceProcessor
from arize.otel import register
import grpc
2. Instrument the Agent for Observability
Use Arize's OpenTelemetry integration to register tracing. The StrandsToOpenInferenceProcessor converts Strands' native spans into the OpenInference format required by Arize.
register(
    space_id="your-arize-space-id",
    api_key="your-arize-api-key",
    project_name="strands-project",
    processor=StrandsToOpenInferenceProcessor()
)
3. Build the Agent with Strands SDK
Define the Restaurant Assistant agent, which helps customers with restaurant information and reservations:
from strands import Agent
from strands.models.bedrock import BedrockModel
from strands_tools import retrieve, current_time  # prebuilt tools from the strands-agents-tools package
import boto3

system_prompt = """You are 'Restaurant Helper', assisting with restaurant bookings."""

model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")

agent = Agent(
    model=model,
    system_prompt=system_prompt,
    # get_booking_details, create_booking, and delete_booking are custom tools;
    # a sketch of how they might be defined follows this snippet.
    tools=[retrieve, current_time, get_booking_details, create_booking, delete_booking],
    # Session and user identifiers are attached to every trace for filtering in Arize.
    trace_attributes={"session.id": "abc-1234", "user.id": "user-email@example.com"},
)
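The retrieve and current_time tools ship with the strands-agents-tools package, while the booking tools are custom. Here is a hedged sketch of what those custom tools might look like using the SDK's @tool decorator; the function bodies are placeholders, not the actual implementation:
from strands import tool

@tool
def get_booking_details(booking_id: str, restaurant_name: str) -> dict:
    """Look up an existing reservation (placeholder implementation)."""
    # In a real deployment this would query a bookings store, for example an Amazon DynamoDB table.
    return {"booking_id": booking_id, "restaurant": restaurant_name, "status": "confirmed"}

@tool
def create_booking(restaurant_name: str, date: str, hour: str, guest_name: str, num_guests: int) -> str:
    """Create a new reservation (placeholder implementation)."""
    return f"Booked {restaurant_name} for {guest_name} ({num_guests} guests) on {date} at {hour}."

@tool
def delete_booking(booking_id: str, restaurant_name: str) -> str:
    """Cancel an existing reservation (placeholder implementation)."""
    return f"Booking {booking_id} at {restaurant_name} has been cancelled."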
4. Test the Agent and Generate Traces
Test the functionality of the agent with specific queries, which will trigger trace generation:
results = agent("Hi, where can I eat in New York?")
print(results)
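To exercise the tool-calling paths as well (retrieval, booking creation, and cancellation), you can send a few more queries; these are illustrative examples rather than prescribed test cases:
print(agent("Book a table for two at Commonwealth & Rye tomorrow at 7pm under the name Jane."))
print(agent("What time is it right now?"))
print(agent("Please cancel booking ID 12345 at Commonwealth & Rye."))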
5. Analyze Traces in Arize AI
After running queries through the agent, access the traces within the Arize AI dashboard to evaluate decision paths and performance.
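Beyond the dashboard, traces can also be pulled down programmatically for offline analysis. The following is a minimal sketch using the Arize Python SDK's export client; the project name and date range are assumptions, and the exact parameters may differ by SDK version:
from datetime import datetime, timedelta
from arize.exporter import ArizeExportClient
from arize.utils.types import Environments

client = ArizeExportClient(api_key="your-arize-api-key")
traces_df = client.export_model_to_df(
    space_id="your-arize-space-id",
    model_id="strands-project",          # the project name used at registration
    environment=Environments.TRACING,    # export LLM traces rather than ML predictions
    start_time=datetime.now() - timedelta(days=1),
    end_time=datetime.now(),
)
print(traces_df.head())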
6. Evaluate the Agent’s Behavior
Arize captures the agent's decision-making process, including tool selection and intermediate steps, and provides prebuilt evaluation templates, such as LLM-as-a-judge checks, for scoring whether the agent took an optimal path and produced a correct output.
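If you prefer to run evaluations in code rather than (or in addition to) the hosted templates, the open-source Phoenix evals library (the phoenix.evals module from the arize-phoenix-evals package) can score exported traces with an LLM-as-a-judge. The sketch below assumes the traces_df exported in the previous step; the judge prompt, column names, and column renaming are assumptions for illustration:
from phoenix.evals import BedrockModel, llm_classify

# Hypothetical judge prompt; Arize AX also ships prebuilt templates for this purpose.
CORRECTNESS_TEMPLATE = """You are evaluating a restaurant-assistant agent.
Question: {input}
Agent answer: {output}
Respond with a single word: "correct" or "incorrect"."""

judge = BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")

# Map the exported OpenInference span attributes onto the template variables (assumed column names).
eval_df = traces_df.rename(
    columns={"attributes.input.value": "input", "attributes.output.value": "output"}
)

results = llm_classify(
    dataframe=eval_df,
    model=judge,
    template=CORRECTNESS_TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,  # ask the judge to justify each label
)
print(results["label"].value_counts())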
7. Optimize the Agent
Use Arize’s prompt playground to experiment with various model parameters and prompts, helping to improve agent responses iteratively.
8. Continually Monitor the Agent
Implement monitoring capabilities to maintain performance and catch issues early by tracking critical metrics such as latency and token usage.
Conclusion
In today’s AI landscape, observability, automatic evaluations, experimentation, and proactive alerting are no longer optional; they are crucial for ensuring the reliability of AI systems. Organizations that prioritize proper AI operations infrastructure can unlock the full potential of AI agents while avoiding early missteps.
By combining the user-friendly Strands Agents framework with Arize AI’s robust monitoring and evaluation tools, organizations can create effective, trustworthy AI applications. With the right tools and strategies in place, the future of agentic workflows looks promising.
Get Started
Ready to elevate your AI projects? Sign up at arize.com, integrate with Strands Agents, and begin crafting reliable, production-ready AI solutions today.
About the Authors
Rich Young is the Director of Partner Solutions Architecture at Arize AI, focused on AI agent observability.
Karan Singh is an Agentic AI leader at AWS, specializing in agentic frameworks.
Nolan Chen is a Partner Solutions Architect at AWS, aiding startups in building innovative cloud solutions.
Venu Kanamatareddy is an AI/ML Solutions Architect at AWS, supporting startups in AI-driven projects.