Navigating the Landscape of Agentic AI: Harnessing Arize AX and Strands Agents for Reliable Workflows

This post is co-written with Rich Young from Arize AI.

In the rapidly evolving realm of artificial intelligence, a new wave of applications is emerging—agentic AI. Unlike traditional AI workloads, agentic AI applications are fundamentally nondeterministic, meaning they can yield varying results even with identical inputs. This unpredictability is primarily due to the probabilistic nature of large language models (LLMs) that underpin these systems. As AI application designers explore agentic workflows, they face pressing questions about the right corrective actions, optimal decision paths, and appropriate tools and parameters.

To successfully deploy such agentic workloads, organizations require an observability system capable of ensuring dependable, trustworthy outcomes. Enter the Arize AX service, which aids in tracing and evaluating AI agent tasks initiated through Strands Agents, thus validating the correctness and reliability of agentic workflows.

Challenges with Generative AI Applications

Transitioning from a promising AI demonstration to a dependable production system can be fraught with unforeseen challenges. Many organizations underestimate these obstacles, which can significantly hamper the effectiveness of their projects. Based on extensive industry research and real-world deployments, teams encounter several crucial hurdles:

  • Unpredictable Behavior at Scale: AI agents that excel in controlled testing environments may falter in production when faced with unexpected inputs, such as new language variations or niche jargon that leads to irrelevant outputs.

  • Hidden Failure Modes: Agents may produce seemingly plausible outputs that are actually incorrect, potentially leading to misguided decisions based on miscalculations or incorrect conclusions.

  • Nondeterministic Paths: Agents can take less efficient or incorrect paths in decision-making—such as taking unnecessarily long routes to resolve queries—which can degrade user experience.

  • Tool Integration Complexity: Errors in API calls—such as incompatible order ID formats—can result in silent failures, inhibiting essential functions like refunds.

  • Cost and Performance Variability: Inefficiencies may lead to runaway costs, such as an agent making excessive LLM calls that dramatically increase response times from seconds to minutes.

These challenges render traditional monitoring and testing strategies inadequate for AI systems. A more nuanced, comprehensive approach is necessary for success.

Introducing Arize AX: Comprehensive Observability and Evaluation

The Arize AX service is designed to bridge these gaps, providing a robust framework for monitoring, evaluation, and debugging of AI applications throughout their development and production lifecycles. Utilizing Arize’s Phoenix foundation, AX incorporates essential enterprise features like the "Alyx" AI assistant, automatic prompt optimization, and role-based access control (RBAC), enabling effective management of AI agents.

Key capabilities of Arize AX include:

  • Tracing: Offers thorough visibility into LLM operations, tracking model calls, retrieval steps, and metadata.

  • Evaluation: Automated quality monitoring with LLM-as-a-judge evaluations, permitting custom evaluations and clear success metrics.

  • Datasets: Maintains versioned, representative datasets, allowing for regression tests and edge case analysis.

  • Experiments: Facilitates controlled tests to assess the impact of changes to prompts or models.

  • Playground: An interactive environment for replaying traces and testing prompt variations.

  • Prompt Management: Enables versioning and tracking of prompts, akin to code.

  • Monitoring and Alerting: Real-time dashboards and alerts for various performance metrics.

  • Agent Visualization: Analyzes and refines decision paths to improve efficiency and effectiveness.

Together, these components form a comprehensive observability strategy, treating LLM applications as mission-critical systems requiring continuous oversight and improvement.

The Synergy of Arize AX and Strands Agents

Strands Agents is an open-source SDK designed to simplify the development and operation of AI agents by minimizing overhead. This low-code framework streamlines workflows by unifying prompts, tools, LLM interactions, and integration protocols.
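To make the low-code claim concrete, here is a minimal sketch of a standalone Strands agent. It assumes AWS credentials with Amazon Bedrock access are already configured; with no model specified, the SDK falls back to its default Bedrock model.

from strands import Agent

# No model is specified, so the SDK uses its default Amazon Bedrock model
# (assumes AWS credentials with Bedrock access are already configured).
agent = Agent(system_prompt="You are a concise assistant.")
print(agent("In one sentence, what is observability?"))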

In this section, we outline how to build an agent with the Strands Agents SDK and instrument it with Arize AX for trace-based evaluation and optimization.

High-Level Workflow for Building an Agent:

  1. Install and Configure the Dependencies
  2. Instrument the Agent for Observability
  3. Build the Agent with the Strands SDK
  4. Test the Agent and Generate Traces
  5. Analyze Traces in Arize AI
  6. Evaluate the Agent’s Behavior
  7. Optimize the Agent
  8. Continually Monitor the Agent

Prerequisites

To get started, you’ll need:

  • An AWS account with Amazon Bedrock access.
  • An Arize account with your Space ID and API Key (easily obtainable at arize.com).

Install the necessary dependencies:

pip install strands-agents strands-agents-tools opentelemetry-sdk opentelemetry-exporter-otlp arize-otel

Solution Walkthrough: Utilizing Arize AX with Strands Agents

1. Install and Configure Dependencies

To install and configure your dependencies, use the following code snippet:

# OpenTelemetry primitives used to trace the agent.
from opentelemetry import trace
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Custom processor (provided with this sample) that maps Strands spans to OpenInference.
from strands_to_openinference_mapping import StrandsToOpenInferenceProcessor

# Arize helper that registers an OpenTelemetry tracer provider pointed at your Arize space.
from arize.otel import register
import grpc

2. Instrument the Agent for Observability

Use Arize's OpenTelemetry integration to enable tracing. The StrandsToOpenInferenceProcessor converts the agent's spans into the OpenInference format that Arize expects.

# Register a tracer provider that exports the agent's spans to Arize AX.
register(
    space_id="your-arize-space-id",   # found in your Arize space settings
    api_key="your-arize-api-key",
    project_name="strands-project",
    processor=StrandsToOpenInferenceProcessor()   # convert Strands spans to OpenInference
)

3. Build the Agent with Strands SDK

Define the Restaurant Assistant agent, which helps customers with restaurant information and reservations:

from strands import Agent
from strands.models.bedrock import BedrockModel
# Built-in tools from the strands-agents-tools package.
from strands_tools import retrieve, current_time
import boto3

# get_booking_details, create_booking, and delete_booking are custom tools defined
# in the sample; the sketch after this block shows one way to declare such a tool.
system_prompt = """You are 'Restaurant Helper', assisting with restaurant bookings."""
model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")
agent = Agent(
    model=model,
    system_prompt=system_prompt,
    tools=[retrieve, current_time, get_booking_details, create_booking, delete_booking],
    # Attach session and user metadata to every trace this agent emits.
    trace_attributes={"session.id": "abc-1234", "user.id": "user-email@example.com"}
)
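
The three booking tools above are custom tools. As a rough illustration of how such a tool can be declared with the Strands @tool decorator, the sketch below defines a hypothetical get_booking_details backed by an in-memory dictionary; a real implementation would typically query a database or API instead.

from strands import tool

# Hypothetical in-memory store standing in for a real bookings database.
_BOOKINGS = {"abc-123": {"name": "Anna", "date": "2025-07-04", "guests": 2}}

@tool
def get_booking_details(booking_id: str) -> dict:
    """Return the details of a restaurant booking, or an error if it does not exist."""
    return _BOOKINGS.get(booking_id, {"error": f"No booking found for ID {booking_id}"})

The decorator derives the tool's name, description, and input schema from the function signature and docstring, so the agent can decide when and how to call it.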

4. Test the Agent and Generate Traces

Test the functionality of the agent with specific queries, which will trigger trace generation:

results = agent("Hi, where can I eat in New York?")
print(results)
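
A couple of additional, purely illustrative queries (the restaurant and guest details are made up) exercise the reservation tools and generate further traces:

# Each call produces a new trace in Arize AX.
print(agent("Book a table for two at Rustic Spoon in New York tomorrow at 7pm under the name Anna."))
print(agent("Please cancel the booking you just made for Anna."))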

5. Analyze Traces in Arize AI

After running queries through the agent, access the traces within the Arize AI dashboard to evaluate decision paths and performance.

6. Evaluate the Agent’s Behavior

Arize captures the agent's full decision-making process in the traces and provides prebuilt evaluation templates for judging path efficiency and output correctness.
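
Beyond the online evaluations Arize AX runs for you, traces can also be scored offline. The sketch below uses llm_classify from the arize-phoenix-evals package as an LLM-as-a-judge; the DataFrame columns, evaluation template, and judge model are illustrative assumptions, not the prebuilt templates Arize provides.

import pandas as pd
from phoenix.evals import BedrockModel, llm_classify

# Hypothetical trace data pulled into a DataFrame (for example, exported from Arize AX).
traces_df = pd.DataFrame({
    "question": ["Hi, where can I eat in New York?"],
    "answer": ["Restaurant Helper found these New York restaurants: ..."],
})

# Illustrative judge prompt; in practice, adapt one of Arize's prebuilt templates.
EVAL_TEMPLATE = """You are evaluating a restaurant assistant.
Question: {question}
Answer: {answer}
Is the answer relevant and helpful? Answer with exactly one word: correct or incorrect."""

eval_results = llm_classify(
    dataframe=traces_df,
    template=EVAL_TEMPLATE,
    model=BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0"),  # judge model is an assumption
    rails=["correct", "incorrect"],
    provide_explanation=True,
)
print(eval_results[["label", "explanation"]])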

7. Optimize the Agent

Use Arize’s prompt playground to experiment with various model parameters and prompts, helping to improve agent responses iteratively.

8. Continually Monitor the Agent

Implement monitoring capabilities to maintain performance and catch issues early by tracking critical metrics such as latency and token usage.

Conclusion

In today’s AI landscape, observability, automatic evaluations, experimentation, and proactive alerting are no longer optional; they are crucial for ensuring the reliability of AI systems. Organizations that prioritize proper AI operations infrastructure can unlock the full potential of AI agents while avoiding early missteps.

By combining the user-friendly Strands Agents framework with Arize AI’s robust monitoring and evaluation tools, organizations can create effective, trustworthy AI applications. With the right tools and strategies in place, the future of agentic workflows looks promising.

Get Started

Ready to elevate your AI projects? Sign up at arize.com, integrate with Strands Agents, and begin crafting reliable, production-ready AI solutions today.

About the Authors

Rich Young is the Director of Partner Solutions Architecture at Arize AI, focused on AI agent observability.

Karan Singh is an Agentic AI leader at AWS, specializing in agentic frameworks.

Nolan Chen is a Partner Solutions Architect at AWS, aiding startups in building innovative cloud solutions.

Venu Kanamatareddy is an AI/ML Solutions Architect at AWS, supporting startups in AI-driven projects.
