Speed Up Enterprise AI Development with Weights & Biases and Amazon Bedrock AgentCore

This post is co-written by Thomas Capelle and Ray Strickland from Weights & Biases (W&B).

Generative artificial intelligence (AI) is evolving rapidly, moving from basic foundation model interactions to advanced workflows that are becoming integral to enterprise operations. As organizations move from proof of concept to full-scale deployment, robust tools for developing, evaluating, and monitoring AI applications are essential.

In this post, we show how to use foundation models (FMs) from Amazon Bedrock and the recently launched Amazon Bedrock AgentCore, together with W&B Weave, to build, evaluate, and monitor enterprise AI solutions. We cover the development lifecycle end to end, from tracking individual FM calls to overseeing complex agent workflows in production.

Overview of W&B Weave

Weights & Biases (W&B) is an AI developer platform that provides tools for training models, fine-tuning, and building applications on top of foundation models, serving enterprises across a wide range of industries.

Key Features of W&B Weave

  • Tracing & Monitoring: Track large language model (LLM) calls and application logic to debug and analyze your production systems efficiently.
  • Systematic Iteration: Refine prompts, datasets, and models seamlessly.
  • Experimentation: Test various models and prompts using the LLM Playground.
  • Evaluation: Utilize custom or pre-built evaluation tools to assess application performance rigorously, and gather real-world user and expert feedback.
  • Guardrails: Safeguard your application with content moderation and prompt safety tools, using both custom and third-party guardrails, including those from Amazon Bedrock (a minimal pattern is sketched after this list).
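
As an illustration, here is a minimal sketch of a guardrail-style check built from plain @weave.op functions. The contains_blocked_terms helper and its blocklist are hypothetical; in practice you would swap in Weave's pre-built guardrail scorers or Amazon Bedrock Guardrails.

import weave

weave.init("guardrails-demo")  # illustrative project name

BLOCKED_TERMS = {"password", "credit card"}  # hypothetical blocklist

@weave.op()
def contains_blocked_terms(text: str) -> bool:
    # Naive substring check; real moderation would use a model or managed service
    return any(term in text.lower() for term in BLOCKED_TERMS)

@weave.op()
def guarded_respond(prompt: str, generate) -> str:
    # Screen the input, call the model, then screen the output
    if contains_blocked_terms(prompt):
        return "Request declined by input guardrail."
    response = generate(prompt)
    if contains_blocked_terms(response):
        return "Response withheld by output guardrail."
    return response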

W&B Weave is available in both multi-tenant and single-tenant environments and can also be deployed directly in a customer’s Amazon Virtual Private Cloud (VPC). Through its integration with the W&B Development Platform, it offers a seamless experience across model training and agentic AI workflows.

To get started, subscribe to the W&B AI Development Platform on AWS Marketplace, which is free for individuals and academic teams.

Tracking Amazon Bedrock FMs with W&B Weave SDK

W&B Weave integrates smoothly with Amazon Bedrock via Python and TypeScript SDKs. With just a few lines of code, you can automatically track LLM calls. Here’s a quick setup:

# Install the Weave SDK (shown in notebook form; use pip install weave in a shell)
!pip install weave

import json

import boto3
import weave
from weave.integrations.bedrock.bedrock_sdk import patch_client

# Initialize Weave with a project name; traced calls are logged to this project
weave.init("my_bedrock_app")

# Patch the boto3 Bedrock Runtime client so each model call is traced automatically
client = boto3.client("bedrock-runtime")
patch_client(client)

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }),
    contentType="application/json",
    accept="application/json"
)

# Read the streaming body and print the model's answer
response_dict = json.loads(response.get("body").read())
print(response_dict["content"][0]["text"])

This integration not only automates versioning and tracks configurations but also provides complete visibility into your applications without needing to alter existing logic.
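
The same patched client can also be used with the Bedrock Converse API, which offers a uniform request shape across FMs. A minimal sketch, reusing the client from above and assuming the Weave integration traces converse calls as well as invoke_model:

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "What is the capital of Japan?"}]}
    ],
    inferenceConfig={"maxTokens": 100},
)
print(response["output"]["message"]["content"][0]["text"])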

Experimenting with Amazon Bedrock FMs in W&B Weave Playground

The W&B Weave Playground is designed to expedite prompt engineering through an intuitive interface that facilitates testing and comparison of Bedrock models. Notable features include:

  • Directly editing prompts and retrying messages.
  • Side-by-side model comparisons.
  • Quick iterations accessible from trace views.

To dive in, simply add your AWS credentials in the Playground settings, select your Amazon Bedrock FMs, and begin exploring. The platform enables rapid experimentation while ensuring full traceability of your efforts.

Evaluating Amazon Bedrock FMs with W&B Weave Evaluations

Effective evaluation of generative AI models is simplified with W&B Weave Evaluations. By utilizing these tools with Amazon Bedrock, users can evaluate models, analyze outputs, and visualize performance across various metrics. Leveraging built-in scorers, third-party or custom scorers, and expert feedback allows for a better understanding of model trade-offs such as cost, accuracy, and output quality.

Setting Up an Evaluation Job

Here’s an example of how to orchestrate an evaluation job:

import asyncio

import weave
from weave import Evaluation

# Initialize Weave before running the evaluation so results land in one project
weave.init("intro-example")

# Each row supplies the model input ("question") and the ground truth ("expected")
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
]

# Scorers receive dataset columns (expected) plus the model output
@weave.op()
def match_score1(expected: str, output: dict) -> dict:
    return {"match": expected == output["generated_text"]}

# Stand-in model for illustration; in practice this would call a Bedrock FM
@weave.op()
def function_to_evaluate(question: str):
    return {"generated_text": "Paris"}

evaluation = Evaluation(dataset=examples, scorers=[match_score1])

# Run the evaluation loop; per-example scores appear in the Weave dashboard
asyncio.run(evaluation.evaluate(function_to_evaluate))

The evaluation dashboard visualizes performance metrics, guiding informed decisions regarding model selection and configurations.
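
To evaluate an actual Bedrock model rather than the stand-in above, the evaluated op can call the patched client from the earlier tracking section. A sketch under that assumption (model ID and response parsing as in the earlier snippet):

@weave.op()
def bedrock_model(question: str) -> dict:
    # Each evaluation row triggers a fully traced Bedrock call
    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": question}],
        }),
        contentType="application/json",
        accept="application/json",
    )
    text = json.loads(response["body"].read())["content"][0]["text"]
    return {"generated_text": text.strip()}

asyncio.run(evaluation.evaluate(bedrock_model))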

Enhancing Amazon Bedrock AgentCore Observability with W&B Weave

Amazon Bedrock AgentCore provides enterprise-scale operational controls and runtime environments. It integrates seamlessly with popular frameworks to ensure secure and efficient agent operations. Built-in observability through Amazon CloudWatch helps track essential metrics while W&B Weave enhances this capability with detailed execution visualizations.

Integrating W&B Weave Observability

There are two effective ways to add observability to your AgentCore agents:

  1. Native W&B Weave SDK: Decorate functions with the @weave.op decorator and tracking becomes automatic.

import os

import weave
from strands import Agent  # assuming the Strands Agents SDK commonly used with AgentCore

os.environ["WANDB_API_KEY"] = "your_api_key"
weave.init("your_project_name")

@weave.op()
def word_count_op(text: str) -> int:
    return len(text.split())

@weave.op()
def run_agent(agent: Agent, user_message: str) -> dict:
    # Invoke the agent and log its reply and model ID to Weave
    result = agent(user_message)
    return {"message": result.message, "model": agent.model.config["model_id"]}
  2. OpenTelemetry Integration: For teams with existing OpenTelemetry infrastructure, W&B Weave accepts traces over OTLP for seamless integration.

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Endpoint and auth headers for your Weave OTLP endpoint (elided)
exporter = OTLPSpanExporter(...)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("invoke_agent") as span:
    ...
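
For completeness, here is how the decorated run_agent from option 1 might be invoked, assuming a Strands Agent backed by a Bedrock model (the model ID is illustrative):

from strands import Agent
from strands.models import BedrockModel  # assuming the Strands Agents SDK

model = BedrockModel(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")
agent = Agent(model=model)

# Both the op and the agent's underlying model calls show up as traces in Weave
print(run_agent(agent, "Summarize the benefits of agent observability."))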

This dual observability approach allows teams to utilize both CloudWatch and W&B Weave, giving them the flexibility to monitor production SLAs, debug complex behaviors, and optimize performance across their workflows.

Conclusion

In this post, we’ve illustrated how to effectively build and optimize enterprise-grade agentic AI solutions by integrating Amazon Bedrock’s FMs and AgentCore with W&B Weave’s observability toolkit. We explored the extensive capabilities of W&B Weave to enhance the LLM development lifecycle from experimentation to evaluation and production monitoring.

Key Takeaways:

  • Automatic Tracking: Easily track Amazon Bedrock FM calls with minimal code adjustments using the W&B Weave SDK.
  • Rapid Experimentation: Engage with the Playground for quick testing and model comparisons.
  • Systematic Evaluation: Compare model performance through the Evaluation framework.
  • Comprehensive Observability: Blend CloudWatch monitoring with W&B Weave’s visualization tools for enhanced operational insights.

To kickstart your journey:

  1. Request a free trial or subscribe to W&B’s AI Development Platform via AWS Marketplace.
  2. Install the W&B Weave SDK to start tracking your Bedrock FM calls.
  3. Experiment with models in the W&B Weave Playground.
  4. Set up evaluations for comprehensive model comparisons.
  5. Enhance AgentCore agents with observability features.

Begin with simple integrations and progressively adopt advanced features as your AI applications evolve. The synergy between Amazon Bedrock’s capabilities and W&B Weave provides the foundation for building, evaluating, and maintaining robust AI solutions at scale.

About the Authors

James Yi is a Senior AI/ML Partner Solutions Architect at AWS, collaborating with teams to design advanced joint solutions in the generative AI space.

Ray Strickland is a Senior Partner Solutions Architect at AWS specializing in AI/ML and intelligent document processing, driving innovation and scalable solutions.

Thomas Capelle is a Machine Learning Engineer at W&B, focusing on MLOps and application development to enhance enterprise performance.

Scott Juang is the Director of Alliances at W&B, previously leading strategic partnerships in cloud technologies.


