Speed Up Enterprise AI Development with Weights & Biases and Amazon Bedrock AgentCore


This post is co-written by Thomas Capelle and Ray Strickland from Weights & Biases (W&B).

Generative artificial intelligence (AI) is rapidly evolving, moving from basic foundation model interactions to advanced workflows that are becoming integral to enterprise operations. As organizations move from proof of concept to full-scale deployment, robust tools for developing, evaluating, and monitoring AI applications are essential.

In this post, we show how to use foundation models (FMs) from Amazon Bedrock and the recently launched Amazon Bedrock AgentCore, together with W&B Weave, to build, evaluate, and monitor enterprise AI solutions. We cover the full development lifecycle, from tracking individual FM calls to overseeing complex agent workflows in production.

Overview of W&B Weave

Weights & Biases (W&B) is an AI developer platform that provides tools for training, fine-tuning, and using foundation models, serving enterprises across a wide range of industries.

Key Features of W&B Weave

  • Tracing & Monitoring: Track large language model (LLM) calls and application logic to debug and analyze your production systems efficiently.
  • Systematic Iteration: Refine prompts, datasets, and models seamlessly.
  • Experimentation: Test various models and prompts using the LLM Playground.
  • Evaluation: Utilize custom or pre-built evaluation tools to assess application performance rigorously, and gather real-world user and expert feedback.
  • Guardrails: Safeguard your application with content moderation and prompt safety tools, utilizing both custom and third-party guardrails, including those from Amazon Bedrock.

W&B Weave is available in both multi-tenant and single-tenant environments and can also be deployed directly in a customer’s Amazon Virtual Private Cloud (VPC). Through its integration with the W&B Development Platform, it offers a seamless experience across model training workflows and agentic AI workflows.

To get started, subscribe to the W&B AI Development Platform on AWS Marketplace, which is free for individuals and academic teams.

Tracking Amazon Bedrock FMs with W&B Weave SDK

W&B Weave integrates smoothly with Amazon Bedrock via Python and TypeScript SDKs. With just a few lines of code, you can automatically track LLM calls. Here’s a quick setup:

# Install the Weave SDK (in a notebook)
!pip install weave

import weave
import boto3
import json
from weave.integrations.bedrock.bedrock_sdk import patch_client

# Start a Weave project; patched calls are logged here
weave.init("my_bedrock_app")

# Patch the Bedrock runtime client so every model call is traced automatically
client = boto3.client("bedrock-runtime")
patch_client(client)

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }),
    contentType="application/json",
    accept="application/json"
)

# The response body is a stream; read and decode the JSON payload
response_dict = json.loads(response.get('body').read())
print(response_dict["content"][0]["text"])

This integration not only automates versioning and tracks configurations but also provides complete visibility into your applications without needing to alter existing logic.
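As prompts grow, the Anthropic message payload shown above can be factored into a small helper. The function below is a hypothetical convenience, not part of the Weave or Bedrock SDKs; its fields mirror the invoke_model body used earlier.

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 100) -> str:
    """Hypothetical helper: serialize an Anthropic messages payload for invoke_model."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

# Usage with the patched client from above:
# response = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
#     body=build_claude_body("What is the capital of France?"),
#     contentType="application/json",
#     accept="application/json",
# )
```

Because the helper only builds the request body, the patched client still traces the call exactly as before.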

Experimenting with Amazon Bedrock FMs in W&B Weave Playground

The W&B Weave Playground is designed to expedite prompt engineering through an intuitive interface that facilitates testing and comparison of Bedrock models. Notable features include:

  • Directly editing prompts and retrying messages.
  • Side-by-side model comparisons.
  • Quick iterations accessible from trace views.

To dive in, simply add your AWS credentials in the Playground settings, select your Amazon Bedrock FMs, and begin exploring. The platform enables rapid experimentation while ensuring full traceability of your efforts.

Evaluating Amazon Bedrock FMs with W&B Weave Evaluations

Effective evaluation of generative AI models is simplified with W&B Weave Evaluations. By utilizing these tools with Amazon Bedrock, users can evaluate models, analyze outputs, and visualize performance across various metrics. Leveraging built-in scorers, third-party or custom scorers, and expert feedback allows for a better understanding of model trade-offs such as cost, accuracy, and output quality.

Setting Up an Evaluation Job

Here’s an example of how to orchestrate an evaluation job:

import weave
from weave import Evaluation
import asyncio

weave.init('intro-example')

# Dataset rows; keys are passed to the model function and scorers by name
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
]

# Scorer: receives the row's "expected" field and the model's output
@weave.op()
def match_score1(expected: str, output: dict) -> dict:
    return {'match': expected == output['generated_text']}

# Stand-in model function; replace with a traced Amazon Bedrock call
@weave.op()
def function_to_evaluate(question: str):
    return {'generated_text': 'Paris'}

evaluation = Evaluation(
    dataset=examples, scorers=[match_score1]
)

asyncio.run(evaluation.evaluate(function_to_evaluate))

The evaluation dashboard visualizes performance metrics, guiding informed decisions regarding model selection and configurations.
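Exact string matching, as in match_score1 above, is often too strict for generated text. A more forgiving scorer can normalize case and punctuation before comparing. The sketch below is a plain Python function (hypothetical, not a built-in Weave scorer) that could be wrapped with @weave.op() and passed to Evaluation alongside other scorers.

```python
import string

def normalized_match(expected: str, output: dict) -> dict:
    """Compare after lowercasing, trimming, and stripping punctuation."""
    def norm(s: str) -> str:
        s = s.lower().strip()
        return s.translate(str.maketrans("", "", string.punctuation))
    return {"match": norm(expected) == norm(output["generated_text"])}

# normalized_match("Paris", {"generated_text": "paris."}) counts as a match,
# where the strict scorer above would not.
```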

Enhancing Amazon Bedrock AgentCore Observability with W&B Weave

Amazon Bedrock AgentCore provides enterprise-scale operational controls and runtime environments. It integrates seamlessly with popular frameworks to ensure secure and efficient agent operations. Built-in observability through Amazon CloudWatch helps track essential metrics while W&B Weave enhances this capability with detailed execution visualizations.

Integrating W&B Weave Observability

There are two effective ways to add observability to your AgentCore agents:

  1. Native W&B Weave SDK: With the @weave.op decorator, function calls are tracked automatically.

import weave
import os

# Agent comes from your agent framework (for example, Strands Agents)
from strands import Agent

os.environ["WANDB_API_KEY"] = "your_api_key"
weave.init("your_project_name")

# Decorated functions are traced on every call
@weave.op()
def word_count_op(text: str) -> int:
    return len(text.split())

@weave.op()
def run_agent(agent: Agent, user_message: str) -> dict:
    result = agent(user_message)
    return {"message": result.message, "model": agent.model.config["model_id"]}
  2. OpenTelemetry Integration: For users with existing OpenTelemetry infrastructure, W&B Weave supports OTLP for seamless integration.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Endpoint and auth headers depend on your W&B Weave deployment
exporter = OTLPSpanExporter(...)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("invoke_agent") as span:
    ...

This dual observability approach allows teams to utilize both CloudWatch and W&B Weave, giving them the flexibility to monitor production SLAs, debug complex behaviors, and optimize performance across their workflows.

Conclusion

In this post, we’ve illustrated how to effectively build and optimize enterprise-grade agentic AI solutions by integrating Amazon Bedrock’s FMs and AgentCore with W&B Weave’s observability toolkit. We explored the extensive capabilities of W&B Weave to enhance the LLM development lifecycle from experimentation to evaluation and production monitoring.

Key Takeaways:

  • Automatic Tracking: Easily track Amazon Bedrock FM calls with minimal code adjustments using the W&B Weave SDK.
  • Rapid Experimentation: Engage with the Playground for quick testing and model comparisons.
  • Systematic Evaluation: Compare model performance through the Evaluation framework.
  • Comprehensive Observability: Blend CloudWatch monitoring with W&B Weave’s visualization tools for enhanced operational insights.

To kickstart your journey:

  1. Request a free trial or subscribe to W&B’s AI Development Platform via AWS Marketplace.
  2. Install the W&B Weave SDK to start tracking your Bedrock FM calls.
  3. Experiment with models in the W&B Weave Playground.
  4. Set up evaluations for comprehensive model comparisons.
  5. Enhance AgentCore agents with observability features.

Begin with simple integrations and progressively adopt advanced features as your AI applications evolve. The synergy between Amazon Bedrock’s capabilities and W&B Weave provides the foundation for building, evaluating, and maintaining robust AI solutions at scale.

About the Authors

James Yi is a Senior AI/ML Partner Solutions Architect at AWS, collaborating with teams to design advanced joint solutions in the generative AI space.

Ray Strickland is a Senior Partner Solutions Architect at AWS specializing in AI/ML and intelligent document processing, driving innovation and scalable solutions.

Thomas Capelle is a Machine Learning Engineer at W&B, focusing on ML Ops and application developments to enhance enterprise performance.

Scott Juang is the Director of Alliances at W&B, previously leading strategic partnerships in cloud technologies.

