Amazon Bedrock AgentCore Runtime Now Supports Bi-Directional Streaming for Real-Time Agent Interactions

Building Natural Voice Conversations with AI Agents

In an era where conversational AI is increasingly integrated into our daily lives, creating natural and engaging voice interactions with AI agents poses a significant challenge. This process typically involves complex infrastructure and extensive coding efforts from engineering teams. Traditional text-based interactions follow a turn-based pattern: users submit a complete request, wait for processing, and receive a full response before continuing. However, bi-directional streaming revolutionizes this by establishing a persistent connection that facilitates continuous data flow in both directions.
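The contrast can be sketched in a few lines of Python (a toy illustration, not the AgentCore API): a turn-based handler returns one complete response, while a streaming handler yields chunks that the caller can consume, and react to, as they arrive.

```python
from typing import Iterator

def turn_based_reply(request: str) -> str:
    # Turn-based: the caller blocks until the entire response is ready.
    return f"echo: {request}"

def streaming_reply(request: str) -> Iterator[str]:
    # Streaming: the caller consumes each chunk as it is produced and
    # could interrupt or redirect the conversation between chunks.
    for word in f"echo: {request}".split():
        yield word

full = turn_based_reply("hello world")
chunks = list(streaming_reply("hello world"))
```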

What Is Bi-Directional Streaming and Why Does It Matter?

The Amazon Bedrock AgentCore Runtime introduces support for bi-directional streaming, enabling real-time, two-way communication between users and AI agents. This capability allows agents to simultaneously listen to user input while generating responses, resulting in a more fluid and natural conversational experience. Bi-directional streaming is particularly effective for multimodal interactions, such as voice and vision conversations. With this functionality, AI agents can process incoming audio and generate responses concurrently, handle interruptions, and adapt responses based on immediate feedback—akin to human dialogue dynamics.

The Impact of Bi-Directional Voice Chat Agents

Imagine having a conversation with an AI agent that smoothly mimics human-like dialogue, allowing you to interrupt or redirect the topic without hesitation. This fluidity requires the agent to maintain conversational context while managing streaming audio input and output simultaneously. Developing such infrastructure from scratch can demand significant engineering expertise and time.

Amazon Bedrock’s AgentCore Runtime simplifies this challenge by providing a secure, serverless environment for deploying AI agents without the headache of creating and maintaining complex streaming infrastructures.


Understanding AgentCore Runtime Bi-Directional Streaming

The WebSocket Protocol

At the heart of bi-directional streaming in AgentCore Runtime is the WebSocket protocol, which allows full-duplex communication over a single TCP connection. This setup creates a continuous channel for data to flow seamlessly in both directions.

Once a connection is established, the agent can receive user input as it streams, while simultaneously sending response chunks back to the user. The AgentCore Runtime effectively manages the underlying infrastructure—connection handling, message ordering, and maintaining conversational state—removing the burden from developers who would otherwise need to build custom streaming systems.
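A minimal asyncio sketch illustrates this overlap of receiving and sending. In-memory queues stand in for the two directions of a WebSocket connection; none of these names are part of the AgentCore API.

```python
import asyncio

async def duplex_session(incoming: asyncio.Queue, outgoing: asyncio.Queue) -> list:
    """Simulate the agent side of a full-duplex channel: consume user
    chunks and emit response chunks concurrently, not turn-by-turn."""
    log = []

    async def receiver():
        while True:
            chunk = await incoming.get()
            if chunk is None:          # sentinel: the user closed the stream
                break
            log.append(f"recv:{chunk}")
            await asyncio.sleep(0)     # yield so sending can interleave

    async def sender():
        for part in ["Hel", "lo ", "there"]:
            await outgoing.put(part)
            log.append(f"send:{part}")
            await asyncio.sleep(0)     # yield so receiving can interleave

    # Both directions run concurrently over the same "connection".
    await asyncio.gather(receiver(), sender())
    return log

async def main():
    inq, outq = asyncio.Queue(), asyncio.Queue()
    for chunk in ["audio-1", "audio-2", None]:
        inq.put_nowait(chunk)
    return await duplex_session(inq, outq)

log = asyncio.run(main())
```

The resulting log contains both `recv:` and `send:` entries interleaved, which is exactly the behavior a turn-based request/response loop cannot produce.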

Enhancing Conversational Dynamics

Interacting with voice agents is inherently different from text-based conversations. Users expect the natural flow characteristic of human dialogue, including the ability to interject for corrections or clarifications. With bi-directional streaming, voice agents can process incoming audio, generate responses, and adjust their behavior in real-time, thereby preserving the thread of conversation even when topics shift.


Exploring WebSocket Implementation

To create an effective WebSocket implementation in AgentCore Runtime, developers need to adhere to a few key patterns.

  1. WebSocket Endpoints: Expose the WebSocket server on port 8080 at the /ws path.
  2. Health Checks: Integrate a /ping endpoint for regular health checks.
  3. Client Connection: Use a WebSocket client library to connect to the runtime endpoint, which follows the pattern:
    wss://bedrock-agentcore.{region}.amazonaws.com/runtimes/{runtime-id}/ws
  4. Authentication: Ensure use of supported authentication methods, such as SigV4 headers or OAuth 2.0.
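As a small illustration of item 3, a helper can assemble the endpoint URL from its variable parts. The exact URL shape used here is an assumption based on the pattern above; check the service documentation for the authoritative format.

```python
def agentcore_ws_url(region: str, runtime_id: str) -> str:
    # Hypothetical helper: builds the AgentCore WebSocket endpoint URL
    # from a region and runtime identifier (URL shape is an assumption).
    return f"wss://bedrock-agentcore.{region}.amazonaws.com/runtimes/{runtime_id}/ws"

url = agentcore_ws_url("us-east-1", "my-agent-runtime")
```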

Simplifying Voice Agent Development with Strands

One standout feature is the Amazon Nova Sonic model, which integrates speech understanding and generation into a single model, delivering remarkably human-like conversational AI. The newly introduced bi-directional streaming in AgentCore Runtime enables developers to effortlessly host voice agents using two approaches:

  1. Direct Implementation: Managing WebSocket connections and orchestrating asynchronous tasks.
  2. Strands Bi-Directional Agent Implementation: This abstracts complexity and streamlines various processes, making bi-directional streaming accessible even to those without specialized real-time expertise.

Example Implementation

Consider this simple implementation using the Strands framework for real-time audio conversations:

from fastapi import FastAPI, WebSocket
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands_tools import calculator

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()

    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "matthew",
            }
        },
    )
    agent = BidiAgent(
        model=model,
        tools=[calculator],
        system_prompt="You are a helpful assistant with access to a calculator tool.",
    )

    async def receive_and_convert():
        # Translate raw WebSocket messages into the agent's input events;
        # the conversion logic is omitted here for brevity.
        return await websocket.receive_json()

    await agent.run(inputs=[receive_and_convert], outputs=[websocket.send_json])

This code illustrates how Strands simplifies agent development, allowing developers to focus on key business logic instead of the underlying complexities of protocol events and WebSocket management.


Conclusion

The integration of bi-directional streaming within Amazon Bedrock’s AgentCore Runtime transforms the landscape of conversational AI development. By leveraging a WebSocket-based real-time communication infrastructure, developers can bypass the months of effort typically required to implement streaming systems from scratch. The flexibility to create varying types of voice agents—ranging from native implementations with Amazon Nova Sonic to high-level frameworks such as Strands—opens new avenues for deploying AI.

This advancement makes it easier for developers across various backgrounds to bring engaging voice experiences to life, reinforcing the capabilities of conversational AI in our daily interactions.


About the Authors

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS, focusing on AI voice assistants and multimodal understanding.

Phelipe Fabres is a Senior Specialist Solutions Architect for Generative AI at AWS for Startups, specializing in Agentic systems.

Evandro Franco is a Senior Data Scientist at AWS, working on AI/ML solutions across various sectors.
