Embracing the Future: Introducing Voice as the Primary Interface with Amazon Nova Sonic

Graphical user interfaces (GUIs) have long been the backbone of user interaction with applications, but user expectations have shifted: today's users want to converse with their applications. Enter Amazon Nova Sonic, a foundation model in Amazon Bedrock that enables this transition through natural, low-latency, bidirectional speech conversations over a simple streaming API. Rather than merely operating an application, users can collaborate with it, combining voice and embedded intelligence to streamline their tasks.

In this post, we will explore how we integrated a voice-first experience into our reference application, the Smart Todo App, transforming mundane task management into a seamless, hands-free conversation.

Rethinking User Interaction Through Collaborative AI Voice Agents

Usability enhancements are frequently sidelined—not for lack of value, but due to the complexities of integrating them into traditional mouse-and-keyboard interfaces. Features like intelligent batch actions or voice-guided assistance often get pushed to the side in favor of simpler designs. But voice doesn’t have to replace traditional UI; it offers a new, general-purpose interaction mode that enhances accessibility and user experience.

Amazon Nova Sonic raises the bar beyond simple voice commands. This model is capable of planning multistep workflows, invoking backend tools, and maintaining context across multiple interactions, enabling genuine collaboration between the application and the user.
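To give the model something to invoke, the application registers tool definitions. The sketch below shows what such a definition might look like, following the Amazon Bedrock tool-use convention of a `toolSpec` with a name, description, and JSON input schema; the tool name `completeAllTasks` and its schema are hypothetical illustrations, not taken from the sample app.

```javascript
// Hypothetical tool the model can invoke during a conversation.
// The toolSpec shape follows the Amazon Bedrock tool-use convention;
// the name and schema here are illustrative only.
const completeAllTasksTool = {
  toolSpec: {
    name: "completeAllTasks",
    description: "Mark every open task on the user's to-do list as complete.",
    inputSchema: {
      // Bedrock tool schemas are supplied as a JSON Schema document.
      json: JSON.stringify({
        type: "object",
        properties: {
          archive: {
            type: "boolean",
            description: "Also archive the tasks after completing them.",
          },
        },
        required: [],
      }),
    },
  },
};
```

When the model decides this tool applies to an utterance, it emits a tool-use event naming `completeAllTasks`, and the application executes the corresponding backend action.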

Example Voice Interactions Across Various Domains

The following table illustrates how voice interactions can be applied across different application domains, including task management, CRM, and help desk operations:

| Voice Interaction (Example Phrase) | Intent / Goal | System Action / Behavior | Confirmation / UX |
| --- | --- | --- | --- |
| "Mark all my tasks as complete." | Bulk-complete tasks | Find user's open tasks → mark complete → archive if configured | "All 12 open tasks are marked complete." |
| "Create a plan for preparing the Q3 budget: break it into steps…" | Create multistep workflow | Generate plan → create tasks → assign owners… | "Plan created with 6 tasks. Notify owners?" |
| "Find enterprise leads in APAC… and draft personalized outreach." | Build targeted prospect list | Query CRM → assemble filtered list → draft personalized messages | "Drafted 24 personalized outreach messages. Review and send?" |
| "Prioritize all P1 tickets… and assign them to on-call." | Triage and assign | Filter tickets → set priority → assign to on-call → log changes | "12 P1 tickets prioritized and assigned." |

With Amazon Nova Sonic, the assistant understands intent, calls the necessary APIs, and confirms results, all without the user filling out a single form. Productivity is maximized, and the user's context becomes the interface itself. This isn't about eliminating the traditional UI but about giving users new capabilities through voice.
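On the application side, the pattern behind rows like "Mark all my tasks as complete" is a dispatcher that routes the model's tool-use events to backend actions and returns a spoken confirmation. The sketch below assumes hypothetical names throughout (the `toolName`/`content` fields mirror Bedrock tool-use output, but the handler table and in-memory task store stand in for the app's real REST APIs):

```javascript
// Minimal in-memory stand-in for the app's task storage (hypothetical).
const taskStore = [
  { id: 1, title: "Draft Q3 budget", done: false },
  { id: 2, title: "Review project charter", done: false },
];

// Map each tool name the model may invoke to a backend action.
const toolHandlers = {
  completeAllTasks() {
    const open = taskStore.filter((t) => !t.done);
    open.forEach((t) => { t.done = true; });
    // Return a result the model can speak back as a confirmation.
    return { message: `All ${open.length} open tasks are marked complete.` };
  },
};

// Route one tool-use event from the model to its handler.
function handleToolUse(event) {
  const handler = toolHandlers[event.toolName];
  if (!handler) return { error: `Unknown tool: ${event.toolName}` };
  return handler(JSON.parse(event.content || "{}"));
}

const result = handleToolUse({ toolName: "completeAllTasks", content: "{}" });
```

The confirmation string is returned to the model, which voices it back to the user, closing the loop without any visual form.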

The Sample Application at a Glance

In the Smart Todo reference application, users manage to-do lists and corresponding notes. With the added voice capability, the app becomes a hands-free tool for intuitive interaction. Users can simply say:

  • “Add a note to follow up on the project charter.”
  • “Archive all completed tasks.”

Each voice command is executed seamlessly in a way that feels both natural and efficient.

How Amazon Nova Sonic’s Bidirectional APIs Work

Amazon Nova Sonic employs a real-time, bidirectional streaming architecture. Here’s a brief overview of how it operates:

  1. Session Start – The client initiates a session with model configuration.
  2. Prompt and Content Start – Structured events indicate whether upcoming data is audio, text, or tool input.
  3. Audio Streaming – Users’ microphone audio is streamed as base64-encoded audio input events.
  4. Model Responses – As the model processes input, it streams a range of asynchronous responses, including:
    • Automatic speech recognition (ASR) results
    • Tool use invocations
    • Text responses
    • Audio output for playback
  5. Session Close – The session is explicitly terminated by sending closure events.

This event-driven architecture facilitates intuitive interactions, enabling features like “barge-in” capabilities for interrupting the assistant and multi-turn conversations.
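The five steps above can be sketched as the ordered event payloads a client sends over the stream. This is a hedged illustration: the event names follow the flow described above, but the exact field names and wire format are simplified, and `promptName: "p1"` is a placeholder identifier.

```javascript
// Hedged sketch of the client-side event sequence for one session.
// Payload fields are illustrative, not the full Nova Sonic wire format.
function buildSessionEvents(base64AudioChunks) {
  const events = [];
  // 1. Session start with model configuration.
  events.push({ event: { sessionStart: { inferenceConfiguration: { maxTokens: 1024 } } } });
  // 2. Prompt and content start, declaring that audio follows.
  events.push({ event: { promptStart: { promptName: "p1" } } });
  events.push({ event: { contentStart: { promptName: "p1", type: "AUDIO" } } });
  // 3. Stream microphone audio as base64-encoded audio input events.
  for (const chunk of base64AudioChunks) {
    events.push({ event: { audioInput: { promptName: "p1", content: chunk } } });
  }
  events.push({ event: { contentEnd: { promptName: "p1" } } });
  events.push({ event: { promptEnd: { promptName: "p1" } } });
  // 5. Explicitly close the session.
  events.push({ event: { sessionEnd: {} } });
  return events;
}
```

Step 4, the model's asynchronous responses (ASR results, tool invocations, text, and audio output), arrives on the other direction of the stream while these events are being sent, which is what makes barge-in and multi-turn dialogue possible.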

Solution Architecture

Our solution leverages a serverless application architecture pattern, where a React single-page application (SPA) interfaces with backend web APIs hosted on server-side containers. The Smart Todo App employs a scalable and secure AWS architecture, optimized for real-time voice interactions.

Key AWS services involved include:

  • Amazon Bedrock: Powers the real-time speech interactions.
  • Amazon CloudFront: A CDN that ensures rapid global content delivery.
  • AWS Fargate: Runs containerized services for WebSocket handling and REST APIs.
  • Application Load Balancer (ALB): Distributes web traffic for efficient backend service management.
  • Amazon VPC: Provides network isolation and security for backend services.
  • Amazon S3: Hosts the React frontend for user interactions.
  • Amazon DynamoDB: Stores application data such as to-do lists and notes.

Deploying the Solution

To explore the capabilities of this solution, we’ve made the sample code for the Smart Todo App available on GitHub. The application comprises multiple independent Node.js projects, including frontend and backend components.

Prerequisites and Deployment Steps

  1. Clone the Repository

    git clone https://github.com/aws-samples/sample-amazon-q-developer-vibe-coded-projects.git
    cd NovaSonicVoiceAssistant
  2. Deploy for the First Time

    npm run deploy:first-time

This script automates the process of installing dependencies, building the components, and deploying the infrastructure.

Verifying Deployment

Once deployed, you can access the provided Amazon CloudFront URL to test the voice functionality and ensure full integration with Amazon Nova Sonic.

Clean Up

To remove the stacks after testing, simply run:

    cd infra
    npm run destroy

Next Steps

Voice is not merely an auxiliary add-on; it’s becoming the primary interface for complex workflows. We encourage you to delve into the resources below to get started with this exciting integration:

  • Sample Code Repo: A working integration for hands-on exploration.
  • Hands-On Workshop: A guided lab to deploy Amazon Nova Sonic in your AWS account.
  • Documentation: Comprehensive API reference and design best practices.

About the Authors

  • Manu Mishra: Senior Solutions Architect at AWS, specializing in AI and security strategies.
  • AK Soni: Senior Technical Account Manager at AWS, focusing on cloud and AI/ML solutions.
  • Raj Bagwe: Senior Solutions Architect at AWS, with a passion for assisting customers in navigating technological challenges.

With Amazon Nova Sonic, the future of interaction is here. Talking to your application might just be the fastest way to get things done!
