Exclusive Content:

Haiper steps out of stealth mode, secures $13.8 million seed funding for video-generative AI

Haiper Emerges from Stealth Mode with $13.8 Million Seed...

Running Your ML Notebook on Databricks: A Step-by-Step Guide

A Step-by-Step Guide to Hosting Machine Learning Notebooks in...

“Revealing Weak Infosec Practices that Open the Door for Cyber Criminals in Your Organization” • The Register

Warning: Stolen ChatGPT Credentials a Hot Commodity on the...

Enhance Video Accessibility with Automated Audio Descriptions via Amazon Nova

Automating Accessible Audio Descriptions for Visual Content Using AWS AI Services

A Comprehensive Guide to Leveraging Generative AI for Accessibility Compliance


Solution Overview


Services Used


Prerequisites


Solution Walkthrough


Clean Up


Conclusion


About the Authors

Automating Audio Descriptions for Accessibility in Media Using AWS

According to the World Health Organization, over 2.2 billion people have vision impairment globally. This statistic underscores the importance of accessibility in media, especially for visually impaired audiences. In compliance with legislation like the Americans with Disabilities Act (ADA) in the United States, visual formats such as television shows and movies are required to provide accessibility options, often in the form of audio description tracks that narrate visual elements.

However, producing audio descriptions can be costly, averaging $25 per minute when utilizing third-party services. The internal creation of these descriptions involves significant resources, including content creators, audio engineers, and narration talent. This raises the question: Can generative AI solutions, particularly those offered by AWS, automate this process?

AWS Nova Foundation Models: A Game-Changer

At the recent re:Invent 2024, Amazon announced the Amazon Nova Foundation Models, now accessible through Amazon Bedrock, which includes:

  • Amazon Nova Lite: A fast, low-cost model for processing various inputs.
  • Amazon Nova Pro: A versatile model offering an optimal blend of speed, accuracy, and cost for diverse tasks.
  • Amazon Nova Premier: The most advanced model for complex assignments and model distillation.

Automating Audio Descriptions

In this blog post, we discuss how to leverage services like Amazon Nova, Amazon Rekognition, and Amazon Polly to automate the generation of audio descriptions for video content. This method can dramatically decrease the time and cost associated with making videos accessible to visually impaired viewers.

Note: The blog will not provide a complete production-ready solution but will feature pseudocode snippets, guidance, and links to resources to facilitate your development.

Solution Overview

The architecture of the proposed solution allows the integration of various AWS services to complete the audio description workflow efficiently. We recommend running your script on an Amazon SageMaker notebook for optimal performance.

Key AWS Services Used

  1. Amazon S3: For storing video files, text descriptions, and audio outputs.
  2. Amazon Rekognition: To detect and segment video scenes using visual cues.
  3. Amazon Bedrock: To access the Amazon Nova Pro model for analyzing video content and generating detailed descriptions.
  4. Amazon Polly: For converting text descriptions into high-quality audio.

Prerequisites

To implement this solution, ensure you have:

  • AWS SDK set up, with Boto3 integrated.
  • A mechanism for video slicing, such as the moviepy library for Python.

Solution Walkthrough

1. Initializing AWS Environment

Start by defining the necessary AWS configurations, including the Nova Pro model for visual support:

class VideoAnalyzer:
    def initialize(self):
        AWS_REGION = "us-east-1"
        MODEL_ID = "amazon.nova-pro-v1:0"
        chunk_delay = 20
        # Initialize AWS clients (Bedrock and Rekognition)

2. Segmenting Video Content

Use Amazon Rekognition to detect scene boundaries based on various cues (e.g., shot boundaries, black frames):

def get_segment_results(job_id):
    # Implement the function to retrieve segmentation data

3. Analyzing Video Scenes

Utilize the Nova Pro model to analyze each video segment and generate descriptive text.

def analyze_chunk(chunk_path):
    # Logic to convert video chunk into base64 and analyze

4. File Management and Consolidation

Compile all analysis results into a comprehensive text file, which serves as the basis for audio descriptions.

def analyze_video(video_path, bucket):
    # Orchestrate video analysis and save the results

5. Text-to-Speech Conversion

Send the description text to Amazon Polly for voice synthesis, generating an MP3 audio file.

def generate_audio(text_file, output_audio_file):
    # Logic for generating audio from the text analysis

Clean Up

Remember to delete any temporary resources created during the workflow to avoid unnecessary costs.

Conclusion

By employing AWS services like S3, Rekognition, Nova Pro, and Polly, media creators can fully automate the process of generating audio descriptions, significantly reducing time and costs. This not only aids in creating accessible content but also helps businesses comply with accessibility regulations.

Future Considerations

The outlined solution is applicable to various forms of visual media beyond just films and TV shows. With further development and scaling considerations, it can serve as a robust tool for improving accessibility in all forms of visual storytelling.

For more information about the Amazon Nova model family and its capabilities, explore the documentation on Amazon’s official website.

About the Authors

Dylan Martin is an AWS Solutions Architect primarily focused on generative AI, bringing extensive experience from various roles in software engineering and security.

Ankit Patel is a Solutions Developer at AWS, specializing in turning customer ideas into rapid prototype applications using AWS technologies.


This automated audio description approach can help bridge the accessibility gap, ensuring that everyone can enjoy and engage with visual content.

Latest

Advancements in Large Model Inference Container: New Features and Performance Improvements

Enhancing Performance and Reducing Costs in LLM Deployments with...

I asked ChatGPT if the remarkable surge in Lloyds share price has peaked, and here’s what it said…

Assessing the Future of Lloyds Banking: Insights and Reflections Why...

Cows Dominate Robots on Day One: The Tech Revolution Transforming Dairy Farming in Rural Australia

Revolutionizing Dairy Farming: Automated Milking Systems Transform the Lives...

AI Receptionist for Answering Services

Certainly! Here’s a suitable heading for the section you...

Don't miss

Haiper steps out of stealth mode, secures $13.8 million seed funding for video-generative AI

Haiper Emerges from Stealth Mode with $13.8 Million Seed...

Running Your ML Notebook on Databricks: A Step-by-Step Guide

A Step-by-Step Guide to Hosting Machine Learning Notebooks in...

VOXI UK Launches First AI Chatbot to Support Customers

VOXI Launches AI Chatbot to Revolutionize Customer Services in...

Investing in digital infrastructure key to realizing generative AI’s potential for driving economic growth | articles

Challenges Hindering the Widescale Deployment of Generative AI: Legal,...

Taiwan Semiconductor (TSM) Stock Outlook 2026: In-Depth Analysis

Comprehensive Independent Equity Research Report on TSMC Independent Equity Research Report Understanding the intricacies of equity research is vital for any informed investor. This Independent Equity...

Insights from Real-World COBOL Modernization

Accelerating Mainframe Modernization with AI: Key Insights from AWS Transform Unpacking the Dual Aspects of Modernization The Importance of Comprehensive Context in Mainframe Projects Understanding Platform-Specific Behaviors Ensuring...

Apple Stock 2026 Outlook: Price Target and Investment Thesis for AAPL

Institutional Equity Research Report: Apple Inc. (AAPL) Analysis Report Overview Report Date: February 27, 2026 Analyst: Lead Equity Research Analyst Rating: HOLD 12-Month Price Target: $295 Data Sources All data sourced...