Exclusive Content:

Haiper steps out of stealth mode, secures $13.8 million seed funding for video-generative AI

Haiper Emerges from Stealth Mode with $13.8 Million Seed...

“Revealing Weak Infosec Practices that Open the Door for Cyber Criminals in Your Organization” • The Register

Warning: Stolen ChatGPT Credentials a Hot Commodity on the...

VOXI UK Launches First AI Chatbot to Support Customers

VOXI Launches AI Chatbot to Revolutionize Customer Services in...

Enhance Video Accessibility with Automated Audio Descriptions via Amazon Nova

Automating Accessible Audio Descriptions for Visual Content Using AWS AI Services

A Comprehensive Guide to Leveraging Generative AI for Accessibility Compliance


Solution Overview


Services Used


Prerequisites


Solution Walkthrough


Clean Up


Conclusion


About the Authors

Automating Audio Descriptions for Accessibility in Media Using AWS

According to the World Health Organization, over 2.2 billion people have vision impairment globally. This statistic underscores the importance of accessibility in media, especially for visually impaired audiences. In compliance with legislation like the Americans with Disabilities Act (ADA) in the United States, visual formats such as television shows and movies are required to provide accessibility options, often in the form of audio description tracks that narrate visual elements.

However, producing audio descriptions can be costly, averaging $25 per minute when utilizing third-party services. The internal creation of these descriptions involves significant resources, including content creators, audio engineers, and narration talent. This raises the question: Can generative AI solutions, particularly those offered by AWS, automate this process?

AWS Nova Foundation Models: A Game-Changer

At the recent re:Invent 2024, Amazon announced the Amazon Nova Foundation Models, now accessible through Amazon Bedrock, which includes:

  • Amazon Nova Lite: A fast, low-cost model for processing various inputs.
  • Amazon Nova Pro: A versatile model offering an optimal blend of speed, accuracy, and cost for diverse tasks.
  • Amazon Nova Premier: The most advanced model for complex assignments and model distillation.

Automating Audio Descriptions

In this blog post, we discuss how to leverage services like Amazon Nova, Amazon Rekognition, and Amazon Polly to automate the generation of audio descriptions for video content. This method can dramatically decrease the time and cost associated with making videos accessible to visually impaired viewers.

Note: The blog will not provide a complete production-ready solution but will feature pseudocode snippets, guidance, and links to resources to facilitate your development.

Solution Overview

The architecture of the proposed solution allows the integration of various AWS services to complete the audio description workflow efficiently. We recommend running your script on an Amazon SageMaker notebook for optimal performance.

Key AWS Services Used

  1. Amazon S3: For storing video files, text descriptions, and audio outputs.
  2. Amazon Rekognition: To detect and segment video scenes using visual cues.
  3. Amazon Bedrock: To access the Amazon Nova Pro model for analyzing video content and generating detailed descriptions.
  4. Amazon Polly: For converting text descriptions into high-quality audio.

Prerequisites

To implement this solution, ensure you have:

  • AWS SDK set up, with Boto3 integrated.
  • A mechanism for video slicing, such as the moviepy library for Python.

Solution Walkthrough

1. Initializing AWS Environment

Start by defining the necessary AWS configurations, including the Nova Pro model for visual support:

class VideoAnalyzer:
    def initialize(self):
        AWS_REGION = "us-east-1"
        MODEL_ID = "amazon.nova-pro-v1:0"
        chunk_delay = 20
        # Initialize AWS clients (Bedrock and Rekognition)

2. Segmenting Video Content

Use Amazon Rekognition to detect scene boundaries based on various cues (e.g., shot boundaries, black frames):

def get_segment_results(job_id):
    # Implement the function to retrieve segmentation data

3. Analyzing Video Scenes

Utilize the Nova Pro model to analyze each video segment and generate descriptive text.

def analyze_chunk(chunk_path):
    # Logic to convert video chunk into base64 and analyze

4. File Management and Consolidation

Compile all analysis results into a comprehensive text file, which serves as the basis for audio descriptions.

def analyze_video(video_path, bucket):
    # Orchestrate video analysis and save the results

5. Text-to-Speech Conversion

Send the description text to Amazon Polly for voice synthesis, generating an MP3 audio file.

def generate_audio(text_file, output_audio_file):
    # Logic for generating audio from the text analysis

Clean Up

Remember to delete any temporary resources created during the workflow to avoid unnecessary costs.

Conclusion

By employing AWS services like S3, Rekognition, Nova Pro, and Polly, media creators can fully automate the process of generating audio descriptions, significantly reducing time and costs. This not only aids in creating accessible content but also helps businesses comply with accessibility regulations.

Future Considerations

The outlined solution is applicable to various forms of visual media beyond just films and TV shows. With further development and scaling considerations, it can serve as a robust tool for improving accessibility in all forms of visual storytelling.

For more information about the Amazon Nova model family and its capabilities, explore the documentation on Amazon’s official website.

About the Authors

Dylan Martin is an AWS Solutions Architect primarily focused on generative AI, bringing extensive experience from various roles in software engineering and security.

Ankit Patel is a Solutions Developer at AWS, specializing in turning customer ideas into rapid prototype applications using AWS technologies.


This automated audio description approach can help bridge the accessibility gap, ensuring that everyone can enjoy and engage with visual content.

Latest

AI Whistleblower Claims Robot Can ‘Fracture a Human Skull’ After Being Terminated

Figure AI Faces Legal Action Over Safety Concerns in...

Harnessing AI to Decode Brand Sentiment

Unlocking Customer Insights: The Power of AI Brand Sentiment...

Harnessing Generative AI in QA: Strategies for Effective Testing Without Accumulating Technical Debt

The Evolving Landscape of Software Quality: Generative AI's Impact...

Run IBM’s AI Chatbot Locally in Your Web Browser

IBM Unveils Granite 4.0 Nano AI Models: Localized Chatbots...

Don't miss

Haiper steps out of stealth mode, secures $13.8 million seed funding for video-generative AI

Haiper Emerges from Stealth Mode with $13.8 Million Seed...

VOXI UK Launches First AI Chatbot to Support Customers

VOXI Launches AI Chatbot to Revolutionize Customer Services in...

Investing in digital infrastructure key to realizing generative AI’s potential for driving economic growth | articles

Challenges Hindering the Widescale Deployment of Generative AI: Legal,...

Microsoft launches new AI tool to assist finance teams with generative tasks

Microsoft Launches AI Copilot for Finance Teams in Microsoft...

Accelerating PLC Code Generation with Wipro PARI and Amazon Bedrock

Streamlining PLC Code Generation: The Wipro PARI and Amazon Bedrock Collaboration Revolutionizing Industrial Automation Code Development with AI Insights Unleashing the Power of Automation: A New...

Optimize AI Operations with the Multi-Provider Generative AI Gateway Architecture

Streamlining AI Management with the Multi-Provider Generative AI Gateway on AWS Introduction to the Generative AI Gateway Addressing the Challenge of Multi-Provider AI Infrastructure Reference Architecture for...

MSD Investigates How Generative AI and AWS Services Can Enhance Deviation...

Transforming Deviation Management in Biopharmaceuticals: Harnessing Generative AI and Emerging Technologies at MSD Transforming Deviation Management in Biopharmaceutical Manufacturing with Generative AI Co-written by Hossein Salami...