Automating Accessible Audio Descriptions for Visual Content Using AWS AI Services
A Comprehensive Guide to Leveraging Generative AI for Accessibility Compliance
Solution Overview
Services Used
Prerequisites
Solution Walkthrough
Clean Up
Conclusion
About the Authors
According to the World Health Organization, at least 2.2 billion people worldwide have a near or distance vision impairment. This statistic underscores the importance of accessibility in media for visually impaired audiences. Under legislation such as the Americans with Disabilities Act (ADA) in the United States, visual formats such as television shows and movies are often required to provide accessibility options, typically in the form of audio description tracks that narrate on-screen visual elements.
However, producing audio descriptions can be costly, averaging $25 per minute when using third-party services. Creating them in-house also demands significant resources, including content creators, audio engineers, and narration talent. This raises the question: can generative AI, particularly the services offered by AWS, automate this process?
AWS Nova Foundation Models: A Game-Changer
At re:Invent 2024, Amazon announced the Amazon Nova foundation models, now accessible through Amazon Bedrock. The family includes:
- Amazon Nova Lite: A fast, low-cost multimodal model for processing image, video, and text inputs.
- Amazon Nova Pro: A versatile multimodal model offering a strong balance of speed, accuracy, and cost across a wide range of tasks.
- Amazon Nova Premier: The most capable model in the family, intended for complex tasks and for use as a teacher in model distillation.
Automating Audio Descriptions
In this blog post, we discuss how to leverage services like Amazon Nova, Amazon Rekognition, and Amazon Polly to automate the generation of audio descriptions for video content. This method can dramatically decrease the time and cost associated with making videos accessible to visually impaired viewers.
Note: The blog will not provide a complete production-ready solution but will feature pseudocode snippets, guidance, and links to resources to facilitate your development.
Solution Overview
The proposed architecture integrates several AWS services into an end-to-end audio description workflow. We recommend running the script in an Amazon SageMaker notebook, which provides a managed environment with the AWS SDK preinstalled.
Key AWS Services Used
- Amazon S3: For storing video files, text descriptions, and audio outputs.
- Amazon Rekognition: To detect and segment video scenes using visual cues.
- Amazon Bedrock: To access the Amazon Nova Pro model for analyzing video content and generating detailed descriptions.
- Amazon Polly: For converting text descriptions into high-quality audio.
Prerequisites
To implement this solution, ensure you have:
- The AWS SDK for Python (Boto3) installed and configured with credentials for the services above.
- A mechanism for video slicing, such as the `moviepy` library for Python.
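As an illustration, video slicing with `moviepy` might look like the following sketch. This assumes moviepy 1.x (where `VideoFileClip.subclip` is the slicing method); `chunk_bounds` and `slice_video` are hypothetical helper names, not part of any library.

```python
def chunk_bounds(duration, chunk_len):
    """Return (start, end) pairs covering `duration` seconds of video
    in chunks of at most `chunk_len` seconds each."""
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        bounds.append((start, end))
        start = end
    return bounds

def slice_video(video_path, out_dir, chunk_len=60):
    # moviepy is imported lazily so chunk_bounds stays usable without it
    from moviepy.editor import VideoFileClip  # assumes moviepy 1.x
    paths = []
    with VideoFileClip(video_path) as clip:
        for i, (start, end) in enumerate(chunk_bounds(clip.duration, chunk_len)):
            out_path = f"{out_dir}/chunk_{i:03d}.mp4"
            clip.subclip(start, end).write_videofile(out_path, logger=None)
            paths.append(out_path)
    return paths
```

Keeping the chunk math in its own function makes the slicing boundaries easy to verify before any encoding work happens.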
Solution Walkthrough
1. Initializing AWS Environment
Start by defining the necessary AWS configuration, including the Region and the Nova Pro model ID used for visual analysis:
```python
import boto3

class VideoAnalyzer:
    def __init__(self):
        self.aws_region = "us-east-1"
        self.model_id = "amazon.nova-pro-v1:0"
        self.chunk_delay = 20  # seconds to pause between chunk requests
        # Initialize AWS clients (Bedrock and Rekognition)
        self.bedrock = boto3.client("bedrock-runtime", region_name=self.aws_region)
        self.rekognition = boto3.client("rekognition", region_name=self.aws_region)
```
2. Segmenting Video Content
Use Amazon Rekognition to detect scene boundaries based on various cues (e.g., shot boundaries, black frames):
```python
def get_segment_results(job_id):
    # Implement the function to retrieve segmentation data
    ...
```
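Filled in, the segmentation step could look like the following sketch, which starts an asynchronous Rekognition segment-detection job and polls for results. The Region, polling interval, and the `start_segmentation`/`to_time_ranges` helper names are illustrative assumptions; the Rekognition API calls themselves (`start_segment_detection`, `get_segment_detection`) are real.

```python
import time

def to_time_ranges(segments):
    """Convert Rekognition segment dicts into (start_ms, end_ms) pairs."""
    return [(s["StartTimestampMillis"], s["EndTimestampMillis"]) for s in segments]

def start_segmentation(bucket, key):
    import boto3  # imported here so to_time_ranges stays usable without AWS
    rekognition = boto3.client("rekognition", region_name="us-east-1")
    resp = rekognition.start_segment_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}},
        SegmentTypes=["SHOT", "TECHNICAL_CUE"],  # shot boundaries, black frames, etc.
    )
    return resp["JobId"]

def get_segment_results(job_id):
    import boto3
    rekognition = boto3.client("rekognition", region_name="us-east-1")
    # Poll until the asynchronous job leaves the IN_PROGRESS state
    while True:
        resp = rekognition.get_segment_detection(JobId=job_id)
        if resp["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(5)
    # Page through all results using NextToken
    segments = resp.get("Segments", [])
    while "NextToken" in resp:
        resp = rekognition.get_segment_detection(
            JobId=job_id, NextToken=resp["NextToken"])
        segments.extend(resp.get("Segments", []))
    return to_time_ranges(segments)
```

In production you would likely replace the polling loop with Rekognition's SNS completion notifications.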
3. Analyzing Video Scenes
Utilize the Nova Pro model to analyze each video segment and generate descriptive text.
```python
def analyze_chunk(chunk_path):
    # Logic to convert video chunk into base64 and analyze
    ...
```
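One way to fill in the analysis step is the Bedrock Converse API, which accepts video content blocks for Nova models; Boto3 handles the base64 encoding of the bytes for you (with the lower-level `invoke_model` API you would base64-encode the chunk yourself). The Region, prompt wording, and inference settings below are illustrative assumptions.

```python
def build_prompt():
    return ("Describe the key visual elements of this video segment in one or "
            "two sentences, as an audio description for visually impaired viewers.")

def analyze_chunk(chunk_path):
    import boto3  # imported here so build_prompt stays usable without AWS
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
    with open(chunk_path, "rb") as f:
        video_bytes = f.read()  # Converse accepts raw bytes for the video block
    response = bedrock.converse(
        modelId="amazon.nova-pro-v1:0",
        messages=[{
            "role": "user",
            "content": [
                {"video": {"format": "mp4", "source": {"bytes": video_bytes}}},
                {"text": build_prompt()},
            ],
        }],
        inferenceConfig={"maxTokens": 300, "temperature": 0.2},
    )
    # Extract the generated description text from the model response
    return response["output"]["message"]["content"][0]["text"]
```

A low temperature keeps the descriptions factual rather than speculative, which matters for accessibility narration.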
4. File Management and Consolidation
Compile all analysis results into a comprehensive text file, which serves as the basis for audio descriptions.
```python
def analyze_video(video_path, bucket):
    # Orchestrate video analysis and save the results
    ...
```
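The consolidation step can be sketched as follows. Here `slice_video` and `analyze_chunk` are assumed helpers (hypothetical names) that slice the source video into fixed-length chunks and call Nova Pro on each one; the 60-second chunk length, output path, and file name are illustrative.

```python
def format_entry(start_s, end_s, description):
    """Render one description line with HH:MM:SS timestamps."""
    def hms(t):
        t = int(t)
        return f"{t // 3600:02d}:{(t % 3600) // 60:02d}:{t % 60:02d}"
    return f"[{hms(start_s)} - {hms(end_s)}] {description}"

def analyze_video(video_path, bucket):
    import boto3  # imported here so format_entry stays usable without AWS
    lines = []
    # slice_video / analyze_chunk are assumed helpers for slicing and analysis
    for i, chunk_path in enumerate(slice_video(video_path, "/tmp/chunks")):
        start, end = i * 60, (i + 1) * 60  # assumes 60-second chunks
        lines.append(format_entry(start, end, analyze_chunk(chunk_path)))
    out_file = "descriptions.txt"
    with open(out_file, "w") as f:
        f.write("\n".join(lines))
    # Persist the consolidated descriptions to S3 alongside the source video
    boto3.client("s3").upload_file(out_file, bucket, out_file)
    return out_file
```

Timestamping each entry keeps the text file usable later for aligning the narration with the original video.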
5. Text-to-Speech Conversion
Send the description text to Amazon Polly for voice synthesis, generating an MP3 audio file.
```python
def generate_audio(text_file, output_audio_file):
    # Logic for generating audio from the text analysis
    ...
```
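A hedged sketch of the synthesis step is below. Because Polly's `synthesize_speech` has a per-request character limit (3,000 billed characters), the text is split into pieces first and the resulting MP3 streams are naively concatenated; the voice, Region, and the `chunk_text` helper name are illustrative assumptions.

```python
def chunk_text(text, limit=2900):
    """Split text into pieces under Polly's per-request size limit,
    breaking on newlines where possible."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if len(current) + len(line) > limit and current:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

def generate_audio(text_file, output_audio_file):
    import boto3  # imported here so chunk_text stays usable without AWS
    polly = boto3.client("polly", region_name="us-east-1")
    with open(text_file) as f:
        text = f.read()
    with open(output_audio_file, "wb") as out:
        for piece in chunk_text(text):
            resp = polly.synthesize_speech(
                Text=piece,
                OutputFormat="mp3",
                VoiceId="Joanna",  # any neural-capable voice works
                Engine="neural",
            )
            out.write(resp["AudioStream"].read())
```

For long-form content, Polly's asynchronous `start_speech_synthesis_task` API (which writes directly to S3) is a better fit than concatenating synchronous responses.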
Clean Up
Remember to delete any temporary resources created during the workflow (S3 objects, local video chunks, and any running SageMaker notebook instances) to avoid unnecessary costs.
Conclusion
By employing AWS services like S3, Rekognition, Nova Pro, and Polly, media creators can fully automate the process of generating audio descriptions, significantly reducing time and costs. This not only aids in creating accessible content but also helps businesses comply with accessibility regulations.
Future Considerations
The outlined solution is applicable to various forms of visual media beyond just films and TV shows. With further development and scaling considerations, it can serve as a robust tool for improving accessibility in all forms of visual storytelling.
For more information about the Amazon Nova model family and its capabilities, explore the documentation on Amazon’s official website.
About the Authors
Dylan Martin is an AWS Solutions Architect primarily focused on generative AI, bringing extensive experience from various roles in software engineering and security.
Ankit Patel is a Solutions Developer at AWS, specializing in turning customer ideas into rapid prototype applications using AWS technologies.
This automated audio description approach can help bridge the accessibility gap, ensuring that everyone can enjoy and engage with visual content.