Scalable Multimodal Embeddings: An AI Data Lake for Media and Entertainment Applications

Building a Scalable Multimodal Video Search System with Amazon Nova and OpenSearch

Transforming Video Datasets into Semantic Search Capabilities

This article provides a comprehensive guide on implementing a scalable multimodal video search system that utilizes Amazon Nova models and Amazon OpenSearch Service. Dive into the world of natural language search across extensive video datasets, moving beyond traditional tagging and keyword methods.

Processing Large Video Libraries Efficiently

Learn how we efficiently processed 792,270 videos from two datasets to facilitate advanced search capabilities while managing costs effectively.

Solution Architecture Overview

Understand the architecture of our system, which integrates ingestion and search workflows, and allows for various search methods (text-to-video, video-to-video, and hybrid).

Prerequisites for Implementation

Before starting, ensure you have the necessary AWS account and services configured, including IAM roles and OpenSearch Service.

Step-by-Step Walkthrough

Explore detailed steps for setting up the system—from creating IAM roles to processing videos, generating embeddings, and implementing diverse search functionalities.

Performance Insights

Discover the performance metrics and cost considerations, including query latencies and storage requirements as we scale to handle large datasets.

Conclusion

Find out how this architecture not only meets current needs but also provides a robust foundation for future enhancements and scaling.

About the Authors

Meet the minds behind this solution and their expertise in media, entertainment, and AI technologies.

Building a Scalable Multimodal Video Search System with Amazon Nova and OpenSearch

The volume of video content is growing rapidly. To make use of it, organizations need scalable, efficient search systems that support natural language querying. This blog post demonstrates how to build a multimodal video search system using Amazon Nova models and Amazon OpenSearch Service. We walk through moving beyond manual tagging and keyword-based search to semantic search that captures the richness of video content.

Processing Large Datasets at Scale

To illustrate this solution, we processed 792,270 videos sourced from two datasets hosted on the AWS Open Data Registry: Multimedia Commons (787,479 videos with an average duration of 37 seconds) and MEVA (4,791 videos averaging 5 minutes). Together they contain 8,480 hours of video (about 30.5M seconds). Processing all of it took just 41 hours, with a first-year total cost of $27,328 for on-demand services or $23,632 when using Reserved Instances.

Ingestion Breakdown

  • Amazon EC2 Compute:
    • 4× c7i.48xlarge spot at $2.57/hour × 41 hours = $421
  • Amazon Bedrock Nova Multimodal Embeddings:
    • (30.5M seconds) × $0.00056/second batch pricing = $17,096
  • Nova Pro Tagging:
    • 792K videos × 600 tokens (avg.) per video = $571
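The line items above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted in the breakdown; the variable and function names are illustrative, and the tagging figure is taken as the quoted total because the per-token price is not stated here.

```python
# Figures come from the ingestion breakdown; names are illustrative only.
INSTANCES = 4
SPOT_PRICE_PER_HOUR = 2.57            # USD per c7i.48xlarge spot instance-hour
WALL_CLOCK_HOURS = 41                 # time the ingestion run took
VIDEO_HOURS = 8_480                   # total footage, about 30.5M seconds
EMBEDDING_PRICE_PER_SECOND = 0.00056  # USD per second of video, batch pricing
TAGGING_COST = 571                    # USD, quoted total (per-token price not given)


def ingestion_costs():
    """Return each ingestion line item and their sum, in USD."""
    compute = INSTANCES * SPOT_PRICE_PER_HOUR * WALL_CLOCK_HOURS
    embeddings = VIDEO_HOURS * 3600 * EMBEDDING_PRICE_PER_SECOND
    total = compute + embeddings + TAGGING_COST
    return {
        "compute": compute,
        "embeddings": embeddings,
        "tagging": TAGGING_COST,
        "total": total,
    }


if __name__ == "__main__":
    for item, usd in ingestion_costs().items():
        print(f"{item:>10}: ${usd:,.0f}")
```

The three ingestion items sum to roughly $18,088, and embedding generation accounts for most of it. The rest of the first-year total is not itemized in this breakdown.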

This solution generates audio-visual embeddings using the AUDIO_VIDEO_COMBINED mode, which should be the backbone of your indexing strategy.

Solution Overview

The architecture is built around two workflows: ingestion and search. Together they enable multimodal video search at scale.

Video Ingestion Pipeline

The ingestion pipeline runs on four Amazon EC2 c7i.48xlarge instances with 600 parallel workers and processes about 19,400 videos per hour. The asynchronous API is limited to 30 concurrent jobs per account, so the pipeline needs a job queue that continually submits new jobs as slots free up and polls running jobs for completion.
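A submit-and-poll queue of this kind can be sketched as follows. This is not the authors' implementation. The `submit_job` and `get_status` callables are hypothetical placeholders for whatever wraps the asynchronous API, and the status strings are assumptions chosen for the example.

```python
import time
from collections import deque

MAX_CONCURRENT_JOBS = 30  # per-account limit quoted in the text


def run_job_queue(video_uris, submit_job, get_status,
                  poll_interval=5.0, max_concurrent=MAX_CONCURRENT_JOBS):
    """Keep up to max_concurrent jobs in flight until every video is done.

    submit_job(uri) -> job_id and get_status(job_id) -> str are hypothetical
    callables supplied by the caller. Any status other than "InProgress" is
    treated as final. Returns a dict mapping each URI to its final status.
    """
    pending = deque(video_uris)
    in_flight = {}  # job_id -> uri
    results = {}
    while pending or in_flight:
        # Fill any free slots with new submissions.
        while pending and len(in_flight) < max_concurrent:
            uri = pending.popleft()
            in_flight[submit_job(uri)] = uri
        # Poll running jobs and retire the ones that have finished.
        for job_id in list(in_flight):
            status = get_status(job_id)
            if status != "InProgress":
                results[in_flight.pop(job_id)] = status
        if in_flight:
            time.sleep(poll_interval)
    return results
```

A production queue would also need retries for failed jobs and backoff on throttling, both omitted here for brevity.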

  1. Upload Videos: Store videos in Amazon S3.
  2. Process Using Nova: The asynchronous API segments video into 15-second chunks to efficiently generate embeddings.
  3. Generate Tags: Use Nova Pro (or Nova Lite) to assign 10-15 tags to each video.
  4. Index the Data: Store embeddings and metadata tags in dual OpenSearch indexes, facilitating efficient retrieval based on diverse search modes.
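To get a feel for how many embeddings the 15-second segmentation in step 2 produces, the segment boundaries can be computed locally. This is only an illustration: the service performs the actual segmentation, and treating the final partial chunk as its own segment is an assumption.

```python
SEGMENT_SECONDS = 15  # chunk length quoted in step 2


def segment_bounds(duration_seconds, segment_seconds=SEGMENT_SECONDS):
    """Return (start, end) pairs in seconds that cover the whole video.

    The last segment may be shorter than segment_seconds (assumed behavior).
    """
    duration = float(duration_seconds)
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment_seconds, duration)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # An average 37-second Multimedia Commons clip and a 5-minute MEVA clip.
    print(len(segment_bounds(37)), "segments for a 37-second video")
    print(len(segment_bounds(300)), "segments for a 5-minute video")
```

Under these assumptions a 37-second clip yields three segments and a 5-minute clip yields twenty, which is why the vector index holds more entries than there are videos.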

Types of Searches Enabled

  • Text-to-video Search: Converts natural language queries into embeddings for semantic similarity.
  • Video-to-video Search: Finds similar content through direct comparison of video embeddings.
  • Hybrid Search: Combines vector similarity (weighted 70%) with keyword matching (30%) to enhance accuracy.
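The 70/30 weighting in hybrid search can be illustrated with a small client-side scorer. This is a sketch under assumptions: min-max normalization is chosen here only to put k-NN and BM25 scores on a common scale, and the text does not say which normalization the solution uses.

```python
VECTOR_WEIGHT = 0.7   # weight for vector similarity, from the text
KEYWORD_WEIGHT = 0.3  # weight for keyword matching, from the text


def min_max_normalize(scores):
    """Scale a {doc_id: score} dict into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(vector_scores, keyword_scores):
    """Combine normalized scores 70/30 and return (doc_id, score) best first.

    A document missing from one result list contributes 0 for that list.
    """
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    combined = {
        doc: VECTOR_WEIGHT * v.get(doc, 0.0) + KEYWORD_WEIGHT * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

For example, `hybrid_rank({"a": 0.9, "b": 0.5, "c": 0.1}, {"b": 12.0, "c": 3.0})` ranks `a` first on vector similarity alone, with `b` close behind because it also matches the keywords.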

Walkthrough

Step 1: Create IAM Roles and Policies

Configure an IAM role that can invoke Amazon Bedrock models, read and write objects in S3, and index documents in OpenSearch Service.
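A policy covering these three permission groups might look like the sketch below. The action list is an assumption and should be checked against the AWS Service Authorization Reference before use. The bucket name and domain ARN are placeholders, and the Bedrock statement should be narrowed from "*" in a real deployment.

```python
import json


def build_ingestion_policy(bucket, domain_arn):
    """Build an illustrative IAM policy document as a Python dict.

    bucket and domain_arn are placeholders supplied by the caller; the
    action names are assumptions to verify before deploying.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeBedrockModels",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel", "bedrock:GetAsyncInvoke"],
                "Resource": "*",  # narrow to specific model ARNs in practice
            },
            {
                "Sid": "ReadWriteVideoObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "IndexIntoOpenSearch",
                "Effect": "Allow",
                "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
                "Resource": f"{domain_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_ingestion_policy(
        "example-video-bucket",
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    )
    print(json.dumps(policy, indent=2))
```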

Step 2: Set Up OpenSearch Service Indexes

Create two indexes in OpenSearch Service, one focused on vector embeddings and one for text metadata. This allows for seamless hybrid and semantic search queries.
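The two index definitions could be expressed as request bodies like the ones below. Field names, the HNSW method settings, and the embedding dimension are all assumptions made for illustration; the text does not specify them, and the dimension must match whatever the embedding model is configured to return.

```python
def knn_index_body(dimension, vector_field="embedding"):
    """Index body for segment embeddings (field names are hypothetical)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},
                "segment_start": {"type": "float"},
                vector_field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }


def text_index_body():
    """Index body for tag metadata, searched with BM25 text queries."""
    return {
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},
                "tags": {"type": "text"},
            }
        }
    }
```

With the opensearch-py client, each body would typically be passed to `client.indices.create(index=..., body=...)`. Keeping vectors and text in separate indexes lets each be sized and tuned independently.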

Step 3: Process Videos with Nova Multimodal Embeddings

Using the Amazon Bedrock async API, process your uploaded videos to generate embeddings. This step entails segmenting the videos into smaller manageable sections to enhance embedding accuracy.
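Submitting and checking an asynchronous job with boto3 can be sketched as two thin wrappers. The client is passed in, normally created with `boto3.client("bedrock-runtime")`. The `model_input` payload is deliberately left to the caller because its schema is not given here and should be taken from the Amazon Nova Multimodal Embeddings documentation.

```python
def submit_embedding_job(bedrock_runtime, model_id, model_input, output_s3_uri):
    """Start an asynchronous invocation and return its invocation ARN.

    model_id and model_input are caller-supplied; this sketch does not
    assume any particular request schema for the embedding model.
    """
    response = bedrock_runtime.start_async_invoke(
        modelId=model_id,
        modelInput=model_input,
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    )
    return response["invocationArn"]


def job_status(bedrock_runtime, invocation_arn):
    """Return the job status string reported by the async API."""
    response = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    return response["status"]
```

Results are written to the S3 location given in `output_s3_uri`, so the indexing step reads embeddings from S3 rather than from the API response.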

Step 4: Generate Metadata Tags

Generate descriptive tags using Nova Pro or Nova Lite, leveraging a predefined taxonomy for optimal search capabilities.
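Whatever prompt is used, the model's output usually needs to be checked against the taxonomy before indexing. The sketch below shows one way to do that. The function names and the lowercase normalization are assumptions, and the 10 to 15 tag range comes from the ingestion pipeline description.

```python
MIN_TAGS = 10  # lower bound on tags per video, from the pipeline description
MAX_TAGS = 15  # upper bound on tags per video


def clean_tags(raw_tags, taxonomy):
    """Keep only taxonomy tags, normalized and de-duplicated, capped at MAX_TAGS.

    raw_tags is the list of strings parsed from the model response;
    taxonomy is the predefined set of allowed tags.
    """
    allowed = {t.strip().lower() for t in taxonomy}
    seen = set()
    cleaned = []
    for tag in raw_tags:
        normalized = tag.strip().lower()
        if normalized in allowed and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned[:MAX_TAGS]


def has_enough_tags(tags):
    """True when a video has at least the minimum number of valid tags."""
    return len(tags) >= MIN_TAGS
```

Videos that fall below the minimum after cleaning could be retried or flagged for review, depending on how strict the catalog needs to be.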

Step 5: Index Embeddings and Tags in OpenSearch Service

Efficiently store your video embeddings and tags in OpenSearch Service using bulk indexing.
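Bulk indexing sends many documents in one request using a newline-delimited JSON body, where each document is preceded by an action line. A minimal builder for that body is sketched below; the index name, document IDs, and fields in the example are hypothetical.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited _bulk request body.

    docs is an iterable of (doc_id, document_dict) pairs. Each pair becomes
    an action line followed by the document source line.
    """
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    body = build_bulk_body(
        "video-tags",  # hypothetical index name
        [("vid-001", {"video_id": "vid-001", "tags": "beach sunset"})],
    )
    print(body, end="")
```

Batching a few hundred to a few thousand documents per request is a common starting point, tuned to document size and cluster capacity.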

Step 6: Implement Search Functionality

After ingestion, implement low-latency search capabilities with well-defined APIs for natural language queries, video discovery, and hybrid search.
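The three search modes map to three query bodies, sketched below with hypothetical field names. Text-to-video and video-to-video search share the same k-NN body and differ only in where the query vector comes from: an embedded text query or an existing video embedding. For the hybrid query, the 70/30 weights are assumed to be configured in an OpenSearch search pipeline rather than in the query itself.

```python
def knn_query(vector, k=10, field="embedding"):
    """k-NN body for text-to-video or video-to-video search."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }


def keyword_query(text, k=10, field="tags"):
    """BM25 match body for tag search."""
    return {"size": k, "query": {"match": {field: text}}}


def hybrid_query(vector, text, k=10, vector_field="embedding", text_field="tags"):
    """Hybrid body combining the k-NN and match clauses.

    Score normalization and weighting are handled by a search pipeline
    (assumed to exist); they are not part of this body.
    """
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"knn": {vector_field: {"vector": list(vector), "k": k}}},
                    {"match": {text_field: text}},
                ]
            }
        },
    }
```

A hybrid query of this shape only works when both fields live in the same index or when the results are merged by the caller, so the dual-index layout described earlier may call for client-side merging instead.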

Performance Insights

After indexing all videos, we measured the following query latencies:

  • Semantic k-NN search: ~76ms
  • BM25 text search: ~30ms
  • Hybrid search: ~106ms

Storage Requirements

  • k-NN index: 28.8 GB
  • Text index: 1.0 GB

At roughly 30 GB combined, both indexes fit comfortably on a modern OpenSearch cluster.

Conclusion

Through this post, we explored a comprehensive solution for building a multimodal video search system capable of handling large datasets and enabling rich search capabilities. By integrating Amazon Nova models with OpenSearch Service, you can leverage semantic search capabilities to unlock the full potential of your video content.


About the Authors

Hammad Ausaf – Principal Solutions Architect in Media and Entertainment, passionate about providing the best solutions to AWS customers.

Rajat Jain – Technical Account Manager in Media and Entertainment, a GenAI/ML enthusiast dedicated to building innovative solutions.

For inquiries or to learn more about the technologies discussed in this walkthrough, explore Amazon Nova Multimodal Embeddings and Hybrid Search with Amazon OpenSearch Service.
