Scalable Multimodal Embeddings: An AI Data Lake for Media and Entertainment Applications

Building a Scalable Multimodal Video Search System with Amazon Nova and OpenSearch

Transforming Video Datasets into Semantic Search Capabilities

This article provides a comprehensive guide on implementing a scalable multimodal video search system that utilizes Amazon Nova models and Amazon OpenSearch Service. Dive into the world of natural language search across extensive video datasets, moving beyond traditional tagging and keyword methods.

Processing Large Video Libraries Efficiently

Learn how we efficiently processed 792,270 videos from two datasets to facilitate advanced search capabilities while managing costs effectively.

Solution Architecture Overview

Understand the architecture of our system, which integrates ingestion and search workflows, and allows for various search methods (text-to-video, video-to-video, and hybrid).

Prerequisites for Implementation

Before starting, ensure you have the necessary AWS account and services configured, including IAM roles and OpenSearch Service.

Step-by-Step Walkthrough

Explore detailed steps for setting up the system—from creating IAM roles to processing videos, generating embeddings, and implementing diverse search functionalities.

Performance Insights

Discover the performance metrics and cost considerations, including query latencies and storage requirements as we scale to handle large datasets.

Conclusion

Find out how this architecture not only meets current needs but also provides a robust foundation for future enhancements and scaling.

About the Authors

Meet the minds behind this solution and their expertise in media, entertainment, and AI technologies.

Building a Scalable Multimodal Video Search System with Amazon Nova and OpenSearch

In today’s digital landscape, the volume of video content is growing rapidly. To make that content searchable, organizations need scalable search systems that support natural language queries. This post demonstrates how to build a multimodal video search system using Amazon Nova models and Amazon OpenSearch Service, moving beyond manual tagging and keyword-based search to semantic search that captures the richness of video content.

Processing Large Datasets at Scale

To illustrate this solution, we processed 792,270 videos sourced from two datasets hosted on the AWS Open Data Registry: Multimedia Commons (787,479 videos with an average duration of 37 seconds) and MEVA (4,791 videos averaging 5 minutes). Together they contain about 8,480 hours of video (30.5M seconds). Processing all of it took just 41 hours, and the first-year total cost was $27,328 with on-demand pricing or $23,632 with Reserved Instances.

Ingestion Breakdown

  • Amazon EC2 Compute:
    • 4× c7i.48xlarge spot at $2.57/hour × 41 hours = $421
  • Amazon Bedrock Nova Multimodal Embeddings:
    • (30.5M seconds) × $0.00056/second batch pricing = $17,096
  • Nova Pro Tagging:
    • 792K videos × 600 tokens (avg.) ≈ 475M tokens, for a total of $571
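The arithmetic behind the breakdown can be checked with a short script. This is only a sketch that reuses the figures quoted above; the Nova Pro per-token price is not given in the post, so the tagging cost is carried over as a constant rather than derived. Note that the rounded 30.5M seconds yields roughly $17,080 for embeddings, slightly below the $17,096 quoted, which presumably reflects the unrounded video duration.

```python
# Figures copied from the ingestion breakdown in the post.
EC2_INSTANCES = 4
EC2_SPOT_USD_PER_HOUR = 2.57
WALL_CLOCK_HOURS = 41
VIDEO_SECONDS = 30.5e6            # rounded total duration from the post
EMBED_USD_PER_SECOND = 0.00056    # batch pricing from the post
NOVA_PRO_TAGGING_USD = 571.0      # quoted total; per-token price not stated


def ingestion_costs() -> dict[str, float]:
    """Recompute each line item of the ingestion breakdown in USD."""
    ec2 = EC2_INSTANCES * EC2_SPOT_USD_PER_HOUR * WALL_CLOCK_HOURS
    embeddings = VIDEO_SECONDS * EMBED_USD_PER_SECOND
    tagging = NOVA_PRO_TAGGING_USD
    return {
        "ec2": ec2,
        "embeddings": embeddings,
        "tagging": tagging,
        "total": ec2 + embeddings + tagging,
    }


if __name__ == "__main__":
    for item, usd in ingestion_costs().items():
        print(f"{item:>10}: ${usd:,.2f}")
```

Embedding generation dominates the ingestion cost, so the price per second of video is the number to watch when estimating other libraries.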

This solution generates audio-visual embeddings via the AUDIO_VIDEO_COMBINED mode, which should be the backbone of your indexing strategy.

Solution Overview

The architecture is structured around two fundamental workflows: ingestion and search. This setup is designed to meaningfully enable multimodal video search at scale.

Video Ingestion Pipeline

Using four Amazon EC2 c7i.48xlarge instances, the ingestion pipeline runs 600 parallel workers and processes about 19,400 videos/hour. The asynchronous API is limited to 30 concurrent jobs per account, so the pipeline needs a job queue that continually submits new jobs and polls running ones for completion.

  1. Upload Videos: Store videos in Amazon S3.
  2. Process Using Nova: The asynchronous API segments video into 15-second chunks to efficiently generate embeddings.
  3. Generate Tags: Use Nova Pro (or Nova Lite) to assign 10-15 tags to each video.
  4. Index the Data: Store embeddings and metadata tags in dual OpenSearch indexes, facilitating efficient retrieval based on diverse search modes.
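The job queue that keeps the asynchronous API saturated without exceeding the 30-job limit could look roughly like the following. This is a minimal sketch, not the authors' implementation: `submit` and `is_done` are hypothetical callables standing in for whatever starts a job and checks its status, and failure handling and retries are left out.

```python
import time
from collections import deque
from typing import Callable, Iterable

MAX_CONCURRENT_JOBS = 30  # per-account limit stated in the post


def run_job_queue(
    video_uris: Iterable[str],
    submit: Callable[[str], str],    # hypothetical: starts a job, returns its id
    is_done: Callable[[str], bool],  # hypothetical: True once the job finished
    max_concurrent: int = MAX_CONCURRENT_JOBS,
    poll_interval_s: float = 5.0,
) -> list[str]:
    """Submit jobs up to the concurrency cap and poll until all complete."""
    pending = deque(video_uris)
    in_flight: dict[str, str] = {}  # job id -> video URI
    finished: list[str] = []

    while pending or in_flight:
        # Top up to the concurrency limit.
        while pending and len(in_flight) < max_concurrent:
            uri = pending.popleft()
            in_flight[submit(uri)] = uri

        # Poll every in-flight job and retire the ones that completed.
        for job_id in list(in_flight):
            if is_done(job_id):
                finished.append(in_flight.pop(job_id))

        # Wait before the next polling round only if work is still running.
        if in_flight:
            time.sleep(poll_interval_s)

    return finished
```

The function returns the video URIs in completion order. In a real pipeline, `is_done` would also need to distinguish failed jobs from successful ones so that failures can be retried or logged.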

Types of Searches Enabled

  • Text-to-video Search: Converts natural language queries into embeddings for semantic similarity.
  • Video-to-video Search: Finds similar content through direct comparison of video embeddings.
  • Hybrid Search: Combines vector similarity (weighted 70%) with keyword matching (30%) to enhance accuracy.
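To make the 70/30 weighting concrete, here is one way the two result lists could be combined on the client side. It is a sketch under assumptions: the post does not say how scores are normalized, so min-max normalization is used here as a common choice, and the function names are illustrative. OpenSearch also provides a hybrid query type with a normalization processor that can do this inside the cluster; whether the authors used it is not stated.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to 1.0 everywhere."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.7,   # weighting from the post
    keyword_weight: float = 0.3,  # weighting from the post
) -> list[tuple[str, float]]:
    """Combine normalized k-NN and BM25 scores into one ranked list."""
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    combined = {
        doc_id: vector_weight * v.get(doc_id, 0.0)
        + keyword_weight * k.get(doc_id, 0.0)
        for doc_id in set(v) | set(k)
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    ranked = hybrid_rank(
        {"a": 0.9, "b": 0.5, "c": 0.1},  # similarity scores
        {"b": 12.0, "c": 3.0},           # BM25 scores
    )
    print(ranked)
```

Normalizing before weighting matters because BM25 scores are unbounded while similarity scores usually are not; without it the 70/30 split would not mean what it says.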

Walkthrough

Step 1: Create IAM Roles and Policies

Configure an IAM role that can invoke Amazon Bedrock models, read and write objects in your S3 bucket, and index documents into OpenSearch Service.
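A policy document for such a role might be assembled as shown below. Treat it as a sketch under assumptions: the bucket name and domain ARN are placeholders, the `es:ESHttp*` actions assume a managed OpenSearch Service domain rather than OpenSearch Serverless, and the Bedrock statement uses `"Resource": "*"` only for brevity and should be narrowed to the specific model ARNs in practice.

```python
import json


def build_ingestion_policy(bucket: str, domain_arn: str) -> dict:
    """Return an IAM policy document covering Bedrock, S3, and OpenSearch."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeBedrockModels",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:StartAsyncInvoke",
                    "bedrock:GetAsyncInvoke",
                ],
                "Resource": "*",  # assumption: narrow to model ARNs in practice
            },
            {
                "Sid": "ReadWriteVideoObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "IndexIntoOpenSearch",
                "Effect": "Allow",
                # Assumes a managed domain, not OpenSearch Serverless.
                "Action": ["es:ESHttpGet", "es:ESHttpPost", "es:ESHttpPut"],
                "Resource": f"{domain_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = build_ingestion_policy(
        "my-video-bucket",  # placeholder bucket name
        "arn:aws:es:us-east-1:123456789012:domain/video-search",  # placeholder
    )
    print(json.dumps(policy, indent=2))
```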

Step 2: Set Up OpenSearch Service Indexes

Create two indexes in OpenSearch Service, one focused on vector embeddings and one for text metadata. This allows for seamless hybrid and semantic search queries.
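The two index definitions could be expressed as request bodies like these. This is a hedged sketch: the field names, the 1024 placeholder dimension, and the choice of HNSW with the Lucene engine and cosine similarity are all assumptions, since the post does not specify the mappings. The vector dimension must match whatever output dimension you request from the embedding model.

```python
EMBEDDING_DIM = 1024  # placeholder; must match the embedding model's output


def vector_index_body(dimension: int = EMBEDDING_DIM) -> dict:
    """Index body for segment embeddings, searched with k-NN."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},       # hypothetical field
                "segment_start_s": {"type": "float"},  # hypothetical field
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    # Assumed method settings; tune for your workload.
                    "method": {
                        "name": "hnsw",
                        "engine": "lucene",
                        "space_type": "cosinesimil",
                    },
                },
            }
        },
    }


def text_index_body() -> dict:
    """Index body for tag metadata, searched with BM25 text queries."""
    return {
        "mappings": {
            "properties": {
                "video_id": {"type": "keyword"},  # hypothetical field
                "tags": {"type": "text"},         # hypothetical field
            }
        }
    }
```

Each body would be sent when creating its index, for example with an HTTP `PUT` to the index name on the domain endpoint.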

Step 3: Process Videos with Nova Multimodal Embeddings

Using the Amazon Bedrock async API, process your uploaded videos to generate embeddings. This step entails segmenting the videos into smaller manageable sections to enhance embedding accuracy.
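The API performs the segmentation itself, but it helps to know how many segments (and therefore vectors) each video produces when sizing the index. The sketch below computes 15-second segment boundaries for a given duration; that the final segment is simply shorter than 15 seconds is an assumption, not something the post confirms.

```python
import math

SEGMENT_SECONDS = 15  # chunk length stated in the post


def segment_bounds(
    duration_s: float, segment_s: float = SEGMENT_SECONDS
) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering a video in fixed-length segments.

    Assumes the last segment is truncated at the end of the video.
    """
    if duration_s <= 0:
        return []
    count = math.ceil(duration_s / segment_s)
    return [
        (i * segment_s, min((i + 1) * segment_s, duration_s))
        for i in range(count)
    ]


if __name__ == "__main__":
    # An average Multimedia Commons video (37 s) yields three segments.
    print(segment_bounds(37))
    # An average MEVA video (5 minutes) yields twenty segments.
    print(len(segment_bounds(300)))
```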

Step 4: Generate Metadata Tags

Generate descriptive tags using Nova Pro or Nova Lite, leveraging a predefined taxonomy for optimal search capabilities.
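Constraining tags to a predefined taxonomy typically involves two pieces: a prompt that lists the allowed tags, and a parser that discards anything the model returns outside that list. The following sketch shows both. The taxonomy, prompt wording, and comma-separated reply format are hypothetical; the post does not publish the authors' taxonomy or prompt, and the model call itself is omitted.

```python
# Hypothetical taxonomy for illustration only.
TAXONOMY = {
    "animal", "city", "food", "indoor", "music", "nature",
    "night", "outdoor", "people", "speech", "sports", "vehicle",
}


def build_tagging_prompt(
    taxonomy: set[str], min_tags: int = 10, max_tags: int = 15
) -> str:
    """Build a prompt asking for tags drawn only from the taxonomy."""
    allowed = ", ".join(sorted(taxonomy))
    return (
        f"Assign between {min_tags} and {max_tags} tags to the video. "
        f"Use only tags from this list: {allowed}. "
        "Reply with a comma-separated list and nothing else."
    )


def parse_tags(
    model_reply: str, taxonomy: set[str], max_tags: int = 15
) -> list[str]:
    """Keep unique, in-taxonomy tags from a comma-separated model reply."""
    tags: list[str] = []
    for raw in model_reply.split(","):
        tag = raw.strip().lower()
        if tag in taxonomy and tag not in tags:
            tags.append(tag)
    return tags[:max_tags]


if __name__ == "__main__":
    print(build_tagging_prompt(TAXONOMY))
    print(parse_tags("Sports, outdoor, unicorn, sports ,People", TAXONOMY))
```

Filtering on the way in keeps the text index vocabulary stable, which in turn keeps keyword matching predictable.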

Step 5: Index Embeddings and Tags in OpenSearch Service

Efficiently store your video embeddings and tags in OpenSearch Service using bulk indexing.
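Bulk indexing sends many documents in a single request, formatted as newline-delimited JSON with an action line before each document. A minimal sketch of building such payloads follows; the index name, document fields, and batch size are illustrative assumptions.

```python
import json
from typing import Iterable, Iterator


def to_bulk_ndjson(index: str, docs: Iterable[dict], id_field: str = "id") -> str:
    """Serialize documents into the NDJSON body expected by the bulk API."""
    lines: list[str] = []
    for doc in docs:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return ("\n".join(lines) + "\n") if lines else ""


def chunked(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive batches of at most `size` items."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    docs = [
        {"id": "v1#0", "video_id": "v1", "tags": ["city", "night"]},
        {"id": "v1#1", "video_id": "v1", "tags": ["city", "people"]},
    ]
    for batch in chunked(docs, size=500):  # batch size is an assumption
        print(to_bulk_ndjson("video-tags", batch), end="")
```

Each resulting body would be sent to the `_bulk` endpoint with content type `application/x-ndjson`.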

Step 6: Implement Search Functionality

After ingestion, implement low-latency search capabilities with well-defined APIs for natural language queries, video discovery, and hybrid search.
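The semantic and keyword searches reduce to two query bodies. The sketch below builds them as plain dictionaries; the field names `embedding` and `tags` are assumptions about the index mappings. For text-to-video search the query vector would come from embedding the query text, and for video-to-video search it would be an embedding of the reference video.

```python
def knn_query(query_vector: list[float], k: int = 10, field: str = "embedding") -> dict:
    """k-NN query body: find the k nearest stored vectors to query_vector."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": query_vector, "k": k}}},
    }


def bm25_query(text: str, size: int = 10, field: str = "tags") -> dict:
    """Keyword query body: BM25 match against the tag text field."""
    return {
        "size": size,
        "query": {"match": {field: text}},
    }


if __name__ == "__main__":
    # A toy 4-dimensional vector; real vectors match the index dimension.
    print(knn_query([0.1, 0.2, 0.3, 0.4], k=5))
    print(bm25_query("city night traffic"))
```

Each body would be posted to the `_search` endpoint of the corresponding index.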

Performance Insights

After indexing all videos, we measured the following query latencies:

  • Semantic k-NN search: ~76ms
  • BM25 text search: ~30ms
  • Hybrid search: ~106ms

Storage Requirements

  • k-NN index: 28.8 GB
  • Text index: 1.0 GB

At roughly 29.8 GB combined, the two indexes are a manageable size for modern OpenSearch clusters.

Conclusion

Through this post, we explored a comprehensive solution for building a multimodal video search system capable of handling large datasets and enabling rich search capabilities. By integrating Amazon Nova models with OpenSearch Service, you can leverage semantic search capabilities to unlock the full potential of your video content.


About the Authors

Hammad Ausaf – Principal Solutions Architect in Media and Entertainment, passionate about providing the best solutions to AWS customers.

Rajat Jain – Technical Account Manager in Media and Entertainment, a GenAI/ML enthusiast dedicated to building innovative solutions.

For inquiries or to learn more about the technologies discussed in this walkthrough, explore Amazon Nova Multimodal Embeddings and Hybrid Search with Amazon OpenSearch Service.
