

Unleashing the Power of Multimodal Retrieval with Amazon Bedrock Knowledge Bases

We are thrilled to announce the general availability of multimodal retrieval for Amazon Bedrock Knowledge Bases. This capability adds native support for video and audio content alongside the existing support for text and images. With this update, you can create Retrieval Augmented Generation (RAG) applications that seamlessly search and retrieve information across media types, all within a fully managed service.

Why Multimodal Retrieval Matters

Modern enterprises store critical information in multiple formats, from product documentation featuring diagrams and screenshots to training materials that include instructional videos and customer insights captured in recorded meetings. Historically, building AI applications capable of efficiently searching across these content types required intricate custom infrastructure and considerable engineering effort.

Previously, Bedrock Knowledge Bases relied on text-based embedding models for retrieval. While this approach supported text documents and images, images had to be converted into text descriptions, which often resulted in loss of visual context and limited visual search capabilities.

Enter Multimodal Embeddings

With this new feature, Bedrock Knowledge Bases supports native multimodal embeddings, enabling effective search across text, images, audio, and video with a single embedding model. Because each media type is embedded directly instead of being converted to text first, its context is preserved, so users can query with an image to find visually similar content or locate a specific scene in a video.

Overview of Multimodal Knowledge Bases

Amazon Bedrock Knowledge Bases automates the entire RAG workflow: it ingests content from data sources, parses, chunks, and converts it into searchable segments, and stores these in a vector database. When a user query is submitted, it’s embedded and matched against stored vectors to find semantically similar content, enriching the prompt sent to your foundation model.
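
To see what this managed workflow looks like from the application side, here is a minimal sketch that queries an existing knowledge base through the Retrieve API with boto3; the knowledge base ID is a hypothetical placeholder.

```python
import boto3

# Query an existing knowledge base; Bedrock embeds the query text and
# returns the most semantically similar chunks from the vector store.
agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve(
    knowledgeBaseId="KBEXAMPLE01",  # hypothetical knowledge base ID
    retrievalQuery={"text": "How do I pair the headset with a new device?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
)

for result in response["retrievalResults"]:
    # Each result carries the matched content, its source location, and a score.
    print(result["score"], result["location"])
```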

Building Multimodal RAG Applications

In the remainder of this post, we walk through the process of creating multimodal RAG applications: understanding how multimodal knowledge bases function, selecting the appropriate processing strategy for your content type, and configuring multimodal retrieval using both the AWS console and code examples.

Amazon Nova Multimodal Embeddings

The Amazon Nova Multimodal Embeddings model is the first of its kind, unifying text, documents, images, video, and audio in a shared vector space without requiring text conversion. The model supports up to 8,192 tokens of text, video and audio segments of up to 30 seconds, and more than 200 languages.

Because the model captures both visual and audio characteristics, retrieval can draw on either: for example, you can describe the visual elements of a scene and retrieve the matching videos.
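
Outside of Knowledge Bases, you can call the model directly through the standard bedrock-runtime InvokeModel API. The sketch below is illustrative only: the model ID, request body, and response shape are assumptions and may differ from the published schema, so verify them against the Amazon Nova documentation.

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Hypothetical model ID; verify the exact ID available in your region.
MODEL_ID = "amazon.nova-multimodal-embeddings-v1:0"

def embed_text(text: str) -> list[float]:
    body = {
        "taskType": "SINGLE_EMBEDDING",                       # assumed field
        "singleEmbeddingParams": {"text": {"value": text}},   # assumed shape
    }
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID, body=json.dumps(body)
    )
    payload = json.loads(response["body"].read())
    return payload["embeddings"][0]["embedding"]              # assumed response shape

query_vector = embed_text("red running shoes with white soles")
print(len(query_vector))  # embedding dimension
```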

Bedrock Data Automation

Alternatively, Bedrock Data Automation converts multimedia content into rich textual representations before embedding. This text-first strategy is particularly beneficial in compliance scenarios where exact quotes and verbatim records are essential. It allows for highly accurate retrieval over spoken content, making it invaluable for applications such as customer support calls, training videos, and webinars.
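
As a rough sketch of how this strategy looks programmatically, Bedrock Data Automation jobs run asynchronously against content in S3. The parameter names and ARNs below are assumptions based on the bedrock-data-automation-runtime client; confirm them against the current boto3 documentation.

```python
import boto3

# Asynchronously convert a media file in S3 into rich text output.
# Field names and the profile ARN are assumptions; check the boto3 docs.
bda_runtime = boto3.client("bedrock-data-automation-runtime")

job = bda_runtime.invoke_data_automation_async(
    inputConfiguration={"s3Uri": "s3://my-media-bucket/webinar.mp4"},    # hypothetical
    outputConfiguration={"s3Uri": "s3://my-media-bucket/bda-output/"},   # hypothetical
    dataAutomationProfileArn=(
        "arn:aws:bedrock:us-east-1:123456789012:"
        "data-automation-profile/us.data-automation-v1"                  # hypothetical
    ),
)
print(job["invocationArn"])  # poll this ARN until the job completes
```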

Practical Applications: Visual Product Search

Consider an e-commerce scenario where traditional text search falls short: customers often struggle to put what they are looking for into the right keywords. With multimodal retrieval, they can simply upload an image or reference a video scene, and the system retrieves visually similar items through their embedded representations.

This transforms the shopping experience, making product discovery both more accessible and more intuitive.
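
Under the hood, such a flow might look like the sketch below: embed the shopper's photo with the embeddings model, then run a similarity query against the vector index. The request schema and the s3vectors query parameters are assumptions, and all resource names are hypothetical; if the knowledge base manages the vector store for you, you would instead test image queries through the Bedrock console or the Retrieve API.

```python
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
s3vectors = boto3.client("s3vectors")

MODEL_ID = "amazon.nova-multimodal-embeddings-v1:0"  # hypothetical model ID

def embed_image(path: str) -> list[float]:
    # Request and response shapes are assumptions; verify against the model docs.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "image": {"format": "jpeg", "source": {"bytes": image_b64}}
        },
    }
    response = bedrock_runtime.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    return json.loads(response["body"].read())["embeddings"][0]["embedding"]

# Query the vector index with the photo's embedding; parameter names
# are assumptions based on the S3 Vectors query API.
results = s3vectors.query_vectors(
    vectorBucketName="product-vectors",   # hypothetical bucket
    indexName="catalog-index",            # hypothetical index
    queryVector={"float32": embed_image("shopper_photo.jpg")},
    topK=5,
    returnMetadata=True,
)
for match in results["vectors"]:
    print(match["key"], match.get("metadata"))
```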

A Quick Guide to Setup

1. Prerequisites

You need an AWS account with access to Amazon Bedrock, model access enabled for your chosen embeddings model, and an Amazon S3 bucket containing the content you want to index, such as product images and videos.

2. Create a Knowledge Base

Open the Amazon Bedrock console and create a new knowledge base, giving it a name and choosing the data source type.

3. Data Configuration

Connect your S3 bucket containing product images and videos. Choose the appropriate parsing strategy and select Amazon Nova Multimodal Embeddings for unified processing.

4. Storage and Processing

Designate Amazon S3 Vectors as the vector store. If you prefer to script steps 2 through 4 instead of clicking through the console, the sketch below shows roughly equivalent API calls.
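
This is a minimal sketch using the bedrock-agent client, with hypothetical ARNs and names throughout; the S3 Vectors storage field names in particular are assumptions, so check the current API reference.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Steps 2 and 4: create the knowledge base with the multimodal embedding
# model and S3 Vectors storage. All ARNs are hypothetical placeholders,
# and the S3 Vectors storage field names are assumptions.
kb = bedrock_agent.create_knowledge_base(
    name="multimodal-product-kb",
    roleArn="arn:aws:iam::123456789012:role/BedrockKnowledgeBaseRole",
    knowledgeBaseConfiguration={
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": (
                "arn:aws:bedrock:us-east-1::foundation-model/"
                "amazon.nova-multimodal-embeddings-v1:0"
            ),
        },
    },
    storageConfiguration={
        "type": "S3_VECTORS",
        "s3VectorsConfiguration": {
            "indexArn": (
                "arn:aws:s3vectors:us-east-1:123456789012:"
                "bucket/product-vectors/index/catalog-index"
            ),
        },
    },
)
kb_id = kb["knowledgeBase"]["knowledgeBaseId"]

# Step 3: attach the S3 bucket holding product images and videos.
ds = bedrock_agent.create_data_source(
    knowledgeBaseId=kb_id,
    name="product-media",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-product-media"},
    },
)
ds_id = ds["dataSource"]["dataSourceId"]
```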

5. Ingestion and Testing

After creating the knowledge base, start the ingestion process. Test your knowledge base using both text and image queries to ensure that the retrieval works seamlessly across different types.
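
Programmatically, ingestion and a quick text-query smoke test look like the following sketch, reusing the hypothetical kb_id and ds_id from the previous snippet.

```python
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent")
agent_runtime = boto3.client("bedrock-agent-runtime")

# Start syncing the S3 data source into the vector store.
job = bedrock_agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job_id = job["ingestionJob"]["ingestionJobId"]

# Poll until the ingestion job finishes.
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
    )["ingestionJob"]["status"]
    if status in ("COMPLETE", "FAILED"):
        break
    time.sleep(15)

# Smoke-test retrieval with a plain text query.
response = agent_runtime.retrieve(
    knowledgeBaseId=kb_id,
    retrievalQuery={"text": "red running shoes"},
)
for result in response["retrievalResults"]:
    print(result["score"], result["location"])
```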

Conclusion

The introduction of multimodal retrieval for Amazon Bedrock Knowledge Bases simplifies the development of RAG applications that transcend the limitations of text-based searches. With integrated support for video and audio, businesses can now leverage a comprehensive knowledge base that brings insights from diverse formats to the forefront.

For a deeper dive, check out the Amazon Bedrock Knowledge Bases documentation and explore hands-on code examples in the Amazon Bedrock samples repository to unlock the full potential of this powerful tool.

Next Steps

  • Explore the Documentation: Familiarize yourself with the intricacies of Amazon Bedrock Knowledge Bases.
  • Experiment with Code Examples: Access informative notebooks for practical insights.
  • Learn More About Nova: Gain further technical details on Amazon Nova Multimodal Embeddings.

About the Authors

Dani Mitchell is a Generative AI Specialist Solutions Architect at Amazon Web Services (AWS), dedicated to accelerating enterprises on their generative AI journeys.
Pallavi Nargund is a Principal Solutions Architect at AWS, leading the generative AI initiative for US Greenfield.
Jean-Pierre Dodel serves as a Principal Product Manager focusing on innovations for multimodal RAG and has extensive experience in Enterprise Search and AI/ML.

Embrace the future of information retrieval with Amazon Bedrock Knowledge Bases and unlock new opportunities within your enterprise!
