Unleashing the Power of Multimodal Retrieval with Amazon Bedrock Knowledge Bases
We are thrilled to announce the general availability of multimodal retrieval for Amazon Bedrock Knowledge Bases. This capability adds native support for video and audio content alongside the existing support for text and images. With this update, you can build Retrieval Augmented Generation (RAG) applications that seamlessly search and retrieve information across these media types, all within a fully managed service.
Why Multimodal Retrieval Matters
Modern enterprises store critical information in multiple formats, from product documentation featuring diagrams and screenshots to training materials that include instructional videos and customer insights captured in recorded meetings. Historically, building AI applications capable of efficiently searching across these content types required intricate custom infrastructure and considerable engineering effort.
Previously, Bedrock Knowledge Bases relied on text-based embedding models for retrieval. While this approach supported text documents and images, images had to be converted into text descriptions, which often resulted in loss of visual context and limited visual search capabilities.
Enter Multimodal Embeddings
With this new feature, Bedrock Knowledge Bases supports native multimodal embeddings, enabling effective search across text, images, audio, and video with a single embedding model. This approach preserves the original context of each media type, so users can query with images to find visually similar content or locate specific scenes in videos.
Overview of Multimodal Knowledge Bases
Amazon Bedrock Knowledge Bases automates the entire RAG workflow: it ingests content from data sources, parses, chunks, and converts it into searchable segments, and stores these in a vector database. When a user query is submitted, it’s embedded and matched against stored vectors to find semantically similar content, enriching the prompt sent to your foundation model.
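To make the query side concrete, here is a minimal sketch of calling the Retrieve API with boto3. The knowledge base ID, region, and query text are placeholders:

```python
import boto3

# Runtime client for querying an existing knowledge base
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve(
    knowledgeBaseId="KB_ID",  # placeholder knowledge base ID
    retrievalQuery={"text": "How do I pair the wireless headphones?"},
    retrievalConfiguration={
        "vectorSearchConfiguration": {"numberOfResults": 5}
    },
)

for result in response["retrievalResults"]:
    # Each result carries the matched content and a relevance score
    print(result["score"], result["content"].get("text", "<non-text content>"))
```

The same call powers multimodal knowledge bases; Bedrock handles embedding the query and matching it against the stored vectors for you.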
Building Multimodal RAG Applications
In this post, we walk you through the process of creating multimodal RAG applications. This involves understanding how multimodal knowledge bases function, selecting the appropriate processing strategy based on content type, and configuring multimodal retrieval using both the AWS Console and code examples.
Amazon Nova Multimodal Embeddings
The Amazon Nova Multimodal Embeddings model is the first of its kind, unifying text, documents, images, video, and audio into a shared vector space without requiring text conversion. The model supports up to 8,192 tokens of text across 200+ languages and processes video and audio in segments of up to 30 seconds.
The model captures both visual and audio characteristics, so you can, for example, describe the visual elements of a scene and retrieve the matching video segments.
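As a rough illustration, the model can be invoked directly through the Bedrock runtime to embed an image. Note that the model ID, request payload fields, and response shape below are assumptions for illustration only; consult the Amazon Nova Multimodal Embeddings documentation for the exact contract:

```python
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# ASSUMPTION: the model ID and payload fields here are illustrative
# placeholders; check the Amazon Nova docs for the real schema.
MODEL_ID = "amazon.nova-multimodal-embeddings-v1:0"

with open("product-photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = bedrock_runtime.invoke_model(
    modelId=MODEL_ID,
    body=json.dumps({
        "taskType": "SINGLE_EMBEDDING",  # assumed field
        "singleEmbeddingParams": {       # assumed field
            "image": {"format": "png", "source": {"bytes": image_b64}},
        },
    }),
)

payload = json.loads(response["body"].read())
embedding = payload["embeddings"][0]["embedding"]  # assumed response shape
print(f"Embedding dimension: {len(embedding)}")
```

When the model is attached to a knowledge base, Bedrock performs this embedding step for you during ingestion and at query time.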
Bedrock Data Automation
Alternatively, Bedrock Data Automation converts multimedia content into rich textual representations before embedding. This text-first strategy is particularly beneficial in compliance scenarios where exact quotes and verbatim records are essential. It allows for highly accurate retrieval over spoken content, making it invaluable for applications such as customer support calls, training videos, and webinars.
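If you configure this strategy programmatically, the choice is expressed in the data source's parsing configuration. A minimal sketch with boto3, where the knowledge base ID, data source name, and bucket ARN are placeholders:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Attach an S3 data source whose media is parsed into rich text
# descriptions by Bedrock Data Automation before embedding.
response = bedrock_agent.create_data_source(
    knowledgeBaseId="KB_ID",  # placeholder knowledge base ID
    name="media-library",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::my-media-bucket"},
    },
    vectorIngestionConfiguration={
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {
                "parsingModality": "MULTIMODAL"
            },
        }
    },
)
print(response["dataSource"]["dataSourceId"])
```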
Practical Applications: Visual Product Search
Consider an e-commerce scenario where traditional text search falls short. Customers often struggle to describe what they are looking for with the right keywords. With multimodal retrieval, they can simply upload an image or reference a video scene, and the system retrieves visually similar items through their embedded representations.
This change transforms the shopping experience, making product discovery not only accessible but intuitive.
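Under the hood, serving an image query comes down to embedding the uploaded photo and searching the vector store. Here is a minimal sketch, assuming the knowledge base is backed by an Amazon S3 Vectors index; the bucket name, index name, and embedding vector are placeholders (in practice the embedding would come from a model call such as the Nova sketch above):

```python
import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Placeholder: in practice this is the query image's embedding, produced
# by an embeddings model call such as the Nova sketch above.
query_embedding = [0.0] * 1024

results = s3vectors.query_vectors(
    vectorBucketName="product-vectors",  # placeholder bucket name
    indexName="catalog-index",           # placeholder index name
    topK=5,
    queryVector={"float32": query_embedding},
    returnMetadata=True,
    returnDistance=True,
)

for match in results["vectors"]:
    # Metadata can carry the product ID and source image URI for display
    print(match["distance"], match["key"], match.get("metadata"))
```

With a knowledge base, Bedrock manages this end to end; the sketch only shows the mechanics of the underlying similarity search.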
A Quick Guide to Setup
1. Prerequisites
You'll need an AWS account with access to Amazon Bedrock, model access enabled for Amazon Nova Multimodal Embeddings, and an Amazon S3 bucket containing the content you want to index.
2. Create a Knowledge Base
Open the Amazon Bedrock console and create a new knowledge base, giving it a name and selecting Amazon S3 as the data source type.
3. Data Configuration
Connect your S3 bucket containing product images and videos. Choose the appropriate parsing strategy and select Amazon Nova Multimodal Embeddings for unified processing.
4. Storage and Processing
Designate Amazon S3 Vectors as a cost-effective vector store for the generated embeddings.
5. Ingestion and Testing
After creating the knowledge base, start the ingestion process to sync your data source. Then test the knowledge base with both text and image queries to confirm that retrieval works across content types; a programmatic sketch of this step follows below.
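A minimal sketch of step 5 with boto3, using placeholder IDs, which starts a sync and polls it to completion:

```python
import time
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

KB_ID = "KB_ID"  # placeholder knowledge base ID
DS_ID = "DS_ID"  # placeholder data source ID

# Start a sync of the S3 data source into the vector store
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=DS_ID,
)["ingestionJob"]

# Poll until the ingestion job finishes
while job["status"] not in ("COMPLETE", "FAILED"):
    time.sleep(30)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=KB_ID,
        dataSourceId=DS_ID,
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]

print("Ingestion status:", job["status"])
```

Once the job reports COMPLETE, you can exercise the knowledge base with the Retrieve API shown earlier, or interactively from the console's test panel.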
Conclusion
The introduction of multimodal retrieval for Amazon Bedrock Knowledge Bases simplifies the development of RAG applications that transcend the limitations of text-based searches. With integrated support for video and audio, businesses can now leverage a comprehensive knowledge base that brings insights from diverse formats to the forefront.
For a deeper dive, check out the Amazon Bedrock Knowledge Bases documentation and explore hands-on code examples in the Amazon Bedrock samples repository to unlock the full potential of this powerful tool.
Next Steps
- Explore the Documentation: Familiarize yourself with the intricacies of Amazon Bedrock Knowledge Bases.
- Experiment with Code Examples: Access informative notebooks for practical insights.
- Learn More About Nova: Gain further technical details on Amazon Nova Multimodal Embeddings.
About the Authors
Dani Mitchell is a Generative AI Specialist Solutions Architect at Amazon Web Services (AWS), dedicated to accelerating enterprises on their generative AI journeys.
Pallavi Nargund is a Principal Solutions Architect at AWS, leading the generative AI initiative for US Greenfield.
Jean-Pierre Dodel serves as a Principal Product Manager focusing on innovations for multimodal RAG and has extensive experience in Enterprise Search and AI/ML.
Embrace the future of information retrieval with Amazon Bedrock Knowledge Bases and unlock new opportunities within your enterprise!