Unlocking the Power of Conversational Data Access with Amazon Bedrock Knowledge Bases

Streamlining Structured Data Retrieval for Enhanced Decision-Making

Unlocking Data Insights with Amazon Bedrock Knowledge Bases

In today’s data-centric world, organizations manage extensive structured data across various databases and data warehouses. However, even as large language models (LLMs) have revolutionized natural language processing (NLP), the task of converting conversational queries into structured data analysis remains an intricate challenge. Frequently, data analysts find themselves burdened with translating business questions into SQL queries, resulting in workflow bottlenecks that impede timely decision-making.

This is where Amazon Bedrock Knowledge Bases comes into play. By enabling direct natural language interactions with structured data sources, it interprets database schemas and context, transforming natural language questions into accurate queries while upholding data reliability standards. With a few easy steps, you can set up structured data ingestion from AWS Glue Data Catalog tables and Amazon Redshift clusters, harnessing the capabilities of Amazon Bedrock Knowledge Bases for structured data retrieval.

Solution Overview

This blog post presents a comprehensive guide to configuring a structured data retrieval solution using Amazon Bedrock Knowledge Bases. Developers often face challenges integrating structured data into generative AI applications, such as:

Training LLMs to convert natural language queries to SQL queries based on complex database schemas.
Ensuring appropriate data governance and security controls.

Amazon Bedrock Knowledge Bases alleviates these complexities by providing a managed natural language to SQL (NL2SQL) module. It offers an end-to-end managed workflow for building custom generative AI applications that access and utilize contextual information from various structured and unstructured data sources. Using advanced NLP, Amazon Bedrock Knowledge Bases transforms natural language queries into SQL queries, allowing you to retrieve data directly from the source without requiring data movement or preprocessing.

Solution Architecture

The architecture consists of two primary components:

Data Ingestion Pipeline: A one-time setup that supports multiple ingestion options, allowing seamless integration of S3 datasets and Data Catalog tables into your Retrieval Augmented Generation (RAG) applications while maintaining access permissions through Lake Formation.
Structured Data Retrieval Application: Built on Amazon Bedrock Knowledge Bases with Amazon Redshift as the query engine, it lets users query their structured data with conversational prompts.

Note: Insert diagram illustrating the pipeline here.

Once configured, users can pose natural language questions, and Amazon Bedrock Knowledge Bases will generate SQL queries, execute them against the database, and process the output to deliver user-friendly responses.

Data Retrieval Workflow Steps

User Input: A user submits a natural language data analytics question via the chat interface, e.g., “What is the sales revenue for February 2025?”
NL2SQL Processing: The query is routed to Amazon Bedrock Knowledge Bases for processing.
SQL Query Generation: Amazon Bedrock Knowledge Bases generates a SQL query based on the underlying data schema.
Data Retrieval: The SQL query, which can include joins and aggregations, runs on Amazon Redshift against the AWS Glue tables and returns the matching data.
Response Formation: The SQL response is sent to an LLM with additional context to generate a natural language reply.
User Interaction: Users can ask follow-up questions based on the initial response, such as “What is the product that generated the highest revenue during this period?”
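The follow-up question in the last step depends on conversation state carried between turns. The following is a minimal sketch of how that might look with the boto3 bedrock-agent-runtime client, assuming its retrieve_and_generate call returns a sessionId that can be passed back on the next request. The helper names build_turn_request and ask, the knowledge base ID, and the model ARN are placeholders for illustration and do not come from the notebooks.

```python
from typing import Any, Dict, Optional, Tuple


def build_turn_request(question: str, kb_id: str, model_arn: str,
                       session_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for one conversational turn."""
    request: Dict[str, Any] = {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,    # placeholder ID
                "modelArn": model_arn,       # placeholder model ARN
            },
        },
    }
    # Only send sessionId on follow-up turns; the first turn starts a session.
    if session_id is not None:
        request["sessionId"] = session_id
    return request


def ask(client: Any, question: str, kb_id: str, model_arn: str,
        session_id: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Send one question and return (answer text, session ID for the next turn)."""
    response = client.retrieve_and_generate(
        **build_turn_request(question, kb_id, model_arn, session_id)
    )
    return response["output"]["text"], response.get("sessionId")


# Hypothetical usage (requires boto3, AWS credentials, and a real knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# answer, sid = ask(client, "What is the sales revenue for February 2025?",
#                   "KB_ID", "MODEL_ARN")
# answer, sid = ask(client, "Which product generated the highest revenue "
#                   "during this period?", "KB_ID", "MODEL_ARN", session_id=sid)
```

Keeping the request builder separate from the network call makes the payload easy to inspect before anything is sent.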

API Options

Amazon Bedrock Knowledge Bases structured data retrieval supports three different APIs:

Retrieval and Response Generation: Generates a SQL query, retrieves data, and processes it through an LLM to return a natural language response.
Retrieval Only: Generates a SQL query and retrieves data without LLM processing.
Generate SQL Queries: Returns the raw SQL query for review and further processing.
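To make the difference between the three options concrete, here is a hedged sketch of the request shapes as I understand them from the boto3 bedrock-agent-runtime client: retrieve_and_generate, retrieve, and generate_query. The field names reflect that client at the time of writing and should be checked against the current API reference. The builder function names and all IDs and ARNs are illustrative assumptions.

```python
from typing import Any, Dict


def retrieve_and_generate_args(question: str, kb_id: str, model_arn: str) -> Dict[str, Any]:
    """Retrieval and response generation: SQL, data, then a natural language answer."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def retrieve_args(question: str, kb_id: str) -> Dict[str, Any]:
    """Retrieval only: SQL is generated and executed, rows come back without LLM processing."""
    return {"knowledgeBaseId": kb_id, "retrievalQuery": {"text": question}}


def generate_query_args(question: str, kb_arn: str) -> Dict[str, Any]:
    """Generate SQL only: nothing is executed; note this call takes the knowledge base ARN."""
    return {
        "queryGenerationInput": {"type": "TEXT", "text": question},
        "transformationConfiguration": {
            "mode": "TEXT_TO_SQL",
            "textToSqlConfiguration": {
                "type": "KNOWLEDGE_BASE",
                "knowledgeBaseConfiguration": {"knowledgeBaseArn": kb_arn},
            },
        },
    }


# Hypothetical usage (requires boto3, AWS credentials, and a real knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# q = "What is the sales revenue for February 2025?"
# text = client.retrieve_and_generate(**retrieve_and_generate_args(q, "KB_ID", "MODEL_ARN"))["output"]["text"]
# rows = client.retrieve(**retrieve_args(q, "KB_ID"))["retrievalResults"]
# sql = client.generate_query(**generate_query_args(q, "KB_ARN"))["queries"][0]["sql"]
```

Generating the SQL without running it is the natural choice when a reviewer or a downstream tool needs to approve or modify the query before execution.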

Code Resources and Templates

In order to implement the solution, we provide the following notebooks:

Data Ingestion Notebook: structured-rag-s3-glue-ingestion guides you through ingesting open datasets into Amazon S3, configuring AWS Glue tables with crawlers, and setting up Amazon Redshift as your query engine.
Structured Data Retrieval Notebook: structured-rag-s3-glue-retrieval walks through the implementation steps and includes sample code for configuring Amazon Bedrock Knowledge Bases structured data retrieval.

For detailed resources, refer to the GitHub repository.

Prerequisites

An AWS account
Enabled access to the necessary foundation models in Amazon Bedrock

Setting Up the Data Ingestion Pipeline

The setup involves loading a sample dataset into an S3 bucket, cataloging it as AWS Glue tables, and using Amazon Redshift as the query engine. Complete the following steps in the data ingestion notebook:

Download a sample ecommerce dataset, convert it to a pandas data frame, and upload it to your S3 bucket with Amazon SageMaker Data Wrangler.
Create an AWS Glue database and table using a Glue crawler that crawls the source S3 bucket.
Set up a Redshift Serverless namespace and workgroup in the default VPC, or use your existing setups.
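The crawler and Redshift Serverless steps above can be sketched with boto3 as follows. This is not the notebook's code: it is a minimal outline that assumes the dataset is already in S3 and that an IAM role for the crawler exists. The function names and every resource name passed in are hypothetical, and a real setup would also need to wait for the crawler to finish and handle resources that already exist.

```python
from typing import Any, Dict


def glue_crawler_args(crawler_name: str, role_arn: str,
                      database_name: str, s3_path: str) -> Dict[str, Any]:
    """Build arguments for glue.create_crawler targeting one S3 prefix."""
    return {
        "Name": crawler_name,
        "Role": role_arn,                 # IAM role the crawler assumes
        "DatabaseName": database_name,    # Glue database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def provision(glue: Any, redshift_serverless: Any, *, crawler_name: str,
              role_arn: str, database_name: str, s3_path: str,
              namespace: str, workgroup: str) -> None:
    """Create the Glue database and crawler, then a Redshift Serverless namespace and workgroup."""
    glue.create_database(DatabaseInput={"Name": database_name})
    glue.create_crawler(
        **glue_crawler_args(crawler_name, role_arn, database_name, s3_path)
    )
    glue.start_crawler(Name=crawler_name)
    # With no network settings given, the workgroup uses the account defaults.
    redshift_serverless.create_namespace(namespaceName=namespace)
    redshift_serverless.create_workgroup(
        workgroupName=workgroup, namespaceName=namespace
    )


# Hypothetical usage (requires boto3 and AWS credentials; all names are placeholders):
# import boto3
# provision(boto3.client("glue"), boto3.client("redshift-serverless"),
#           crawler_name="ecommerce-crawler", role_arn="CRAWLER_ROLE_ARN",
#           database_name="ecommerce_db", s3_path="s3://YOUR_BUCKET/ecommerce/",
#           namespace="structured-rag-ns", workgroup="structured-rag-wg")
```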

Configuring the Structured Data Retrieval Solution

Amazon Bedrock Knowledge Bases supports several data access patterns; this post focuses on IAM-based access. Complete the following steps in the structured data retrieval notebook:

Create an execution role with policies for accessing data in Redshift, AWS Glue, and the S3 bucket.
Invoke the CreateKnowledgeBase API to set up the knowledge base with the execution role and configurations.
Ensure the execution role has necessary permissions in Redshift and AWS Glue.
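As a rough illustration of the first two steps, the sketch below builds a trust policy that lets Amazon Bedrock assume the execution role, plus a CreateKnowledgeBase request for a Redshift Serverless query engine with AWS Glue Data Catalog tables and IAM authentication. The nested field names follow my reading of the boto3 bedrock-agent create_knowledge_base request and should be verified against the current API reference. The function names, the ARNs, and the "database.table" format used for table names are assumptions for illustration.

```python
import json
from typing import Any, Dict, List


def bedrock_trust_policy() -> str:
    """Trust policy document allowing the Amazon Bedrock service to assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "bedrock.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    })


def create_kb_args(name: str, role_arn: str, workgroup_arn: str,
                   table_names: List[str]) -> Dict[str, Any]:
    """Build arguments for a structured (SQL) knowledge base backed by Redshift Serverless."""
    return {
        "name": name,
        "roleArn": role_arn,  # the execution role from the first step
        "knowledgeBaseConfiguration": {
            "type": "SQL",
            "sqlKnowledgeBaseConfiguration": {
                "type": "REDSHIFT",
                "redshiftConfiguration": {
                    "queryEngineConfiguration": {
                        "type": "SERVERLESS",
                        "serverlessConfiguration": {
                            "workgroupArn": workgroup_arn,
                            "authConfiguration": {"type": "IAM"},
                        },
                    },
                    "storageConfigurations": [{
                        "type": "AWS_DATA_CATALOG",
                        # Assumed format: "glue_database.table_name"
                        "awsDataCatalogConfiguration": {"tableNames": table_names},
                    }],
                },
            },
        },
    }


# Hypothetical usage (requires boto3 and AWS credentials; all values are placeholders):
# import boto3
# iam = boto3.client("iam")
# iam.create_role(RoleName="StructuredKbExecutionRole",
#                 AssumeRolePolicyDocument=bedrock_trust_policy())
# boto3.client("bedrock-agent").create_knowledge_base(
#     **create_kb_args("structured-kb", "ROLE_ARN", "WORKGROUP_ARN",
#                      ["ecommerce_db.orders"]))
```

The role still needs permissions inside Redshift and AWS Glue, as the last step notes; the notebook remains the reference for those grants.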

Clean Up

Both notebooks include cleanup instructions. Run them to remove the resources after you have validated the end-to-end solution.

Conclusion

Amazon Bedrock Knowledge Bases simplifies data analysis by translating natural language questions into SQL queries, so users no longer need specialized database knowledge. It acts as a bridge between users and their data while maintaining security through integrated governance controls. With this managed service, business analysts, data scientists, and operations teams can query structured data in plain language instead of waiting for someone to write the SQL.

For further information, explore Building a knowledge base by connecting to a structured data store and how Amazon Bedrock Knowledge Bases supports structured data retrieval.

About the Authors

George Belsian is a Senior Cloud Application Architect at AWS, guiding organizations in cloud adoption and AI integration.

Sandeep Singh is a Senior Generative AI Data Scientist, specializing in generative AI solutions for diverse industries.

Mani Khanuja is a Principal Generative AI Specialist, an influential speaker and author in AI applications.

Gopikrishnan Anilkumar is a Principal Technical Product Manager with extensive experience in AI/ML product management.

By embracing solutions like Amazon Bedrock Knowledge Bases, organizations are poised to unlock the full potential of their data assets for informed strategic decisions.
