Transforming Data Access: The Returns & ReCommerce Data Assist Revolutionizes Analytics with Generative AI

Bridging the Gap between Business Questions and Actionable Insights

Tackling SQL Query Bottlenecks in Large Organizations

Introducing the Returns & ReCommerce Data Assist (RRDA)

Understanding Text-to-SQL: Challenges and Solutions

Architecture Overview of RRDA: Connecting Users with Real-Time Data

Classifying User Intent and Business Domains for Enhanced Accuracy

The Role of Amazon Bedrock in RRDA’s Success

A Comprehensive Metrics Dictionary for Data Consistency

Efficient Table Schema Retrieval for Seamless User Interactions

Smart SQL Generation and Validation: Ensuring Accuracy and Reliability

Creating a Trustworthy User Experience with Transparency

Implementing Best Practices for Text-to-SQL Solutions

Conclusion: From Natural Language to Validated Insights

Meet the Team Behind RRDA: Experts in Data Engineering and AI

Bridging the Data Gap: Transforming Data Access with Generative AI at Amazon

What if you could replace hours of data analysis with a minute-long conversation? This visionary concept is rapidly becoming a reality, thanks to large language models (LLMs) that have the potential to bridge the often wide gap between business questions and actionable data insights. In many organizations, this gap leads to a plethora of challenges, with business teams spending excessive time attempting to decode metric definitions and hunting for the right data sources to craft SQL queries manually. What should be straightforward queries often evolve into multi-day ordeals, with analytics teams bogged down by routine requests and unable to focus on more strategic initiatives.

The Challenge: Bottlenecking in Data Accessibility

The Amazon Worldwide Returns & ReCommerce (WWRR) organization faced complex challenges stemming from the sheer volume of data queries. With users executing over 450,000 SQL queries annually against petabyte-scale Amazon Redshift data warehouses, the analytics teams found themselves trapped in a bottleneck. We knew we needed a self-service solution capable of not only addressing these enterprise complexities but also managing thousands of interconnected tables, proprietary business terminologies, and evolving metric definitions that vary across different business domains—all while complying with strict governance requirements.

Introducing the Returns & ReCommerce Data Assist (RRDA)

To address these pressing challenges, we developed the Returns & ReCommerce Data Assist (RRDA)—a generative AI-powered conversational interface that fundamentally transformed data access for over 4,000 non-technical users. RRDA enables users to identify correct metrics, construct validated SQL, and generate complex visualizations, all through natural conversation. The results speak for themselves: we achieved 90% faster query resolution, reducing what used to take hours down to mere minutes, with zero dependency on business intelligence teams.

This blog post series will outline the development journey of RRDA, starting with how we constructed a Text-to-SQL solution using Amazon Bedrock, a managed service designed for building generative AI applications. In Part 1, we’ll delve into the capabilities of Amazon Bedrock Agents, and Part 2 will demonstrate how we extended RRDA to provide insightful business visualizations using Amazon Q in QuickSight.

Understanding Text-to-SQL Systems

Text-to-SQL systems play a crucial role by transforming natural language questions into structured database queries through a multi-step process:

Understanding User Intent: The system must first comprehend what the user is asking.
Entity Extraction: Extracting key entities from the natural language input is vital for effective translation.
Schema Matching: The identified entities are matched with components in the database schema (tables, columns, relationships).
SQL Generation: Finally, the system generates syntactically correct SQL that accurately represents the user’s query.

The Architecture of RRDA: A Dual Processing Pathway

At its core, RRDA utilizes a WebSocket connection through Amazon API Gateway, integrating AWS Lambda for real-time serverless processing. The architecture employs two primary pathways:

SQL Generation Pathway: For user intents focused on querying, the system checks a semantic cache for verified responses before routing requests to an Amazon Bedrock agent.
Visualization Requests Pathway: For visualization queries, the system retrieves relevant topics from Amazon QuickSight and presents auto-generated visualizations.

Intent and Domain Classification

We categorize incoming messages based on user intent and relevant business domain. Here’s how we classify them:

INFO: For informational questions about metrics and schemas.
QUERY: For SQL query generation.
SHOW_METRIC: For visualization requests.
UNKNOWN: For unrecognized queries.

The use of a lightweight foundation model through Amazon Bedrock Converse API enables us to analyze user messages alongside recent conversation histories, ensuring accurate classification.

The Smart Agent: Amazon Bedrock

The intelligent agent at the heart of RRDA is powered by Anthropic’s Claude 3.5 Haiku. This agent orchestrates data retrieval and SQL generation by engaging in various action groups:

RetrieveFromKnowledgeBase: Searches for domain-specific metric definitions and their calculation logic.
FetchTableSchema: Retrieves up-to-date table schema and example queries from our metadata store.
GenerateSQLQuery: Delegates SQL generation tasks to Anthropic’s Claude 3.7 Sonnet, which excels in creating complex SQL statements.

Generating and Validating SQL Queries

Generating and validating SQL queries is a critical function of RRDA. Unlike traditional methods, which may generate SQL in one shot, our approach takes a smarter, conversational route:

The agent leverages its knowledge base to identify the correct metric and data source.
It fetches the appropriate table schema to understand the data structure.
The GenerateSQLQuery action group invokes Claude for code generation.
SQL validation occurs using Redshift’s EXPLAIN command, which checks for syntax errors without executing the query, thus maintaining security boundaries.

This intelligent process ensures users receive syntactically correct SQL queries that are ready to execute without any SQL expertise or database schema knowledge.

Building User Trust Through Transparency

To build confidence among users, we emphasizing a transparent interface where users can verify the reasoning processes of RRDA. Features like validation status badges and a question bank of verified answers enhance user experience by providing clear feedback on query accuracy and effectiveness.

Best Practices for Implementation

Based on our experiences, we recommend several best practices when implementing intelligent Text-to-SQL solutions:

Implement Domain-Aware Context: Ensure that SQL generation respects varied metric calculations across business units.
Adopt a Hybrid Model Architecture: This balances performance and accuracy by using smaller models for user interaction and larger models for complex tasks.
Validate Without Execution: Use lightweight commands like Redshift EXPLAIN for real-time SQL validation.
Automate Metadata Maintenance: Build automated pipelines to keep schema information current.
Design for Transparency: Use status indicators and clear explanations to build trust in the output.

Conclusion

The Returns & ReCommerce Data Assist (RRDA) is transforming how we access data at WWRR, enabling business users to retrieve accurate data without SQL expertise. By converting natural language queries into validated SQL through Amazon Bedrock and our domain-aware approach, we are drastically shortening the path from questions to insights. Stay tuned for Part 2, where we will explore automating visualizations with Amazon Q in QuickSight.

Meet the Authors

The authors of this post include a team of engineers within Amazon’s Worldwide Returns and ReCommerce Data Services team, each specializing in large language models, data infrastructure, and advanced analytics. Together, they are at the forefront of harnessing generative AI to revolutionize data accessibility in enterprise environments.

With RRDA, the future of data access is bright. Join us on this exciting journey as we continue to innovate and empower users with intelligent data solutions!

Exclusive Content:

Creating a Conversational Data Assistant: Part 1 – Text-to-SQL Using Amazon Bedrock Agents

Transforming Data Access: The Returns & ReCommerce Data Assist Revolutionizes Analytics with Generative AI

Bridging the Gap between Business Questions and Actionable Insights

Tackling SQL Query Bottlenecks in Large Organizations

Introducing the Returns & ReCommerce Data Assist (RRDA)

Understanding Text-to-SQL: Challenges and Solutions

Architecture Overview of RRDA: Connecting Users with Real-Time Data

Classifying User Intent and Business Domains for Enhanced Accuracy

The Role of Amazon Bedrock in RRDA’s Success

A Comprehensive Metrics Dictionary for Data Consistency

Efficient Table Schema Retrieval for Seamless User Interactions

Smart SQL Generation and Validation: Ensuring Accuracy and Reliability

Creating a Trustworthy User Experience with Transparency

Implementing Best Practices for Text-to-SQL Solutions

Conclusion: From Natural Language to Validated Insights

Meet the Team Behind RRDA: Experts in Data Engineering and AI

Bridging the Data Gap: Transforming Data Access with Generative AI at Amazon

The Challenge: Bottlenecking in Data Accessibility

Introducing the Returns & ReCommerce Data Assist (RRDA)

Understanding Text-to-SQL Systems

The Architecture of RRDA: A Dual Processing Pathway

Intent and Domain Classification

The Smart Agent: Amazon Bedrock

Generating and Validating SQL Queries

Building User Trust Through Transparency

Best Practices for Implementation

Conclusion

Meet the Authors

Latest

Don't miss

Popular categories

Most recent

Most popular

Subscribe