Enhancing Query Responses Through User Feedback with Amazon Bedrock Embeddings and Few-Shot Prompting


In an era dominated by AI-driven applications, refining response quality for user queries is paramount, particularly for applications aimed at maximizing user satisfaction. A chat-based HR assistant, for instance, must adhere strictly to company policies and maintain an appropriate tone. Any deviation can compromise user experience and trust. This is where user feedback becomes invaluable, allowing for iterative improvements. In this blog post, we will explore how Amazon Bedrock, combined with a user feedback dataset and few-shot prompting techniques, can significantly enhance responses, ultimately leading to improved user satisfaction.

The Role of Feedback and Prompting in AI

Recent studies underscore the critical role of feedback and prompting in refining AI responses. For example, "Prompt Optimization with Human Feedback" outlines a systematic strategy for learning from user feedback, which can be used to iteratively fine-tune models for improved alignment and robustness. Similarly, "Black-Box Prompt Optimization: Aligning Large Language Models without Model Training" shows how retrieval-augmented prompting enhances few-shot learning by integrating relevant contextual knowledge. Building on these concepts, we leverage Amazon Titan Text Embeddings v2 to optimize responses using user feedback and few-shot prompting, achieving statistically significant improvements in user satisfaction. With automatic prompt optimization features already available in Amazon Bedrock, organizations can adapt and enhance prompts without manual intervention.

Solution Overview

We have developed a practical solution using Amazon Bedrock that automatically optimizes chat assistant responses based on user feedback. The solution combines embeddings with few-shot prompting, and we demonstrate its effectiveness on a publicly available user feedback dataset; in a corporate setting, the same pipeline can run on a company's own feedback data. In our tests, it produced a 3.67% increase in user satisfaction scores. The key steps are:

  1. Retrieve a publicly available user feedback dataset (e.g., the Unified Feedback Dataset on Hugging Face).
  2. Generate embeddings for queries to capture semantically similar examples using Amazon Titan Text Embeddings.
  3. Utilize similar queries as examples in a few-shot prompt to create optimized prompts.
  4. Compare optimized prompts against direct calls to large language models (LLMs).
  5. Validate improvements in response quality using paired sample t-tests.

Amazon Bedrock Benefits

Utilizing Amazon Bedrock offers several key benefits for organizations:

  • Zero Infrastructure Management: No need to manage complex machine learning infrastructure; deploy and scale seamlessly.
  • Cost-Effective: Pay-as-you-go pricing model ensures you only pay for what you use.
  • Enterprise-Grade Security: Built-in Amazon Web Services (AWS) security and compliance features ensure reliability.
  • Straightforward Integration: Easily integrate with existing applications and open-source tools.
  • Multiple Model Options: Access to diverse foundation models (FMs) suited for various use cases.

Implementation Steps

Prerequisites

Before diving into implementation, ensure you have:

  • An AWS account with access to Amazon Bedrock.
  • Python 3.8 or later installed.
  • Properly configured AWS credentials.

Data Collection

We downloaded the Unified Feedback Dataset from Hugging Face. This dataset includes fields like conv_A_user (the user query) and conv_A_rating (a binary rating). Below is the code to retrieve and prepare the dataset:

from datasets import load_dataset

# Load the dataset and specify the subset
dataset = load_dataset("llm-blender/Unified-Feedback", "synthetic-instruct-gptj-pairwise")

# Access the 'train' split
train_dataset = dataset["train"]

# Convert to Pandas DataFrame
df = train_dataset.to_pandas()
df['conv_A_user'] = df['conv_A'].apply(lambda x: x[0]['content'] if len(x) > 0 else None)
df['conv_A_assistant'] = df['conv_A'].apply(lambda x: x[1]['content'] if len(x) > 1 else None)
df = df.drop(columns=['conv_A', 'conv_B'])
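To make the nested conversation structure concrete, here is a small self-contained check of the same extraction logic on synthetic rows shaped like the Unified Feedback conversations (the toy data below is illustrative, not from the real dataset):

```python
import pandas as pd

# Toy rows mimicking the conv_A structure: a list of {role, content} turns
df_toy = pd.DataFrame({
    "conv_A": [
        [{"role": "user", "content": "How do I reset my password?"},
         {"role": "assistant", "content": "Use the account settings page."}],
        [],  # an empty conversation should yield None for both fields
    ],
    "conv_A_rating": [1.0, 0.0],
})

# Same extraction logic as above: first turn is the user, second the assistant
df_toy["conv_A_user"] = df_toy["conv_A"].apply(lambda x: x[0]["content"] if len(x) > 0 else None)
df_toy["conv_A_assistant"] = df_toy["conv_A"].apply(lambda x: x[1]["content"] if len(x) > 1 else None)

print(df_toy[["conv_A_user", "conv_A_assistant"]])
```

The empty-list guard matters because not every conversation in the dataset is guaranteed to contain both turns.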

Data Sampling and Embedding Generation

To keep embedding generation manageable, we sampled 6,000 queries and created embeddings using Amazon Titan Text Embeddings v2:

import boto3
from langchain_aws import BedrockEmbeddings

# Sample 6,000 queries
df_sampled = df.sample(n=6000, random_state=42)

# Initialize the Bedrock runtime client
region = 'us-east-1'
boto3_bedrock = boto3.client('bedrock-runtime', region_name=region)
titan_embed_v2 = BedrockEmbeddings(client=boto3_bedrock, model_id="amazon.titan-embed-text-v2:0")

# Function to convert text to an embedding vector
def get_embeddings(text):
    return titan_embed_v2.embed_query(text)

# Apply the embeddings function to every sampled user query
df_sampled['conv_A_user_vec'] = df_sampled['conv_A_user'].apply(get_embeddings)
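
Because each embedding requires a Bedrock API call, it can help to persist the computed vectors between runs. A minimal sketch of such a cache (the helper names, pickle format, and file path are illustrative choices, not part of the original solution):

```python
import os
import tempfile
import pandas as pd

# Persist a DataFrame holding the embeddings column so later runs can skip
# the per-query Bedrock calls. Pickle round-trips the list-valued column.
def save_embeddings(df, path):
    df.to_pickle(path)

def load_embeddings(path):
    return pd.read_pickle(path)

# Toy vectors standing in for real Titan embeddings
cache_path = os.path.join(tempfile.gettempdir(), "embeddings_cache.pkl")
toy = pd.DataFrame({"conv_A_user": ["hi"], "conv_A_user_vec": [[0.1, 0.2, 0.3]]})
save_embeddings(toy, cache_path)
restored = load_embeddings(cache_path)
print(restored["conv_A_user_vec"].iloc[0])
```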

Few-Shot Prompting with Similarity Search

We executed the following steps to optimize prompting:

  1. Sample 100 queries from the dataset for testing.
  2. Compute cosine similarity between test query embeddings and stored embeddings.
  3. Retrieve top k (k=10) similar queries as few-shot examples.

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Function to compute cosine similarity
def compute_cosine_similarity(embedding1, embedding2):
    embedding1 = np.array(embedding1).reshape(1, -1)
    embedding2 = np.array(embedding2).reshape(1, -1)
    return cosine_similarity(embedding1, embedding2)[0][0]

# Retrieve matches for the queries
def get_matched_convo(query, df):
    query_embedding = get_embeddings(query)
    df['similarity'] = df['conv_A_user_vec'].apply(lambda x: compute_cosine_similarity(query_embedding, x))
    top_matches = df.sort_values(by='similarity', ascending=False).head(10)
    return top_matches[['conv_A_user', 'conv_A_assistant','conv_A_rating','similarity']]
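
The row-by-row `apply` above makes one cosine-similarity call per stored query. For larger feedback sets, the same top-k retrieval can be done in a single vectorized operation by stacking the stored embeddings into a matrix; a sketch with toy vectors (the `top_k_similar` helper is an assumption of ours, and the real system would pass Titan embeddings):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_k_similar(query_vec, stored_vecs, k=10):
    # Stack all stored embeddings into one matrix so a single
    # cosine_similarity call replaces N row-by-row comparisons
    matrix = np.array(stored_vecs)
    sims = cosine_similarity(np.array(query_vec).reshape(1, -1), matrix)[0]
    top_idx = np.argsort(sims)[::-1][:k]
    return top_idx, sims[top_idx]

# Toy 3-dimensional embeddings standing in for Titan vectors
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
idx, scores = top_k_similar([1.0, 0.0, 0.0], stored, k=2)
print(idx, scores)  # the identical vector ranks first
```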

Generating Optimized Prompts

With few-shot prompting, we create a tailored set of prompts for user queries while utilizing Amazon Bedrock for the final LLM call:

from langchain_aws import ChatBedrock

# Initialize the Bedrock client
bedrock_runtime = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")
llm = ChatBedrock(client=bedrock_runtime, model_id="us.anthropic.claude-3-5-haiku-20241022-v1:0")

# Function to generate the optimized few-shot prompt
def generate_few_shot_prompt_only(user_query, nearest_examples):
    few_shot_prompt = "Here are examples of user queries, LLM responses, and feedback:\n\n"
    for i in range(len(nearest_examples)):
        # Use positional indexing: the sorted top-matches frame keeps its
        # original (non-contiguous) index labels, so .loc[i, ...] would fail
        example = nearest_examples.iloc[i]
        few_shot_prompt += f"User Query: {example['conv_A_user']}\n"
        few_shot_prompt += f"LLM Response: {example['conv_A_assistant']}\n"
        few_shot_prompt += f"User Feedback: {'👍' if example['conv_A_rating'] == 1.0 else '👎'}\n\n"
    few_shot_prompt += f"Based on these examples, generate a general optimized prompt for the following user query:\n\nUser Query: {user_query}\nOptimized Prompt:"
    return few_shot_prompt
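
To see the shape of the prompt this produces, here is a toy run of the same few-shot assembly on two synthetic examples (the data is illustrative, not drawn from the real dataset):

```python
import pandas as pd

# Same assembly logic as above, redefined here so the example is self-contained
def generate_few_shot_prompt_only(user_query, nearest_examples):
    few_shot_prompt = "Here are examples of user queries, LLM responses, and feedback:\n\n"
    for i in range(len(nearest_examples)):
        ex = nearest_examples.iloc[i]
        few_shot_prompt += f"User Query: {ex['conv_A_user']}\n"
        few_shot_prompt += f"LLM Response: {ex['conv_A_assistant']}\n"
        few_shot_prompt += f"User Feedback: {'👍' if ex['conv_A_rating'] == 1.0 else '👎'}\n\n"
    few_shot_prompt += (f"Based on these examples, generate a general optimized prompt "
                        f"for the following user query:\n\nUser Query: {user_query}\nOptimized Prompt:")
    return few_shot_prompt

# Synthetic retrieved examples standing in for the top-k matches
examples = pd.DataFrame({
    "conv_A_user": ["How do I file a claim?", "What is the refund window?"],
    "conv_A_assistant": ["Submit the claim form online.", "Refunds are accepted within 30 days."],
    "conv_A_rating": [1.0, 0.0],
})
prompt = generate_few_shot_prompt_only("How long do refunds take?", examples)
print(prompt)
```

The feedback markers let the LLM see which past responses satisfied users and which did not, steering it toward the former.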

Comparative Evaluation

To assess the effectiveness of optimized prompts, we conducted a series of tests:

  1. Evaluated the responses with and without prompt optimization.
  2. Used LLMs as judges to score response quality based on user satisfaction.

from scipy import stats

# unopt and opt are arrays of judge-assigned satisfaction scores for the
# unoptimized and optimized responses from each trial
t_stat, p_val = stats.ttest_rel(unopt, opt)
print(f"t-statistic: {t_stat}, p-value: {p_val}")
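
As a self-contained illustration of the test itself, here is the same paired comparison on synthetic score arrays (the numbers below are made up for demonstration; the real scores come from the LLM judge):

```python
import numpy as np
from scipy import stats

# Synthetic judge scores for 20 trials: optimized responses score
# consistently a bit higher than the unoptimized ones
rng = np.random.default_rng(0)
unopt = rng.normal(loc=7.0, scale=0.5, size=20)
opt = unopt + rng.normal(loc=0.25, scale=0.1, size=20)

# Paired t-test: each trial compares the two conditions on the same query
t_stat, p_val = stats.ttest_rel(unopt, opt)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_val:.6f}")
```

A paired test is appropriate here because both conditions are evaluated on the same queries, so per-query variance cancels out.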

Result Analysis

After conducting 20 trials, we found the mean satisfaction scores for optimized prompts improved by 3.67%. The p-value of 0.000762 from our t-tests validated the significance of these improvements, reinforcing our methodology’s effectiveness.

Key Takeaways

  • User Feedback is Essential: Utilizing feedback for prompt optimization leads to a notable enhancement in response quality.
  • Contextual Similarity Matters: Amazon Titan Text Embeddings facilitate effective similarity searches that improve few-shot learning outcomes.
  • Statistically Validated Improvements: A paired t-test confirms that the approach measurably boosts user satisfaction.
  • Self-Improvements Are Possible: Leveraging ongoing user feedback enables a continuous learning process, creating a self-improving AI system.

Limitations and Future Steps

While our approach shows promise, its effectiveness depends on the volume and quality of user feedback. Additionally, enhancing the system for multilingual capabilities and employing advanced techniques could further optimize performance. Exploring alternatives for low-feedback scenarios will also be critical for improvement.

Conclusion

In this post, we showcased the powerful combination of Amazon Bedrock, few-shot prompting, and user feedback for significantly enhancing AI response quality. This approach aligns responses with user-specific preferences, alleviating the need for expensive model fine-tuning while keeping the implementation flexible across various industries.

About the Authors

Tanay Chowdhury is a Data Scientist at the Generative AI Innovation Center at Amazon Web Services.

Parth Patwa is a Data Scientist at the Generative AI Innovation Center at Amazon Web Services.

Yingwei Yu is an Applied Science Manager at the Generative AI Innovation Center at Amazon Web Services.

