Streamlining Large-Scale Inference with Amazon Bedrock Batch Processing
As organizations increasingly leverage foundation models (FMs) for artificial intelligence (AI) and machine learning (ML) workloads, efficiently managing large-scale inference operations becomes pivotal. Amazon Bedrock provides two primary inference patterns: real-time inference and batch inference. The latter is particularly advantageous for processing extensive datasets when immediate results are not required.
Cost-Effective Batch Inference with Amazon Bedrock
Amazon Bedrock’s batch inference offers a cost-effective solution, reducing processing costs by 50% compared to on-demand options. This makes it ideal for high-volume, time-insensitive workloads. However, scaling batch inference operations presents challenges, including managing input formats, adhering to job quotas, orchestrating concurrent executions, and performing post-processing tasks. To address these complexities, a robust solution is essential.
A Scalable Solution for Batch Inference
In this post, we introduce a flexible, scalable solution that enhances the batch inference workflow. Our approach simplifies managing FM batch inference, whether generating embeddings for millions of documents or executing custom evaluation and completion tasks on large datasets.
Solution Overview
Our automated workflow consists of three main phases:
- Preprocessing Input Datasets: Transforming data into the required format, such as prompt formatting.
- Executing Batch Inference Jobs: Running jobs in parallel for maximum efficiency.
- Post-processing Outputs: Parsing model responses to extract useful insights.
By using AWS Step Functions within an AWS Cloud Development Kit (AWS CDK) stack, we streamline these operational phases, allowing for seamless orchestration of batch jobs.
Use Case: The SimpleCoT Dataset
For our demonstration, we utilize a dataset from SimpleCoT, containing 2.2 million rows of task-oriented examples aimed at enhancing chain-of-thought (CoT) reasoning in language models. This diverse dataset addresses various challenges, including reading comprehension, mathematical reasoning, and natural language processing.
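If you want to explore the data before running the workflow, you can pull it directly from the Hugging Face Hub with the datasets library. The dataset ID below is a placeholder, not the exact repository name used in our demonstration; substitute the SimpleCoT dataset you intend to use.

```python
from datasets import load_dataset

# Placeholder dataset ID -- substitute the actual SimpleCoT repository on the Hub.
ds = load_dataset("your-org/SimpleCoT", split="train")

print(ds.column_names)  # map these columns to your prompt template's formatting keys
print(ds[0])            # inspect a single task-oriented example
```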
Architectural Considerations
To effectively manage batch processing workflows with Amazon Bedrock, our architecture incorporates scalable serverless components that address key considerations:
- Input File Format & Storage: Job inputs must be structured as JSON Lines (JSONL) files stored in an Amazon S3 bucket, ensuring compatibility with the API request structure for each FM provider (see the sketch after this list).
- Step Functions State Machine: This robust orchestration tool coordinates asynchronous, long-running jobs. Using Amazon DynamoDB, we maintain an inventory of job states while adhering to quota limits on jobs in progress.
- Postprocessing Mechanisms: AWS Lambda functions handle parsing and joining outputs to the original input data after batch results are available.
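To make the first consideration concrete, here is a minimal sketch (not the stack's actual preprocessing code) of how a JSONL input file for a Bedrock batch inference job is structured: one JSON object per line, with a record ID and a model-specific request body. The record values are illustrative, and the modelInput body must match the InvokeModel request format of whichever FM you select.

```python
import json

records = [
    {
        "recordId": "doc-0001",
        # modelInput follows the InvokeModel body of the chosen FM;
        # shown here for amazon.titan-embed-text-v2:0.
        "modelInput": {"inputText": "First document to embed."},
    },
    {
        "recordId": "doc-0002",
        "modelInput": {"inputText": "Second document to embed."},
    },
]

# Write one JSON object per line, as Bedrock batch inference expects.
with open("embedding_input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```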
Implementation Steps
Prerequisites
Before deploying the solution, ensure you have:
- Node.js and npm installed.
- The AWS CDK set up.
Clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-bedrock-samples
cd amazon-bedrock-samples/poc-to-prod/bedrock-batch-orchestrator
Deployment
Install the required packages:
npm i
In the prompt_templates.py file, configure a new prompt template for your use case, ensuring your input dataset aligns with the formatting keys.
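As an illustration only (check prompt_templates.py in the repository for the structure it actually expects), a template entry might map a prompt ID to a Python format string whose placeholders match your dataset's column names:

```python
# Hypothetical sketch of a prompt template entry -- confirm the real structure
# in prompt_templates.py before adding your own.
prompt_id_to_template = {
    "cot_answer": (
        "You are a careful reasoner. Solve the task step by step.\n\n"
        "Task: {task}\n\n"  # {task} must be a column in your input dataset
        "Think through the problem, then give the final answer."
    ),
}
```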
Deploy the AWS CDK stack:
npm run cdk deploy
Take note of the outputs, which will include information about the workflow and S3 bucket created:
✅ BedrockBatchOrchestratorStack
✨ Deployment time: 23.16s
Outputs:
BedrockBatchOrchestratorStack.bucketName = batch-inference-bucket-
BedrockBatchOrchestratorStack.stepFunctionName = bedrockBatchOrchestratorSfnE5E2B976-4yznxekguxxm
Job Input Structure
You can either use a Hugging Face dataset ID or reference an Amazon S3 dataset. For Hugging Face datasets, reference the required dataset ID and split to pull data directly from Hugging Face Hub. For S3 datasets, ensure the file structure aligns with the model requirements.
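For example, the two variants might look like the following (shown here as Python dicts). The S3 variant mirrors the embedding example later in this post; the key names for the Hugging Face variant are assumptions based on the description above, so confirm them against the repository's documentation.

```python
# Sketch of the two job input variants; Hugging Face key names are assumptions.
hf_job_input = {
    "dataset_id": "your-org/SimpleCoT",  # hypothetical Hugging Face dataset ID
    "split": "train",
    "job_name_prefix": "test-cot-job1",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "prompt_id": "cot_answer",  # matches a template defined in prompt_templates.py
}

s3_job_input = {
    "s3_uri": "s3://<your-batch-inference-bucket>/inputs/my_dataset.csv",
    "job_name_prefix": "test-s3-job1",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "prompt_id": "cot_answer",
}
```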
Generate Batch Embeddings
For embedding generation, ensure your input CSV file includes a column labeled input_text. The job input structure resembles:
{
"s3_uri": "s3://batch-inference-bucket-/inputs/embeddings/embedding_input.csv",
"job_name_prefix": "test-embeddings-job1",
"model_id": "amazon.titan-embed-text-v2:0",
"prompt_id": null
}
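To prepare that input, a minimal sketch with pandas and boto3 might look like the following. The bucket name suffix is a placeholder taken from your CDK stack output, and the text values are illustrative.

```python
import boto3
import pandas as pd

# Build a small CSV with the required input_text column (values are illustrative).
df = pd.DataFrame({"input_text": ["First document to embed.", "Second document to embed."]})
df.to_csv("embedding_input.csv", index=False)

# Upload it to the bucket created by the CDK stack (name taken from the stack outputs).
bucket = "batch-inference-bucket-<suffix-from-stack-output>"
boto3.client("s3").upload_file("embedding_input.csv", bucket, "inputs/embeddings/embedding_input.csv")
```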
Step Functions Workflow
The Step Functions workflow moves each job through several stages: preparing inputs, orchestrating batch inference jobs in parallel while respecting concurrency quotas, and post-processing to merge model responses back with the original data. Monitoring the workflow provides insights into job status and resource utilization.
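One way to kick off the workflow is to start an execution of the deployed state machine with your job input JSON, for example via boto3. This is a sketch: the state machine name comes from the stepFunctionName stack output, and the ARN is assembled from your account and Region for convenience.

```python
import json
import boto3

region = boto3.session.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]
state_machine_name = "bedrockBatchOrchestratorSfnE5E2B976-4yznxekguxxm"  # from the CDK stack output

# Job input for the embeddings example shown above (bucket suffix is a placeholder).
job_input = {
    "s3_uri": "s3://batch-inference-bucket-<suffix>/inputs/embeddings/embedding_input.csv",
    "job_name_prefix": "test-embeddings-job1",
    "model_id": "amazon.titan-embed-text-v2:0",
    "prompt_id": None,
}

sfn = boto3.client("stepfunctions")
response = sfn.start_execution(
    stateMachineArn=f"arn:aws:states:{region}:{account_id}:stateMachine:{state_machine_name}",
    input=json.dumps(job_input),
)
print(response["executionArn"])  # use this ARN to monitor the execution in the console or CLI
```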
Conclusion
In this post, we’ve explored a serverless architecture for large-scale batch processing with Amazon Bedrock. The solution is versatile and applies to a variety of use cases, including large-scale data labeling and embedding generation.
The solution is publicly available in the GitHub repository, and we encourage you to implement this architecture to unlock new possibilities in your AI/ML endeavors.
Meet the Authors
- Swagat Kulkarni: Senior Solutions Architect at AWS, passionate about cloud-native services and innovative AI solutions.
- Evan Diewald: Data & ML Engineer, dedicated to developing and deploying ML solutions across various industries.
- Shreyas Subramanian: Principal Data Scientist, specializing in generative AI and deep learning, with a rich background in cutting-edge research.
We look forward to seeing how you leverage this architecture for your projects!