Streamlining Large-Scale Inference with Amazon Bedrock Batch Processing
As organizations increasingly leverage foundation models (FMs) for artificial intelligence (AI) and machine learning (ML) workloads, efficiently managing large-scale inference operations becomes pivotal. Amazon Bedrock provides two primary inference patterns: real-time inference and batch inference. The latter is particularly advantageous for processing extensive datasets when immediate results are not required.
Cost-Effective Batch Inference with Amazon Bedrock
Amazon Bedrock’s batch inference offers a cost-effective solution, reducing processing costs by 50% compared to on-demand options. This makes it ideal for high-volume, time-insensitive workloads. However, scaling batch inference operations presents challenges, including managing input formats, adhering to job quotas, orchestrating concurrent executions, and performing post-processing tasks. To address these complexities, a robust solution is essential.
A Scalable Solution for Batch Inference
In this post, we introduce a flexible, scalable solution that enhances the batch inference workflow. Our approach simplifies managing FM batch inference, whether generating embeddings for millions of documents or executing custom evaluation and completion tasks on large datasets.
Solution Overview
Our automated workflow consists of three main phases:
- Preprocessing Input Datasets: Transforming data into the required format, such as prompt formatting.
- Executing Batch Inference Jobs: Running jobs in parallel for maximum efficiency.
- Post-processing Outputs: Parsing model responses to extract useful insights.
By using AWS Step Functions within an AWS Cloud Development Kit (AWS CDK) stack, we streamline these operational phases, allowing for seamless orchestration of batch jobs.
Use Case: The SimpleCoT Dataset
For our demonstration, we utilize a dataset from SimpleCoT, containing 2.2 million rows of task-oriented examples aimed at enhancing chain-of-thought (CoT) reasoning in language models. This diverse dataset addresses various challenges, including reading comprehension, mathematical reasoning, and natural language processing.
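If you want to explore the data before running the workflow, you can pull it directly from the Hugging Face Hub with the datasets library. The dataset ID below is a placeholder, not the exact repository name used in our demonstration; substitute the SimpleCoT dataset you intend to use.

```python
from datasets import load_dataset

# Placeholder dataset ID -- substitute the actual SimpleCoT repository on the Hub.
ds = load_dataset("your-org/SimpleCoT", split="train")

print(ds.column_names)  # map these columns to your prompt template's formatting keys
print(ds[0])            # inspect a single task-oriented example
```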
Architectural Considerations
To effectively manage batch processing workflows with Amazon Bedrock, our architecture incorporates scalable serverless components that address key considerations:
- Input File Format & Storage: Job inputs must be structured as JSON Lines (JSONL) files stored in an Amazon S3 bucket, ensuring compatibility with the API request structure for each FM provider (see the sketch after this list).
- Step Functions State Machine: This robust orchestration tool coordinates asynchronous, long-running jobs. Using Amazon DynamoDB, we maintain an inventory of job states while adhering to quota limits on jobs in progress.
- Postprocessing Mechanisms: AWS Lambda functions handle parsing and joining outputs to the original input data after batch results are available.
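To make the first consideration concrete, here is a minimal sketch (not the stack's actual preprocessing code) of how a JSONL input file for a Bedrock batch inference job is structured: one JSON object per line, with a record ID and a model-specific request body. The record values are illustrative, and the modelInput body must match the InvokeModel request format of whichever FM you select.

```python
import json

records = [
    {
        "recordId": "doc-0001",
        # modelInput follows the InvokeModel body of the chosen FM;
        # shown here for amazon.titan-embed-text-v2:0.
        "modelInput": {"inputText": "First document to embed."},
    },
    {
        "recordId": "doc-0002",
        "modelInput": {"inputText": "Second document to embed."},
    },
]

# Write one JSON object per line, as Bedrock batch inference expects.
with open("embedding_input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```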
Implementation Steps
Prerequisites
Before deploying the solution, ensure you have:
- Node.js and npm installed.
- The AWS CDK set up.
Clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-bedrock-samples
cd amazon-bedrock-samples/poc-to-prod/bedrock-batch-orchestrator
Deployment
Install the required packages:
npm i
In the prompt_templates.py file, configure a new prompt template for your use case, ensuring your input dataset aligns with the formatting keys.
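As an illustration only (check prompt_templates.py in the repository for the structure it actually expects), a template entry might map a prompt ID to a Python format string whose placeholders match your dataset's column names:

```python
# Hypothetical sketch of a prompt template entry -- confirm the real structure
# in prompt_templates.py before adding your own.
prompt_id_to_template = {
    "cot_answer": (
        "You are a careful reasoner. Solve the task step by step.\n\n"
        "Task: {task}\n\n"  # {task} must be a column in your input dataset
        "Think through the problem, then give the final answer."
    ),
}
```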
Deploy the AWS CDK stack:
npm run cdk deploy
Take note of the outputs, which will include information about the workflow and S3 bucket created:
✅ BedrockBatchOrchestratorStack
✨ Deployment time: 23.16s
Outputs:
BedrockBatchOrchestratorStack.bucketName = batch-inference-bucket-
BedrockBatchOrchestratorStack.stepFunctionName = bedrockBatchOrchestratorSfnE5E2B976-4yznxekguxxm
Job Input Structure
You can either use a Hugging Face dataset ID or reference an Amazon S3 dataset. For Hugging Face datasets, reference the required dataset ID and split to pull data directly from Hugging Face Hub. For S3 datasets, ensure the file structure aligns with the model requirements.
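For example, the two variants might look like the following (shown here as Python dicts). The S3 variant mirrors the embedding example later in this post; the key names for the Hugging Face variant are assumptions based on the description above, so confirm them against the repository's documentation.

```python
# Sketch of the two job input variants; Hugging Face key names are assumptions.
hf_job_input = {
    "dataset_id": "your-org/SimpleCoT",  # hypothetical Hugging Face dataset ID
    "split": "train",
    "job_name_prefix": "test-cot-job1",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "prompt_id": "cot_answer",  # matches a template defined in prompt_templates.py
}

s3_job_input = {
    "s3_uri": "s3://<your-batch-inference-bucket>/inputs/my_dataset.csv",
    "job_name_prefix": "test-s3-job1",
    "model_id": "anthropic.claude-3-haiku-20240307-v1:0",
    "prompt_id": "cot_answer",
}
```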
Generate Batch Embeddings
For embedding generation, ensure your input CSV file includes a column labeled input_text. The job input structure resembles:
{
"s3_uri": "s3://batch-inference-bucket-/inputs/embeddings/embedding_input.csv",
"job_name_prefix": "test-embeddings-job1",
"model_id": "amazon.titan-embed-text-v2:0",
"prompt_id": null
}
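To prepare that input, a minimal sketch with pandas and boto3 might look like the following. The bucket name suffix is a placeholder taken from your CDK stack output, and the text values are illustrative.

```python
import boto3
import pandas as pd

# Build a small CSV with the required input_text column (values are illustrative).
df = pd.DataFrame({"input_text": ["First document to embed.", "Second document to embed."]})
df.to_csv("embedding_input.csv", index=False)

# Upload it to the bucket created by the CDK stack (name taken from the stack outputs).
bucket = "batch-inference-bucket-<suffix-from-stack-output>"
boto3.client("s3").upload_file("embedding_input.csv", bucket, "inputs/embeddings/embedding_input.csv")
```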
Step Functions Workflow
The Step Functions workflow moves each job through several stages: preparing inputs, orchestrating batch inference jobs in parallel while respecting concurrency quotas, and post-processing to merge model responses back with the original data. Monitoring the workflow provides insights into job status and resource utilization.
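One way to kick off the workflow is to start an execution of the deployed state machine with your job input JSON, for example via boto3. This is a sketch: the state machine name comes from the stepFunctionName stack output, and the ARN is assembled from your account and Region for convenience.

```python
import json
import boto3

region = boto3.session.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]
state_machine_name = "bedrockBatchOrchestratorSfnE5E2B976-4yznxekguxxm"  # from the CDK stack output

# Job input for the embeddings example shown above (bucket suffix is a placeholder).
job_input = {
    "s3_uri": "s3://batch-inference-bucket-<suffix>/inputs/embeddings/embedding_input.csv",
    "job_name_prefix": "test-embeddings-job1",
    "model_id": "amazon.titan-embed-text-v2:0",
    "prompt_id": None,
}

sfn = boto3.client("stepfunctions")
response = sfn.start_execution(
    stateMachineArn=f"arn:aws:states:{region}:{account_id}:stateMachine:{state_machine_name}",
    input=json.dumps(job_input),
)
print(response["executionArn"])  # use this ARN to monitor the execution in the console or CLI
```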
Conclusion
In this post, we’ve explored a serverless architecture for large-scale batch processing with Amazon Bedrock. The solution is versatile and applies to a variety of use cases, including large-scale data labeling and embedding generation.
The solution is publicly available in the GitHub repository, and we encourage you to implement this architecture to unlock new possibilities in your AI/ML endeavors.
Meet the Authors
- Swagat Kulkarni: Senior Solutions Architect at AWS, passionate about cloud-native services and innovative AI solutions.
- Evan Diewald: Data & ML Engineer, dedicated to developing and deploying ML solutions across various industries.
- Shreyas Subramanian: Principal Data Scientist, specializing in generative AI and deep learning, with a rich background in cutting-edge research.
We look forward to seeing how you leverage this architecture for your projects!