Optimizing ML Infrastructure with OLAF on Amazon SageMaker
Co-written with Aashraya Sachdeva from Observe.ai.
In today’s fast-paced tech landscape, building, training, and deploying machine learning (ML) models has become integral to driving business success. Amazon SageMaker offers a robust environment for these tasks, allowing data science and ML engineering teams to efficiently develop applications—especially in the realm of generative AI and large language models (LLMs).
However, while SageMaker significantly reduces the heavy lifting often associated with model development, engineering teams still face challenges. Manual configuration of inference pipeline services, including queues and databases, can slow down the deployment process, and testing multiple GPU instance types to balance performance and cost adds to the complexity.
Meet Observe.ai and its Conversation Intelligence (CI) Tool
Observe.ai specializes in Conversation Intelligence solutions that enhance contact center operations. Its platform processes calls in real time, enabling features like summarization, agent feedback, and auto-responses. As customer deployments grew from fewer than 100 agents to thousands, scalability became vital, and Observe.ai needed an efficient way to refine its ML infrastructure while minimizing costs.
Enter the One Load Audit Framework (OLAF)
To address this challenge, Observe.ai created the One Load Audit Framework (OLAF), which integrates with SageMaker to surface bottlenecks and performance issues. By measuring latency and throughput under varying data loads, OLAF makes model testing efficient, cutting Observe.ai's testing time from a week to a few hours and enabling rapid deployment and onboarding.
Using OLAF to Optimize Your SageMaker Endpoint
In this blog post, we’ll explore how to leverage OLAF to test and validate your SageMaker endpoint effectively.
Solution Overview
After deploying your ML model, load testing is essential for optimizing performance. This involves configuring scripts that interact with SageMaker APIs to gather metrics on latency, CPU, and memory utilization. OLAF simplifies this process by packaging these necessary elements together:
- Integration with Locust for concurrent load generation.
- A dashboard for real-time performance monitoring.
- Automated metric extraction from SageMaker APIs.
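For a sense of what that metric extraction involves, the following CLI call pulls average model latency for an endpoint from CloudWatch. This is an illustrative sketch, not OLAF's internal implementation; the endpoint name and time window are placeholders, and VariantName must match the production variant defined in your endpoint config (AllTraffic is a common choice):

# Average ModelLatency for one endpoint variant, in 60-second buckets
aws cloudwatch get-metric-statistics \
  --namespace AWS/SageMaker \
  --metric-name ModelLatency \
  --dimensions Name=EndpointName,Value=<your-endpoint-name> Name=VariantName,Value=AllTraffic \
  --statistics Average \
  --start-time <start-time> --end-time <end-time> \
  --period 60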
Prerequisites for OLAF
You’ll need the following to get started:
- An AWS account
- Docker installed on your workstation
- The AWS Command Line Interface (CLI) configured
Generate AWS Credentials Using AWS STS
Using the AWS CLI, generate temporary credentials with the appropriate permissions for Amazon SageMaker. Ensure the role you assume has the AmazonSageMakerFullAccess managed policy attached.
aws sts assume-role --role-arn <your-role-arn> --role-session-name olaf_session --duration-seconds 1800
Take note of the access key, secret key, and session token you’ll use later in the OLAF configuration.
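To pull those three values out of the response in one go, one option is to capture the Credentials block and read it with jq (assuming jq is installed; the field names are part of the standard assume-role response):

# Capture the Credentials block once, then print the three values
CREDS=$(aws sts assume-role --role-arn <your-role-arn> --role-session-name olaf_session --duration-seconds 1800 --query 'Credentials' --output json)
echo "$CREDS" | jq -r '.AccessKeyId, .SecretAccessKey, .SessionToken'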
Setting Up Your SageMaker Inference Endpoint
Deploy your SageMaker inference endpoint using a CloudFormation template. Save the skeleton below as a YAML file (for example, template.yaml) and upload it through AWS CloudShell.
Resources:
  SageMakerExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      # Add your service role configuration here
  SageMakerModel:
    Type: AWS::SageMaker::Model
    # Model properties
  SageMakerEndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    # Endpoint configuration
  SageMakerEndpoint:
    Type: AWS::SageMaker::Endpoint
    # Endpoint settings
Then run a create-stack command to provision the resources.
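A minimal example, assuming the template was saved as template.yaml; the stack name matches the one deleted in the clean-up step, and --capabilities CAPABILITY_IAM is required because the template creates an IAM role:

aws cloudformation create-stack --stack-name flan-t5-endpoint-stack --template-body file://template.yaml --capabilities CAPABILITY_IAM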
Installing OLAF
Clone the OLAF repository and build the Docker image:
git clone https://github.com/Observeai-Research/olaf.git
cd olaf
docker build -t olaf .
docker run -p 80:8000 olaf
Access the OLAF UI at http://localhost:80 using the credentials olaf/olaf.
Testing the SageMaker Endpoint
In the OLAF interface, configure your SageMaker test parameters, including:
- Endpoint name
- Predictor type
- Input/Output serialization formats
- AWS credentials
Initiate the load test by specifying the number of concurrent users, then observe the performance metrics in real time on the Locust dashboard.
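Before driving concurrent load, it can be worth sanity-checking the endpoint with a single invocation so that the serialization settings you enter in OLAF match what the container actually accepts. A minimal sketch; the endpoint name and JSON payload shape are placeholders to adapt to your model, and --cli-binary-format raw-in-base64-out is needed on AWS CLI v2 to send a raw JSON body:

# Invoke the endpoint once and inspect the response
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name <your-endpoint-name> \
  --content-type application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"inputs": "Test input"}' \
  output.json
cat output.json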
Hosting the Client and Final Thoughts
Your testing environment can affect measured latency; for example, a client on a laptop behind a VPN will report very different numbers than one running in the same AWS Region as the endpoint. Standardize your setup to mirror real customer usage so the results are representative.
As you conclude your tests, remember to clean up resources to avoid unnecessary costs:
aws cloudformation delete-stack --stack-name flan-t5-endpoint-stack
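Deletion is asynchronous; to block until it finishes, you can use the standard CloudFormation waiter:

aws cloudformation wait stack-delete-complete --stack-name flan-t5-endpoint-stack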
Conclusion
In this post, we explored how OLAF can dramatically streamline the load testing of SageMaker endpoints, offering significant time savings and insights into optimizing ML infrastructure. OLAF addresses challenges faced by organizations like Observe.ai, freeing development teams to focus on product features while ensuring high-performance and cost-effective ML operations.
For further exploration, check out the OLAF framework on GitHub and leverage its capabilities to enhance your SageMaker deployments effectively.
About the Authors
Aashraya Sachdeva is a Director of Engineering at Observe.ai, overseeing scalable solutions that enhance both customer experience and operational efficiency.
Shibu Jacob is a Senior Solutions Architect at AWS, specializing in cloud-native architectures and the transformative potential of AI.
With frameworks like OLAF, optimizing ML operations is no longer a complex maze but a structured pathway towards innovation and efficiency.