Exciting Update: Amazon SageMaker HyperPod Adds Managed Node Auto Scaling with Karpenter

Today, we’re thrilled to announce that Amazon SageMaker HyperPod now supports managed node auto scaling with Karpenter! This integration makes it easier for organizations to scale their SageMaker HyperPod clusters efficiently to meet the dynamic demands of inference and training workloads.

The Need for Auto Scaling in Real-Time Inference

In machine learning, real-time inference workloads often face unpredictable traffic patterns. Businesses must quickly adapt their GPU compute capacity to maintain service-level agreements (SLAs) without compromising response times or cost efficiency. This is where Karpenter shines: it scales automatically in response to demand spikes while alleviating the operational burden of self-managed solutions.

What Makes this Feature Stand Out?

This service-managed solution dramatically reduces the complexity of installing, configuring, and maintaining Karpenter controllers, offering a seamless integration with the resilience capabilities of SageMaker HyperPod. One of the standout features is the ability to scale to zero, eliminating the need for dedicated compute resources when they are not in use, thus enhancing cost-efficiency.

An Infrastructure Built for Resilience

SageMaker HyperPod offers a high-performance, resilient infrastructure, complete with observability tools optimized for large-scale model training and deployment. Organizations such as Perplexity, HippocraticAI, H.AI, and Articul8 are already leveraging HyperPod for effective model training and deployment. As more businesses transition from training foundation models (FMs) to running operational inference at scale, the requirement for automatic scaling becomes critical.

Karpenter: A Game Changer

Karpenter is an open-source Kubernetes node lifecycle manager created by AWS, designed to optimize cluster auto scaling. It efficiently addresses the needs of organizations by offering:

  • Service-Managed Lifecycle: Karpenter’s installation, updates, and maintenance are all handled by SageMaker HyperPod.
  • Just-in-Time Provisioning: Karpenter observes pending pods and provisions the required compute resources on demand.
  • Workload-Aware Node Selection: It chooses optimal instance types based on pod requirements and pricing.
  • Automatic Node Consolidation: It regularly evaluates cluster state for consolidation opportunities to reduce cost.
  • Integrated Resilience: It uses the built-in fault-tolerance mechanisms of SageMaker HyperPod.

This managed Karpenter solution is seamlessly integrated into SageMaker HyperPod EKS clusters, evolving static capacity into a dynamic, cost-optimized infrastructure that scales with demand.

Setting Up Automatic Scaling

Prerequisites

To get started, ensure you have the required quotas for the instances you’ll create in the SageMaker HyperPod cluster. Also, create the necessary AWS Identity and Access Management (IAM) permissions for Karpenter.

Creating a SageMaker HyperPod Cluster

  1. Log in to the SageMaker AI console and navigate to HyperPod clusters.
  2. Choose "Create HyperPod cluster" and select Amazon EKS as the orchestrator.
  3. Choose "Custom setup," enter a cluster name, and configure the instance recovery and provisioning modes.
  4. Submit your configuration.

Once your cluster is created, update it to enable Karpenter using the AWS SDK for Python (Boto3) or the AWS CLI, then verify enablement with the DescribeCluster API.
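As a sketch of what that update might look like with the AWS CLI (the cluster name is a placeholder, and the exact shape of the auto scaling argument is an assumption on our part; consult the current UpdateCluster API reference for your region):

```shell
# Enable managed Karpenter auto scaling on an existing HyperPod cluster.
# "ml-cluster" is a placeholder name; the --auto-scaling payload shape
# is an assumption -- check the UpdateCluster API reference.
aws sagemaker update-cluster \
    --cluster-name ml-cluster \
    --auto-scaling '{"Mode": "Enable", "AutoScalerType": "Karpenter"}'

# Verify the result with DescribeCluster.
aws sagemaker describe-cluster \
    --cluster-name ml-cluster \
    --query 'AutoScaling'
```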

Creating HyperpodNodeClass

This custom resource defines constraints on instance types and Availability Zones. It maps to pre-created instance groups in SageMaker HyperPod, which guide Karpenter in its scaling decisions.

apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: multiazg6
spec:
  instanceGroups:
    - auto-g6-az1
    - auto-g6-4xaz2

Apply this configuration to your EKS cluster using kubectl.
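For example (the manifest filename is a placeholder):

```shell
# Save the HyperpodNodeClass manifest above as hyperpod-nodeclass.yaml, then:
kubectl apply -f hyperpod-nodeclass.yaml

# Confirm that the custom resource exists in the cluster.
kubectl get hyperpodnodeclass multiazg6
```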

Creating NodePool

The NodePool sets constraints on nodes that Karpenter can create. It allows you to define specific labels, taints, and instance types for optimal resource allocation.

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpunodepool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: multiazg6
      requirements:
        - key: "node.kubernetes.io/instance-type"
          operator: In
          values: ["ml.g6.xlarge"]

Launching a Simple Workload

Once your setup is complete, you can run a Kubernetes deployment that scales dynamically according to demand.
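As a minimal sketch of such a deployment (the deployment name, container image, command, and replica count are all illustrative; the instance-type selector must match the NodePool defined earlier):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-inference-demo          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-inference-demo
  template:
    metadata:
      labels:
        app: gpu-inference-demo
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: ml.g6.xlarge  # matches the NodePool
      containers:
        - name: app
          image: public.ecr.aws/docker/library/python:3.11-slim  # placeholder image
          command: ["python", "-c", "import time; time.sleep(3600)"]
          resources:
            requests:
              nvidia.com/gpu: 1   # pending GPU requests trigger Karpenter provisioning
            limits:
              nvidia.com/gpu: 1
```

When the requested GPUs exceed current capacity, the pods stay Pending and Karpenter provisions matching ml.g6.xlarge nodes; scaling the replicas back down lets node consolidation reclaim the idle capacity.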

Advanced Auto Scaling with KEDA and Karpenter

Combining Kubernetes Event-driven Autoscaling (KEDA) with Karpenter can provide a robust two-tier auto-scaling solution. While KEDA adjusts the number of pods based on various metrics, Karpenter provisions the necessary nodes, ensuring optimal performance and cost-effectiveness.
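As an illustration of the two tiers (the ScaledObject below follows the open-source KEDA v1alpha1 schema with its aws-sqs-queue scaler; the target Deployment name, queue URL, region, and thresholds are all placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: gpu-inference-scaler
spec:
  scaleTargetRef:
    name: gpu-inference-demo   # placeholder Deployment name
  minReplicaCount: 0           # allow scaling pods to zero when idle
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/inference-queue  # placeholder
        queueLength: "5"       # target messages per replica
        awsRegion: us-east-1
```

In this pattern, KEDA adjusts the pod count from the queue depth; when those pods exceed current node capacity, Karpenter adds nodes, and consolidation removes them as replicas drop.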

Conclusion

With the launch of Karpenter node auto scaling on SageMaker HyperPod, machine learning workloads can now dynamically adapt to changing demands, optimizing resource utilization and cost. By enabling Karpenter in your SageMaker HyperPod clusters, you can easily scale your workloads to meet production traffic requirements.

To experience these benefits first-hand, implement Karpenter in your SageMaker HyperPod clusters today!

About the Authors

  • Vivek Gangasani: Lead GenAI Specialist Solutions Architect focused on optimizing inference performance.
  • Adam Stanley: Solutions Architect at AWS, specializing in machine learning infrastructure.
  • Kunal Jha: Principal Product Manager at AWS for SageMaker HyperPod.
  • Ty Bergstrom: Software Engineer involved with HyperPod Clusters platform.

As they continue to innovate, these experts are dedicated to helping enterprises and startups scale their GenAI models effectively.
