
Train and Deploy AI Models at Trillion-Parameter Scale with Amazon SageMaker HyperPod Support for P6e-GB200 UltraServers


Unlocking Unprecedented GPU Power and Scalability for Trillion-Parameter AI Models


Inside the UltraServer

Performance Benefits of UltraServers

GPU and Compute Power

High-Performance Networking

Storage and Data Throughput

Topology-Aware Scheduling

Use Cases for UltraServers

Flexible Training Plans for UltraServer Capacity

Create an UltraServer Cluster with SageMaker HyperPod

Conclusion

About the Authors


Imagine harnessing the power of 72 cutting-edge NVIDIA Blackwell GPUs in a single system for the next wave of AI innovation. That capability is at the heart of Amazon SageMaker HyperPod, which now supports P6e-GB200 UltraServers. With up to 360 petaflops of dense 8-bit floating point (FP8) compute and 1.4 exaflops of sparse 4-bit floating point (FP4) compute, organizations can accelerate the development and deployment of trillion-parameter AI models at scale.

In this post, we will delve into the technical specifications of the P6e-GB200 UltraServers, discuss their performance benefits, highlight key use cases, and guide you on how to purchase UltraServer capacity and get started with SageMaker HyperPod.

Inside the UltraServer

The P6e-GB200 UltraServers represent a leap in GPU acceleration. Each UltraServer is built on the NVIDIA GB200 NVL72 design, which connects 36 NVIDIA Grace™ CPUs and 72 Blackwell GPUs within a single NVIDIA NVLink™ domain. The UltraServers come in two configurations:

  • ml.u-p6e-gb200x36: This includes a rack of 9 compute nodes with 36 Blackwell GPUs.
  • ml.u-p6e-gb200x72: This comprises a rack-pair of 18 compute nodes with a total of 72 Blackwell GPUs.

Each compute node contains two NVIDIA Grace Blackwell Superchips, each of which pairs two NVIDIA Blackwell GPUs with an Arm-based NVIDIA Grace CPU over the NVLink chip-to-chip (C2C) interconnect. This design speeds up intra-server communication and provides the bandwidth needed for efficient AI model training.
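The topology described above can be sketched with a few lines of arithmetic (the configuration names come from this post; the helper function itself is illustrative):

```python
# Illustrative sketch of the UltraServer topology described above.
# Each compute node holds 2 Grace Blackwell Superchips; each Superchip
# pairs 1 Grace CPU with 2 Blackwell GPUs.

SUPERCHIPS_PER_NODE = 2
GPUS_PER_SUPERCHIP = 2
CPUS_PER_SUPERCHIP = 1

CONFIGS = {
    "ml.u-p6e-gb200x36": 9,   # compute nodes in a rack
    "ml.u-p6e-gb200x72": 18,  # compute nodes in a rack-pair
}

def topology(config: str) -> dict:
    """Return node, GPU, and Grace CPU counts for a configuration."""
    nodes = CONFIGS[config]
    superchips = nodes * SUPERCHIPS_PER_NODE
    return {
        "nodes": nodes,
        "gpus": superchips * GPUS_PER_SUPERCHIP,
        "grace_cpus": superchips * CPUS_PER_SUPERCHIP,
    }

print(topology("ml.u-p6e-gb200x72"))  # 18 nodes -> 72 GPUs, 36 Grace CPUs
```

Running this recovers the counts quoted in the configuration list: 36 GPUs for the rack configuration and 72 GPUs with 36 Grace CPUs for the rack-pair.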

Performance Benefits of UltraServers

GPU and Compute Power

With P6e-GB200 UltraServers, you gain access to remarkable GPU capabilities. The architecture supports up to 72 NVIDIA Blackwell GPUs, providing:

  • 360 petaflops of FP8 compute (without sparsity)
  • 1.4 exaflops of FP4 compute (with sparsity)
  • 13.4 TB of high-bandwidth memory (HBM3e)

This works out to 10 petaflops of dense FP8 compute and roughly 40 petaflops of sparse FP4 compute per Grace Blackwell Superchip. The second-generation Transformer Engine, together with support for the latest microscaling data formats, lets P6e-GB200 UltraServers significantly accelerate both inference and training for large language models (LLMs) and Mixture-of-Experts (MoE) models.
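As a quick sanity check, the per-Superchip figures follow from the UltraServer totals quoted above (a back-of-envelope sketch using only numbers from this post):

```python
# Back-of-envelope check of the per-Superchip figures quoted above.
TOTAL_FP8_PFLOPS = 360    # dense FP8 across the full 72-GPU UltraServer
TOTAL_FP4_PFLOPS = 1400   # sparse FP4 (1.4 exaflops) across the UltraServer
SUPERCHIPS = 36           # 72 GPUs / 2 GPUs per Grace Blackwell Superchip

fp8_per_superchip = TOTAL_FP8_PFLOPS / SUPERCHIPS  # 10 petaflops, dense FP8
fp4_per_superchip = TOTAL_FP4_PFLOPS / SUPERCHIPS  # ~38.9, quoted as ~40

print(fp8_per_superchip, round(fp4_per_superchip, 1))
```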

High-Performance Networking

UltraServers provide up to 130 TBps of low-latency aggregate NVLink bandwidth between GPUs, double the bandwidth of the previous NVLink generation. Each compute node also accommodates up to 17 physical network interface cards (NICs), each supporting up to 400 Gbps, for robust communication across the cluster.
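Multiplying the stated per-NIC figures gives the aggregate network capacity of a single compute node (simple arithmetic on the numbers above):

```python
# Aggregate per-node network capacity implied by the figures above.
GBPS_PER_NIC = 400   # per-NIC bandwidth, gigabits per second
NICS_PER_NODE = 17   # maximum physical NICs per compute node

node_tbps = GBPS_PER_NIC * NICS_PER_NODE / 1000  # terabits per second
print(f"{node_tbps} Tbps per compute node")      # 6.8 Tbps
```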

Storage and Data Throughput

With up to 405 TB of local NVMe SSD storage, P6e-GB200 UltraServers are well suited to large-scale datasets. Amazon FSx for Lustre can be attached for high-performance shared storage, with direct data transfer between the file system and GPU memory at up to terabytes per second of throughput, ideal for demanding training and inference workloads.

Topology-Aware Scheduling

Amazon Elastic Compute Cloud (Amazon EC2) enhances AI workloads with topology information, allowing intelligent optimization during distributed training. This capability enables efficient data placement and communication patterns, maximizing model performance.
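As an illustration of how topology information can be used, the sketch below groups instances by their closest network node, following the record shape returned by the EC2 DescribeInstanceTopology API. The instance and network-node IDs are placeholders, and a real workflow would fetch these records with `boto3` rather than hard-coding them:

```python
# Illustrative sketch: group instances by their closest network node so a
# scheduler can co-locate communication-heavy ranks. Record shape follows
# the EC2 DescribeInstanceTopology response; IDs below are placeholders.
from collections import defaultdict

def group_by_network_node(topology_records):
    """Group instance IDs by the last (closest) entry in NetworkNodes."""
    groups = defaultdict(list)
    for rec in topology_records:
        groups[rec["NetworkNodes"][-1]].append(rec["InstanceId"])
    return dict(groups)

# Placeholder records standing in for a DescribeInstanceTopology response.
sample = [
    {"InstanceId": "i-aaaa", "NetworkNodes": ["nn-1", "nn-2", "nn-5"]},
    {"InstanceId": "i-bbbb", "NetworkNodes": ["nn-1", "nn-2", "nn-5"]},
    {"InstanceId": "i-cccc", "NetworkNodes": ["nn-1", "nn-3", "nn-7"]},
]
print(group_by_network_node(sample))
```

Instances sharing the same closest network node (here `i-aaaa` and `i-bbbb`) can be scheduled together to minimize cross-switch traffic.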

Use Cases for UltraServers

P6e-GB200 UltraServers can efficiently train AI models with over a trillion parameters due to their unified NVLink domain and exceptional cross-node bandwidth. This setup enables organizations to push the boundaries of AI research and development, significantly enhancing iteration cycles and the quality of AI models.

UltraServers also excel at real-time trillion-parameter model inference, delivering up to 30x faster inference than previous-generation platforms, which makes them ideal for generative AI applications, natural language processing, and conversational agents.

Additionally, the infrastructure supports multiple teams working on diverse, distributed training and inference workloads, streamlining project timelines and reducing costs while maximizing resource utilization.

Flexible Training Plans for UltraServer Capacity

Amazon SageMaker AI offers P6e-GB200 UltraServer capacity through flexible training plans available in the Dallas AWS Local Zone (us-east-1-dfw-2a). Users can select from configurations such as ml.u-p6e-gb200x36 or ml.u-p6e-gb200x72.

To start using UltraServers, navigate to the SageMaker AI training plans console, where you can choose an UltraServer compute type that suits your needs. Ensure you configure at least one spare compute node to maintain operational continuity.
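Programmatically, a similar search can be expressed with the SageMaker SearchTrainingPlanOfferings action. The sketch below only constructs the request body; the field names follow that API, but the exact parameters and their availability should be verified against the current boto3 documentation:

```python
# Hedged sketch of a request body for searching UltraServer training-plan
# offerings. Field names follow the SageMaker SearchTrainingPlanOfferings
# action; verify exact parameters against the current boto3 docs.
request = {
    "InstanceType": "ml.u-p6e-gb200x72",  # or "ml.u-p6e-gb200x36"
    "InstanceCount": 1,
    "TargetResources": ["hyperpod-cluster"],
}

# A live call would look like:
# import boto3
# sm = boto3.client("sagemaker", region_name="us-east-1")
# offerings = sm.search_training_plan_offerings(**request)
print(request)
```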

Create an UltraServer Cluster with SageMaker HyperPod

Once you have selected an UltraServer training plan, it’s easy to add capacity to your SageMaker HyperPod cluster. SageMaker optimizes the placement of these nodes to ensure excellent data transfer performance by keeping them within the same NVLink domain.
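A HyperPod cluster that consumes purchased UltraServer capacity can be described with a CreateCluster request along these lines. This is a hedged sketch: the cluster name, role ARN, S3 URI, and training-plan ARN are all placeholders, and the exact request shape should be checked against the current SageMaker API reference:

```python
# Hedged sketch of a SageMaker HyperPod CreateCluster request referencing
# a purchased training plan. All names, ARNs, and URIs are placeholders.
cluster_request = {
    "ClusterName": "ultraserver-demo",  # placeholder
    "InstanceGroups": [
        {
            "InstanceGroupName": "ultraserver-group",
            "InstanceType": "ml.u-p6e-gb200x72",
            "InstanceCount": 1,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://example-bucket/lifecycle/",  # placeholder
                "OnCreate": "on_create.sh",
            },
            # Placeholder ARNs below:
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodRole",
            "TrainingPlanArn": (
                "arn:aws:sagemaker:us-east-1:111122223333:training-plan/example"
            ),
        }
    ],
}

# A live call would be:
# import boto3
# boto3.client("sagemaker").create_cluster(**cluster_request)
print(cluster_request["ClusterName"])
```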

Conclusion

P6e-GB200 UltraServers empower organizations to train, fine-tune, and serve state-of-the-art AI models swiftly and efficiently. By combining high-performance GPU resources, robust networking, and cutting-edge memory with the automation of SageMaker HyperPod, enterprises can enhance every stage of the AI lifecycle—from experimentation and training to seamless deployment.

This powerful solution not only breaks new ground in performance but also reduces operational complexity and costs, enabling innovators to lead the next era of AI advancement.

About the Authors

Nathan Arnold is a Senior AI/ML Specialist Solutions Architect at AWS, helping organizations of all sizes efficiently train and deploy foundation models on AWS. Outside of work, he enjoys hiking, trail running, and time with his dogs.
