
Train and Deploy AI Models at Trillion-Parameter Scale with Amazon SageMaker HyperPod Support for P6e-GB200 UltraServers


Imagine harnessing 72 cutting-edge NVIDIA Blackwell GPUs in a single system for the next wave of AI innovation. That capability is at the heart of Amazon SageMaker HyperPod, which now supports P6e-GB200 UltraServers. With up to 360 petaflops of dense 8-bit floating point (FP8) compute and 1.4 exaflops of sparse 4-bit floating point (FP4) compute, organizations can accelerate the development and deployment of trillion-parameter AI models at scale.

In this post, we delve into the technical specifications of the P6e-GB200 UltraServers, discuss their performance benefits, highlight key use cases, and show how to purchase UltraServer capacity and get started with SageMaker HyperPod.

Inside the UltraServer

The P6e-GB200 UltraServers represent a leap in GPU acceleration. Each server is powered by NVIDIA GB200 NVL72 and connects 36 NVIDIA Grace™ CPUs with 72 Blackwell GPUs within the same NVIDIA NVLink™ domain. The UltraServers come in two configurations:

  • ml.u-p6e-gb200x36: This includes a rack of 9 compute nodes with 36 Blackwell GPUs.
  • ml.u-p6e-gb200x72: This comprises a rack-pair of 18 compute nodes with a total of 72 Blackwell GPUs.

Each compute node contains two NVIDIA Grace Blackwell Superchips, connecting two high-performance NVIDIA GPUs with an Arm-based NVIDIA Grace CPU via the NVLink chip-to-chip (C2C) interconnect. This design boosts intra-server communication and offers enhanced bandwidth for efficient AI model training.
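The two configurations above differ only in node and GPU count. As a minimal sketch, here is a hypothetical helper (the names and figures are taken directly from this post; the helper itself is illustrative, not part of any AWS SDK):

```python
# Hypothetical summary of the two UltraServer configurations described above.
ULTRASERVER_CONFIGS = {
    "ml.u-p6e-gb200x36": {"compute_nodes": 9, "gpus": 36},   # single rack
    "ml.u-p6e-gb200x72": {"compute_nodes": 18, "gpus": 72},  # rack pair
}

def pick_config(min_gpus: int) -> str:
    """Return the smallest UltraServer configuration with at least min_gpus GPUs."""
    for name, spec in sorted(ULTRASERVER_CONFIGS.items(), key=lambda kv: kv[1]["gpus"]):
        if spec["gpus"] >= min_gpus:
            return name
    raise ValueError(f"No single UltraServer offers {min_gpus} GPUs")

print(pick_config(48))  # → ml.u-p6e-gb200x72
```

Note that each compute node carries four GPUs (two Superchips, two GPUs each), which is consistent with 9 nodes yielding 36 GPUs and 18 nodes yielding 72.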

Performance Benefits of UltraServers

GPU and Compute Power

With P6e-GB200 UltraServers, you gain access to remarkable GPU capabilities. The architecture supports up to 72 NVIDIA Blackwell GPUs, providing:

  • 360 petaflops of FP8 compute (without sparsity)
  • 1.4 exaflops of FP4 compute (with sparsity)
  • 13.4 TB of high-bandwidth memory (HBM3e)

This works out to 10 petaflops of dense FP8 compute and roughly 40 petaflops of sparse FP4 compute per Grace Blackwell Superchip. The integration of the second-generation Transformer Engine and support for the latest microscaling (MX) data formats mean that P6e-GB200 UltraServers significantly accelerate both training and inference for large language models (LLMs) and Mixture-of-Experts (MoE) models.
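The per-Superchip figures are consistent with the rack-level totals quoted above, as a quick back-of-the-envelope check shows:

```python
# Reconstruct the rack-pair totals from the per-superchip figures in the post.
superchips = 36            # 18 compute nodes x 2 Grace Blackwell Superchips
fp8_dense_per_chip = 10    # petaflops, without sparsity
fp4_sparse_per_chip = 40   # petaflops, with sparsity (rounded)

fp8_total = superchips * fp8_dense_per_chip    # 360 petaflops
fp4_total = superchips * fp4_sparse_per_chip   # 1440 petaflops ≈ 1.4 exaflops
print(fp8_total, fp4_total / 1000)
```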

High-Performance Networking

UltraServers leverage up to 130 TBps of low-latency NVLink bandwidth for seamless communication between GPUs, achieving double the bandwidth of previous architectures. Each compute node accommodates up to 17 physical network interface cards (NICs), each capable of 400 Gbps bandwidth, facilitating robust communication across the infrastructure.
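For a rough sense of the aggregate front-end bandwidth this implies, a back-of-the-envelope sketch using the per-node NIC count and speed quoted above:

```python
# Aggregate NIC bandwidth for an ml.u-p6e-gb200x72 rack pair (figures from the post).
nodes = 18          # compute nodes in the rack pair
nics_per_node = 17  # physical NICs per compute node
gbps_per_nic = 400  # bandwidth per NIC

per_node_tbps = nics_per_node * gbps_per_nic / 1000  # 6.8 Tbps per node
cluster_tbps = nodes * per_node_tbps                 # ~122 Tbps across the rack pair
print(per_node_tbps, cluster_tbps)
```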

Storage and Data Throughput

Boasting up to 405 TB of local NVMe SSD storage, P6e-GB200 UltraServers are well suited to large-scale datasets. Amazon FSx for Lustre can be attached for high-performance shared storage, moving data directly between the file system and GPU memory at terabytes-per-second throughput, ideal for demanding training and inference workloads.
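One way to provision such a shared file system is through the FSx API. The sketch below builds a hypothetical request; the subnet ID, capacity, and the PERSISTENT_2 deployment choice are placeholders and assumptions, not values specified in this post, and the actual API call is commented out because it requires AWS credentials:

```python
# import boto3  # uncomment to actually call the FSx API (requires AWS credentials)

# Hypothetical FSx for Lustre file-system request; identifiers and sizes are placeholders.
fsx_request = {
    "FileSystemType": "LUSTRE",
    "StorageCapacity": 12000,  # GiB; scale to your dataset
    "SubnetIds": ["subnet-EXAMPLE"],
    "LustreConfiguration": {
        "DeploymentType": "PERSISTENT_2",   # assumption: persistent SSD deployment
        "PerUnitStorageThroughput": 1000,   # MB/s per TiB of storage
        "DataCompressionType": "LZ4",
    },
}

# fsx = boto3.client("fsx")
# response = fsx.create_file_system(**fsx_request)
print(fsx_request["LustreConfiguration"]["DeploymentType"])
```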

Topology-Aware Scheduling

Amazon Elastic Compute Cloud (Amazon EC2) exposes instance network-topology information, which SageMaker HyperPod can use for topology-aware scheduling of distributed training. Placing ranks that communicate most heavily on instances that share network nodes shortens communication paths, enabling efficient data placement and communication patterns that maximize model performance.
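To make the idea concrete, here is a sketch of grouping instances by their closest shared network node. The instance data below is made up to mirror the shape returned by the EC2 DescribeInstanceTopology API; in a real cluster you would fetch it with `boto3.client("ec2").describe_instance_topology(InstanceIds=[...])`:

```python
from collections import defaultdict

# Sample data shaped like DescribeInstanceTopology output (values are hypothetical).
# NetworkNodes is ordered from the network layer furthest from the instance
# to the layer closest to it.
instances = [
    {"InstanceId": "i-aaa", "NetworkNodes": ["nn-1", "nn-2", "nn-4"]},
    {"InstanceId": "i-bbb", "NetworkNodes": ["nn-1", "nn-2", "nn-4"]},
    {"InstanceId": "i-ccc", "NetworkNodes": ["nn-1", "nn-3", "nn-5"]},
]

# Group instances by their deepest (closest) network node; instances in the same
# group communicate over the shortest path, so chatty ranks belong together.
groups = defaultdict(list)
for inst in instances:
    groups[inst["NetworkNodes"][-1]].append(inst["InstanceId"])

print(dict(groups))  # → {'nn-4': ['i-aaa', 'i-bbb'], 'nn-5': ['i-ccc']}
```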

Use Cases for UltraServers

P6e-GB200 UltraServers can efficiently train AI models with over a trillion parameters due to their unified NVLink domain and exceptional cross-node bandwidth. This setup enables organizations to push the boundaries of AI research and development, significantly enhancing iteration cycles and the quality of AI models.

In real-time trillion-parameter model inference, UltraServers excel, achieving 30x faster inference speeds compared to previous platforms, making them ideal for applications in generative AI, natural language processing, and conversational agents.

Additionally, the infrastructure supports multiple teams working on diverse, distributed training and inference workloads, streamlining project timelines and reducing costs while maximizing resource utilization.

Flexible Training Plans for UltraServer Capacity

Amazon SageMaker AI offers P6e-GB200 UltraServer capacity through flexible training plans available in the Dallas AWS Local Zone (us-east-1-dfw-2a). Users can select from configurations such as ml.u-p6e-gb200x36 or ml.u-p6e-gb200x72.

To start using UltraServers, navigate to the SageMaker AI training plans console, where you can choose an UltraServer compute type that suits your needs. Ensure you configure at least one spare compute node to maintain operational continuity.
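The same reservation can be approached programmatically through the SageMaker AI training-plan APIs. The sketch below is hedged: the time window and instance count are placeholders, the exact parameter shapes should be checked against the current SDK documentation, and the calls themselves are commented out because they require AWS credentials and available capacity:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical search for UltraServer training-plan offerings; values are placeholders.
now = datetime.now(timezone.utc)
search_kwargs = {
    "InstanceType": "ml.u-p6e-gb200x72",
    "InstanceCount": 1,
    "TargetResources": ["hyperpod-cluster"],
    "StartTimeAfter": now,
    "EndTimeBefore": now + timedelta(days=30),
}

# sm = boto3.client("sagemaker")
# offerings = sm.search_training_plan_offerings(**search_kwargs)
# sm.create_training_plan(
#     TrainingPlanName="ultraserver-plan",
#     TrainingPlanOfferingId=offerings["TrainingPlanOfferings"][0]["TrainingPlanOfferingId"],
# )
print(search_kwargs["InstanceType"])
```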

Create an UltraServer Cluster with SageMaker HyperPod

Once you have selected an UltraServer training plan, it’s easy to add capacity to your SageMaker HyperPod cluster. SageMaker optimizes the placement of these nodes to ensure excellent data transfer performance by keeping them within the same NVLink domain.
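As a minimal sketch of what adding that capacity looks like via the SageMaker CreateCluster API, the request below uses placeholder names, ARNs, and S3 paths throughout (none of these identifiers come from the post), and the call itself is commented out since it requires credentials and a purchased training plan:

```python
# Hypothetical HyperPod cluster request; all names, ARNs, and URIs are placeholders.
cluster_request = {
    "ClusterName": "ultraserver-cluster",
    "InstanceGroups": [
        {
            "InstanceGroupName": "gb200-group",
            "InstanceType": "ml.u-p6e-gb200x72",
            "InstanceCount": 1,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://EXAMPLE-bucket/lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/EXAMPLE-HyperPodRole",
            "TrainingPlanArn": "arn:aws:sagemaker:us-east-1:123456789012:training-plan/EXAMPLE",
        }
    ],
}

# sm = boto3.client("sagemaker")
# sm.create_cluster(**cluster_request)
print(cluster_request["ClusterName"])
```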

Conclusion

P6e-GB200 UltraServers empower organizations to train, fine-tune, and serve state-of-the-art AI models swiftly and efficiently. By combining high-performance GPU resources, robust networking, and cutting-edge memory with the automation of SageMaker HyperPod, enterprises can enhance every stage of the AI lifecycle—from experimentation and training to seamless deployment.

This powerful solution not only breaks new ground in performance but also reduces operational complexity and costs, enabling innovators to lead the next era of AI advancement.

About the Authors

Nathan Arnold is a Senior AI/ML Specialist Solutions Architect at AWS, helping organizations of all sizes efficiently train and deploy foundation models on AWS. Outside of work, he enjoys hiking, trail running, and time with his dogs.
