Revolutionizing AI Infrastructure with the Launch of P6e-GB200 UltraServers
Accelerating Innovation with P6e-GB200 UltraServers: A New Era in AI Computing
Imagine a world where complex problems are tackled seamlessly, drawing upon extensive datasets ranging from scientific research to business documentation. This isn’t a futuristic dream; it’s happening right now in AI production environments across various sectors. Today, businesses in drug discovery, enterprise search, software development, and more are leveraging advanced AI systems to solve intricate challenges. As AI continues to evolve, the tools that support it must evolve as well. This is where the P6e-GB200 UltraServers come into play.
Transforming AI Workloads
We’re thrilled to announce the general availability of P6e-GB200 UltraServers, accelerated by NVIDIA Grace Blackwell Superchips and purpose-built for training and deploying the most sophisticated AI models. They join the P6-B200 instances we introduced earlier this year, which serve a broad range of AI and high-performance computing workloads.
Unprecedented Compute Power
The P6e-GB200 UltraServers are our most powerful GPU offering to date, integrating up to 72 NVIDIA Blackwell GPUs connected with fifth-generation NVIDIA NVLink. Operating as a single computational unit, an UltraServer delivers 360 petaflops of dense FP8 compute and 13.4 TB of high-bandwidth GPU memory (HBM3e).
Compared with the previous-generation P5en instances, that is more than 20 times the compute and over 11 times the GPU memory in a single NVLink domain. The UltraServers also support up to 28.8 Tbps of aggregate bandwidth with fourth-generation Elastic Fabric Adapter (EFAv4) networking.
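As a rough sanity check, the aggregate figures above imply straightforward per-GPU numbers; the short sketch below simply divides the quoted totals by the 72-GPU count (the per-GPU values are derived, not separately published here):

```python
# Back-of-the-envelope check: derive per-GPU figures from the
# aggregate UltraServer specs quoted above.
GPUS = 72                 # NVIDIA Blackwell GPUs per UltraServer
DENSE_FP8_PFLOPS = 360    # aggregate dense FP8 compute
HBM3E_TB = 13.4           # aggregate high-bandwidth GPU memory

per_gpu_pflops = DENSE_FP8_PFLOPS / GPUS
per_gpu_hbm_gb = HBM3E_TB * 1000 / GPUS

print(f"~{per_gpu_pflops:.0f} PFLOPS dense FP8 per GPU")  # ~5
print(f"~{per_gpu_hbm_gb:.0f} GB of HBM3e per GPU")       # ~186
```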
Choosing the Right Instance for Your Needs
When deciding between the P6e-GB200 and P6-B200, consider the specific requirements of your workload.
- P6e-GB200 UltraServers are optimal for compute- and memory-intensive tasks such as training trillion-parameter models. Their NVIDIA GB200 NVL72 architecture minimizes communication overhead, enabling efficient distributed training and faster inference.
- P6-B200 instances provide a versatile solution for medium- to large-scale training, with a familiar 8-GPU configuration that eases migration of existing GPU workloads, especially from x86 environments.
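For illustration only, the guidance above could be condensed into a small selection helper; the function name, the parameter-count threshold, and the x86 flag are hypothetical assumptions, not AWS sizing advice:

```python
# Hypothetical helper illustrating the selection guidance above.
# The 500B-parameter threshold is an illustrative assumption.
def pick_instance(model_params_billions: float, needs_x86: bool = False) -> str:
    """Suggest a P6 family option for a training workload."""
    if needs_x86:
        # P6-B200 offers a familiar 8-GPU shape for x86 environments.
        return "P6-B200"
    if model_params_billions >= 500:
        # Trillion-parameter-scale models benefit from the 72-GPU NVLink domain.
        return "P6e-GB200 UltraServer"
    return "P6-B200"

print(pick_instance(1000))  # P6e-GB200 UltraServer
print(pick_instance(70))    # P6-B200
```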
Built on AWS Core Strengths
Integrating NVIDIA Blackwell into AWS isn’t a one-time achievement; it reflects continuous innovation across every layer of our infrastructure. Our commitment to secure and stable GPU workloads is paramount: the specialized hardware and firmware of the AWS Nitro System enforce access restrictions that safeguard your data.
Robust Security and Stability
AWS places high importance on instance security and stability, crucial for maintaining operational integrity in cloud-based AI workloads. The Nitro System allows for live updates without downtime, ensuring that production timelines remain unaffected.
Performance and Efficiency
To meet the growing demands of AI infrastructure, we’ve deployed P6e-GB200 UltraServers within third-generation EC2 UltraClusters. These clusters not only improve power efficiency by up to 40% but also dramatically reduce cabling requirements, minimizing potential failure points.
Getting Started with NVIDIA Blackwell on AWS
Getting started is straightforward. We provide multiple paths for adopting P6e-GB200 UltraServers and P6-B200 instances, depending on how you prefer to manage your infrastructure.
Amazon SageMaker HyperPod
If you’re focused on efficiency in AI development, Amazon SageMaker HyperPod offers managed infrastructure that automatically handles large GPU clusters. This service comes with optimizations tailored for both P6e-GB200 and P6-B200 instances, maximizing performance while providing essential monitoring and recovery systems.
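HyperPod clusters are provisioned through the SageMaker CreateCluster API. The sketch below shows the shape of such a request as it might be submitted via boto3; the instance-type string, role ARN, bucket URI, and names are placeholder assumptions for illustration, not confirmed identifiers:

```python
# Sketch of a SageMaker HyperPod CreateCluster request (boto3 shape).
# Instance type, role ARN, and S3 URI are placeholder assumptions.
request = {
    "ClusterName": "blackwell-training",
    "InstanceGroups": [
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p6e-gb200.36xlarge",  # placeholder name
            "InstanceCount": 2,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
        }
    ],
}

# With AWS credentials configured, this would be submitted as:
#   import boto3
#   boto3.client("sagemaker").create_cluster(**request)
print(request["ClusterName"])
```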
Amazon EKS
For organizations that prefer managing infrastructure via Kubernetes, Amazon Elastic Kubernetes Service (EKS) enables you to manage both on-premises and EC2 GPUs in a single cluster, offering unparalleled flexibility for large-scale workloads.
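On EKS, GPU capacity is typically requested through the `nvidia.com/gpu` extended resource exposed by the NVIDIA device plugin. A minimal sketch of such a Pod spec follows (built as a Python dict for readability; the container image and names are placeholders):

```python
# Minimal sketch of a Kubernetes Pod spec requesting GPUs on EKS.
# Assumes the NVIDIA device plugin is installed, which exposes GPUs
# as the "nvidia.com/gpu" resource; the image name is a placeholder.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-training"},
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "my-registry/trainer:latest",  # placeholder image
                "resources": {"limits": {"nvidia.com/gpu": 8}},
            }
        ]
    },
}

print(json.dumps(pod, indent=2))
```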
NVIDIA DGX Cloud on AWS
For those utilizing the complete NVIDIA software suite, P6e-GB200 UltraServers will be available through NVIDIA DGX Cloud. This platform optimizes AI workflows at every layer, providing a unified experience backed by NVIDIA’s extensive expertise.
A Forward-Looking Vision
The launch of the P6e-GB200 UltraServers marks a significant milestone, but it is just the beginning. As AI capabilities continue to evolve, so too must the infrastructure that supports them. We look forward to witnessing the innovative solutions that organizations will create using this powerful, scalable technology.
Resources
Explore the resources available on AWS to get started with your AI initiatives and discover the possibilities that lie ahead.
About the Author
David Brown is the Vice President of AWS Compute and Machine Learning Services, responsible for a range of services utilized by customers globally. With a strong background in software development and a passion for advancing AI technologies, David is dedicated to pushing the frontiers of innovation in cloud computing and machine learning.