Amazon EC2 UltraServers

AI training and inference at scale

Why Amazon EC2 UltraServers?

Amazon Elastic Compute Cloud (Amazon EC2) UltraServers are designed for customers seeking the highest AI training and inference performance for models at the trillion-parameter scale. UltraServers connect multiple EC2 instances using a dedicated, high-bandwidth, low-latency accelerator interconnect. This gives you a tightly coupled mesh of accelerators across EC2 instances and access to significantly more compute and memory than standalone EC2 instances provide.

EC2 UltraServers are ideal for the largest models that require more memory and more memory bandwidth than standalone EC2 instances can provide. The UltraServer design uses the intra-instance accelerator connectivity to connect multiple instances into one node, unlocking new capabilities. For inference, UltraServers help deliver industry-leading response times to create the best real-time experiences. For training, UltraServers boost model training speed and efficiency with faster collective communication for model parallelism than standalone instances offer. EC2 UltraServers support Elastic Fabric Adapter (EFA) networking and, when deployed in EC2 UltraClusters, enable scale-out distributed training across tens of thousands of accelerators on a single petabit-scale, nonblocking network. By delivering higher performance for both training and inference, UltraServers accelerate your time to market and help you deliver real-time applications powered by the most performant, next-generation foundation models.

Benefits

Train and deploy models at the trillion+ parameter scale

UltraServers enable efficient training and inference of models with hundreds of billions to trillions of parameters by linking a larger set of accelerators with a high-bandwidth, low-latency interconnect to deliver more compute and memory than standalone EC2 instances.
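As a rough illustration of why models at this scale outgrow a single instance, the sketch below estimates the memory needed for model weights alone. It assumes 2 bytes per parameter for BF16 and 1 byte for FP8, uses decimal terabytes, and ignores optimizer state, activations, and KV cache, all of which add to the real footprint. The function name `weight_memory_tb` is hypothetical and not part of any AWS API.

```python
def weight_memory_tb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for model weights alone, in decimal terabytes."""
    return num_params * bytes_per_param / 1e12


# Assumed storage widths: BF16 = 2 bytes per parameter, FP8 = 1 byte per parameter.
for label, params in [("500B", 5e11), ("1T", 1e12), ("2T", 2e12)]:
    bf16_tb = weight_memory_tb(params, 2)
    fp8_tb = weight_memory_tb(params, 1)
    print(f"{label} parameters: {bf16_tb:.1f} TB in BF16, {fp8_tb:.1f} TB in FP8")
```

Under these assumptions, a one-trillion-parameter model needs about 2 TB for BF16 weights before any training state is counted, which is why pooling accelerator memory across instances matters.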

Reduce inference latency for real-time applications

UltraServers enable real-time inference for ultra-large models that demand substantial memory and memory bandwidth resources beyond what a single EC2 instance can offer.

Reduce time to train by extending model parallelism to more accelerators

UltraServers enable faster collective communication for model parallelism as compared to standalone instances, helping you reduce your time to train.

Features

Dedicated, high-bandwidth, and low-latency accelerator interconnect

You can launch instances into an UltraServer and leverage a dedicated, high-bandwidth, and low-latency accelerator interconnect across these instances. UltraServers enable access to a larger number of accelerators connected with this dedicated interconnect, delivering significantly more compute and memory in a single node than standalone EC2 instances.

High-performance networking

EC2 UltraServers deployed in EC2 UltraClusters are interconnected with petabit-scale EFA networking to improve performance for distributed training workloads.

High-performance storage

You can use EC2 UltraServers together with high-performance storage solutions such as Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system. You can also use virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).

Built on the AWS Nitro System

EC2 UltraServers are built on the AWS Nitro System, a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software. Nitro delivers high performance, high availability, and high security, reducing virtualization overhead.

Instances supported

Trn2 instances

Powered by AWS Trainium2 chips, Trn2 instances in a Trn2 UltraServer configuration (available in preview) enable you to scale up to 64 Trainium2 chips connected with NeuronLink, the dedicated high-bandwidth, low-latency interconnect for AWS AI chips. Trn2 UltraServers provide breakthrough performance in Amazon EC2 for generative AI training and inference.

P6e-GB200 instances

Accelerated by NVIDIA GB200 NVL72, P6e-GB200 instances in an UltraServer configuration allow you to access up to 72 Blackwell GPUs within one NVLink domain to leverage 360 petaflops of FP8 compute (without sparsity), 13.4 TB of total high-bandwidth memory (HBM3e), and up to 28.8 terabits per second of Elastic Fabric Adapter (EFAv4) networking. P6e-GB200 instances are available only in UltraServers, which range from 8 GPUs to 72 GPUs.
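To put the aggregate figures above in per-accelerator terms, the following sketch divides each total by 72 GPUs. It assumes resources are spread evenly across GPUs and uses decimal units (1 TB = 1,000 GB, 1 Tbps = 1,000 Gbps); the variable names are illustrative only, and because the networking figure is stated as "up to", the per-GPU value is an upper bound.

```python
# Aggregate figures quoted for a 72-GPU P6e-GB200 UltraServer.
GPUS = 72
FP8_PETAFLOPS = 360   # FP8 compute, without sparsity
HBM_TB = 13.4         # total HBM3e capacity
EFA_TBPS = 28.8       # maximum EFAv4 networking bandwidth

# Assumption: every resource is divided evenly across the GPUs.
per_gpu_pflops = FP8_PETAFLOPS / GPUS
per_gpu_hbm_gb = HBM_TB * 1000 / GPUS
per_gpu_efa_gbps = EFA_TBPS * 1000 / GPUS

print(f"FP8 compute per GPU: {per_gpu_pflops:.1f} petaflops")
print(f"HBM3e per GPU:       {per_gpu_hbm_gb:.0f} GB")
print(f"EFA per GPU:         {per_gpu_efa_gbps:.0f} Gbps")
```

This works out to roughly 5 petaflops of FP8 compute, about 186 GB of HBM3e, and up to 400 Gbps of EFA bandwidth per GPU.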