Why Amazon EC2 UltraServers?
Amazon Elastic Compute Cloud (Amazon EC2) UltraServers are ideal for customers seeking the highest AI training and inference performance for models at the trillion-parameter scale. UltraServers connect multiple EC2 instances using a dedicated, high-bandwidth, low-latency accelerator interconnect. This lets you use a tightly coupled mesh of accelerators that spans EC2 instances, giving you access to significantly more compute and memory than a standalone EC2 instance provides.
EC2 UltraServers are ideal for the largest models, which require more memory and more memory bandwidth than standalone EC2 instances can provide. The UltraServer design uses the same kind of accelerator connectivity found within an instance to connect multiple instances into one node, unlocking new capabilities. For inference, UltraServers help deliver industry-leading response times for the best real-time experiences. For training, UltraServers boost model training speed and efficiency through faster collective communication for model parallelism than standalone instances offer. EC2 UltraServers support Elastic Fabric Adapter (EFA) networking. When deployed in EC2 UltraClusters, they enable scale-out distributed training across tens of thousands of accelerators on a single petabit-scale, non-blocking network. By delivering higher performance for both training and inference, UltraServers accelerate your time to market and help you deliver real-time applications powered by the most performant, next-generation foundation models.
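To illustrate why aggregate memory matters for the largest models, the following minimal sketch checks whether a model's weights alone fit in the combined high-bandwidth memory of a group of accelerators. It simplifies heavily. It ignores activations, KV cache, optimizer state, and runtime overhead, so passing the check is necessary but not sufficient. The function names and the 8 × 141 GB "standalone instance" are hypothetical values chosen for illustration. The 13.4 TB shared across 72 GPUs comes from the P6e-GB200 figures on this page.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the model weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def fits_in_aggregate_hbm(
    num_params: float,
    bytes_per_param: float,
    num_accelerators: int,
    hbm_per_accelerator_gb: float,
) -> bool:
    """Check whether the weights fit in the accelerators' combined HBM.

    Weights only: activations, KV cache, optimizer state, and runtime
    overhead are ignored, so True is necessary but not sufficient.
    """
    needed_gb = weight_memory_gb(num_params, bytes_per_param)
    available_gb = num_accelerators * hbm_per_accelerator_gb
    return needed_gb <= available_gb


ONE_TRILLION = 1e12
BF16_BYTES = 2  # bytes per parameter in BF16

if __name__ == "__main__":
    # Hypothetical standalone instance: 8 accelerators x 141 GB each (1,128 GB).
    print(fits_in_aggregate_hbm(ONE_TRILLION, BF16_BYTES, 8, 141))  # False
    # 72 GPUs sharing 13.4 TB of HBM3e, as quoted for P6e-GB200 UltraServers.
    print(fits_in_aggregate_hbm(ONE_TRILLION, BF16_BYTES, 72, 13_400 / 72))  # True
```

A 1-trillion-parameter model stored in BF16 needs about 2,000 GB for its weights alone. That exceeds the hypothetical 1,128 GB instance but fits comfortably within 13.4 TB of pooled memory.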
Instances supported
Trn2 instances
Powered by AWS Trainium2 chips, Trn2 instances in a Trn2 UltraServer configuration (available in preview) enable you to scale up to 64 Trainium2 chips connected with NeuronLink, the dedicated high-bandwidth, low-latency interconnect for AWS AI chips. Trn2 UltraServers provide breakthrough performance in Amazon EC2 for generative AI training and inference.
P6e-GB200 instances
Accelerated by NVIDIA GB200 NVL72, P6e-GB200 instances in an UltraServer configuration give you access to up to 72 Blackwell GPUs within one NVLink domain, providing 360 petaflops of FP8 compute (without sparsity), 13.4 TB of total high-bandwidth memory (HBM3e), and up to 28.8 terabits per second (Tbps) of Elastic Fabric Adapter (EFAv4) networking. P6e-GB200 instances are available only in UltraServers, in configurations ranging from 8 GPUs to 72 GPUs.
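Dividing the aggregate P6e-GB200 figures by the GPU count gives a rough per-GPU picture. This sketch assumes that the totals describe the full 72-GPU configuration and that they split evenly across GPUs. The constant and function names are illustrative. The results are back-of-the-envelope estimates, not published per-GPU specifications.

```python
# Aggregate figures quoted for a full 72-GPU P6e-GB200 UltraServer.
# Assumption: totals are split evenly across GPUs (back-of-the-envelope only).
NUM_GPUS = 72
FP8_DENSE_PFLOPS = 360.0  # FP8 compute, without sparsity
HBM3E_TOTAL_TB = 13.4     # total high-bandwidth memory
EFA_TOTAL_TBPS = 28.8     # "up to" EFAv4 networking bandwidth


def per_gpu_share(num_gpus: int = NUM_GPUS) -> dict[str, float]:
    """Return each aggregate figure divided evenly by num_gpus."""
    return {
        "fp8_dense_pflops": FP8_DENSE_PFLOPS / num_gpus,
        "hbm3e_gb": HBM3E_TOTAL_TB * 1000 / num_gpus,  # TB -> GB (decimal)
        "efa_gbps": EFA_TOTAL_TBPS * 1000 / num_gpus,  # Tbps -> Gbps
    }


if __name__ == "__main__":
    for name, value in per_gpu_share().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, each GPU gets roughly 5.0 petaflops of dense FP8 compute, about 186.1 GB of HBM3e, and about 400.0 Gbps of EFA bandwidth.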