NVIDIA’s Next-Gen GH200 Grace Hopper GPU-CPU Superchip Platform Revealed – For Generative AI and Accelerated Computing

NVIDIA recently unveiled its next-generation GH200 Grace Hopper, a GPU-CPU superchip platform featuring the world’s first HBM3e processor. It features up to 72 CPU cores and 528 GPU Tensor Cores, pairing an Arm Neoverse V2 CPU with a Hopper GPU. The platform is said to offer groundbreaking memory capacity and bandwidth, the ability to connect multiple GPUs, and an easily scalable server design. Check out more details from NVIDIA’s white paper (excerpt) and its official news release below.

NVIDIA Unveils Next-Generation GH200 Grace Hopper Superchip Platform for Era of Accelerated Computing and Generative AI

NVIDIA today announced the next-generation NVIDIA GH200 Grace Hopper™ platform — based on a new Grace Hopper Superchip with the world’s first HBM3e processor — built for the era of accelerated computing and generative AI.

Created to handle the world’s most complex generative AI workloads, spanning large language models, recommender systems and vector databases, the new platform will be available in a wide range of configurations.

The dual configuration — which delivers up to 3.5x more memory capacity and 3x more bandwidth than the current generation offering — comprises a single server with 144 Arm Neoverse cores, eight petaflops of AI performance and 282GB of the latest HBM3e memory technology.

“To meet surging demand for generative AI, data centers require accelerated computing platforms with specialized needs,” said Jensen Huang, founder and CEO of NVIDIA. “The new GH200 Grace Hopper Superchip platform delivers this with exceptional memory technology and bandwidth to improve throughput, the ability to connect GPUs to aggregate performance without compromise, and a server design that can be easily deployed across the entire data center.”

The new platform uses the Grace Hopper Superchip, which can be connected with additional Superchips by NVIDIA NVLink™, allowing them to work together to deploy the giant models used for generative AI. This high-speed, coherent technology gives the GPU full access to the CPU memory, providing a combined 1.2TB of fast memory when in dual configuration.
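To put the 1.2TB figure in context, here is a rough, hedged sketch (not an official NVIDIA sizing tool) of whether a model's raw weights would fit in the dual configuration's combined fast memory, assuming FP16 storage at 2 bytes per parameter and ignoring activations, KV caches, and framework overhead:

```python
# Back-of-the-envelope sketch (illustrative assumptions, not NVIDIA guidance):
# check whether a model's weights alone fit in the dual configuration's
# 1.2 TB of combined CPU+GPU fast memory, assuming FP16 (2 bytes/parameter).

def weights_fit(params_billions: float,
                bytes_per_param: int = 2,
                fast_memory_tb: float = 1.2) -> bool:
    """Return True if the raw weights fit in the given fast-memory pool."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return weight_bytes <= fast_memory_tb * 1e12

print(weights_fit(175))  # 175B params in FP16 -> 350 GB: fits
print(weights_fit(700))  # 700B params in FP16 -> 1.4 TB: does not fit
```

Real deployments also need room for activations and optimizer or cache state, so usable model sizes are smaller than this upper bound.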

HBM3e memory, which is 50% faster than current HBM3, delivers a total of 10TB/sec of combined bandwidth, allowing the new platform to run models 3.5x larger than the previous version, while improving performance with 3x faster memory bandwidth.
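A quick arithmetic check of the quoted figures (illustrative only, using the capacity and bandwidth numbers from the release) shows how fast the dual configuration could stream its entire HBM3e pool once:

```python
# Sanity arithmetic on the release's stated figures (not a benchmark):
# how long streaming the whole HBM3e pool once would take at peak bandwidth.

HBM3E_CAPACITY_GB = 282   # dual-configuration HBM3e capacity (from the release)
COMBINED_BW_TBPS = 10.0   # combined HBM3e bandwidth (from the release)

seconds = (HBM3E_CAPACITY_GB / 1000) / COMBINED_BW_TBPS
print(f"{seconds * 1000:.1f} ms")  # ~28.2 ms to touch every byte once
```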

Growing Demand for Grace Hopper

Leading manufacturers are already offering systems based on the previously announced Grace Hopper Superchip. To drive broad adoption of the technology, the next-generation Grace Hopper Superchip platform with HBM3e is fully compatible with the NVIDIA MGX™ server specification unveiled at COMPUTEX earlier this year. With MGX, any system manufacturer can quickly and cost-effectively add Grace Hopper into over 100 server variations.

NVIDIA GH200 Grace Hopper Overview

NVIDIA Grace CPU is the first NVIDIA data center CPU, and it is built from the ground up to create HPC and AI superchips. The NVIDIA Grace CPU uses 72 Arm Neoverse V2 CPU cores to deliver leading per-thread performance while providing higher energy efficiency than traditional CPUs. Up to 480 GB of LPDDR5X memory provides the optimal balance between memory capacity, energy efficiency, and performance, with up to 500 GB/s of memory bandwidth per CPU. Its Scalable Coherency Fabric provides up to 3.2 TB/s of total bisection bandwidth to realize the full performance of CPU cores, memory, system I/O, and NVLink-C2C.

NVIDIA Hopper is the ninth-generation NVIDIA data center GPU and is designed to deliver order-of-magnitude improvements for large-scale AI and HPC applications compared to the previous NVIDIA Ampere generation. Thread Block Clusters and Thread Block Reconfiguration improve spatial and temporal data locality and, together with the new Asynchronous Execution engines, enable applications to keep all units busy at all times.

NVIDIA GH200 fuses an NVIDIA Grace CPU and an NVIDIA Hopper GPU into a single superchip via NVIDIA NVLink-C2C, a 900 GB/s total bandwidth chip-to-chip interconnect. NVLink-C2C memory coherency enables programming of both the Grace CPU Superchip and the Grace Hopper Superchip with a unified programming model.

NVIDIA GH200 Platforms

Heterogeneous GPU-CPU Platforms for AI, Data Analytics, and HPC
The GH200 Grace Hopper Superchip forms the basis for many different server designs serving diverse machine learning and HPC needs. NVIDIA has developed two platforms to address them.

  • NVIDIA MGX GH200 is ideal for scale-out of accelerated solutions, including but not limited to traditional machine learning (ML), AI, data analytics, accelerated databases, and HPC workloads. With up to 576 GB of fast memory, a single node can run a variety of workloads. Combined with NVIDIA networking solutions (ConnectX-7, Spectrum-X, and BlueField-3 DPUs), this platform is easy to manage and deploy and uses a traditional HPC/AI cluster networking architecture.
  • NVIDIA DGX GH200 enables all GPU threads in the NVLink-connected domain to address up to 144 TB of memory, with up to 900 GB/s total bandwidth per superchip, up to 450 GB/s all-reduce bandwidth, and up to 115.2 TB/s bisection bandwidth in a 256-GPU NVLink-connected system, making this platform ideal for strong scaling the world’s largest and most challenging AI training and HPC workloads.
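The DGX GH200 figures quoted above are arithmetically consistent: 450 GB/s per superchip across 256 superchips works out to the stated 115.2 TB/s. A quick illustrative check:

```python
# Arithmetic consistency check of the quoted DGX GH200 figures
# (illustrative only; these are the numbers stated in the release).

SUPERCHIPS = 256          # GPUs in the NVLink-connected system
ALL_REDUCE_GBPS = 450     # all-reduce bandwidth per superchip

bisection_tbps = SUPERCHIPS * ALL_REDUCE_GBPS / 1000
print(bisection_tbps)  # 115.2, matching the stated bisection bandwidth
```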

Pricing and Availability

Leading system manufacturers are expected to deliver systems based on the platform in Q2 2024. There is no word on pricing yet.

Author: Peter Paul
Peter is a PC enthusiast and avid gamer with several years of hands-on experience in testing and reviewing PC components, audio equipment, and various tech devices. He offers a genuine, no-nonsense perspective, helping consumers make informed choices in the ever-changing world of technology.
