NVIDIA DGX-2

NVIDIA DGX-2 is the world’s most powerful hardware for the most complex AI challenges, and it comes with a complete software stack that includes all the most widely used environments (TensorFlow, Caffe, Torch, Theano, …). The NVIDIA DGX-2 is an artificial intelligence supercomputer: the first 2-petaFLOPS system, combining 16 fully interconnected GPUs for 10X the deep learning performance. The latest addition to the DGX family of systems is the DGX-2H, a version of the DGX-2 tuned to achieve the highest performance.

Hardware

Let’s take a look at NVIDIA DGX-2 and DGX-2H in more detail, first from a hardware standpoint.

Parameter                         DGX-2H                                                DGX-2
GPUs                              16× NVIDIA Tesla V100 32GB                            16× NVIDIA Tesla V100 32GB
Performance (tensor operations)   2.1 petaFLOPS                                         2 petaFLOPS
GPU memory                        512 GB total                                          512 GB total
CPU                               2× Intel Xeon Platinum 8174, 3.1 GHz (24 cores each)  2× Intel Xeon Platinum 8168, 2.7 GHz (24 cores each)
NVIDIA CUDA cores                 81,920                                                81,920
NVIDIA Tensor cores               10,240                                                10,240
System memory (RAM)               1.5 TB                                                1.5 TB
Storage                           2× 960 GB NVMe SSD (OS), 8× 3.84 TB NVMe SSD (data)   2× 960 GB NVMe SSD (OS), 8× 3.84 TB NVMe SSD (data)
Network                           2× 10/25 Gb Ethernet, 8× 100 Gb InfiniBand/Ethernet   2× 10/25 Gb Ethernet, 8× 100 Gb InfiniBand/Ethernet
Maximum input power               12 kW                                                 10 kW
Form factor                       Rack, 10U                                             Rack, 10U
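The aggregate figures in the table follow directly from the per-GPU specifications of the Tesla V100. The short Python sketch below multiplies NVIDIA's published per-GPU numbers (32 GB of memory, 5,120 CUDA cores, 640 Tensor cores and about 125 TFLOPS of peak tensor throughput at standard clocks) by the 16 GPUs in the system. These per-GPU values come from the V100 datasheet rather than from this article, and the dictionary and function names are illustrative only.

```python
# Per-GPU Tesla V100 figures (from NVIDIA's V100 datasheet, not from this article).
V100_PER_GPU = {
    "memory_gb": 32,        # HBM2 memory per GPU
    "cuda_cores": 5120,
    "tensor_cores": 640,
    "tensor_tflops": 125,   # peak mixed-precision tensor throughput at standard clocks
}

NUM_GPUS = 16  # GPUs in one DGX-2


def system_totals(per_gpu, num_gpus):
    """Scale each per-GPU figure linearly to the whole system."""
    return {name: value * num_gpus for name, value in per_gpu.items()}


totals = system_totals(V100_PER_GPU, NUM_GPUS)
print(f"GPU memory:   {totals['memory_gb']} GB total")                   # 512 GB total
print(f"CUDA cores:   {totals['cuda_cores']:,}")                         # 81,920
print(f"Tensor cores: {totals['tensor_cores']:,}")                       # 10,240
print(f"Tensor perf:  {totals['tensor_tflops'] / 1000:.1f} petaFLOPS")  # 2.0 petaFLOPS
```

By the same arithmetic, the DGX-2H's 2.1 petaFLOPS works out to roughly 131 TFLOPS per GPU (2,100 / 16 = 131.25), presumably reflecting the higher clocks of the tuned variant.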

With DGX-2, model complexity and size are no longer constrained by the limits of traditional architectures. Now, you can take advantage of model-parallel training with the NVIDIA NVSwitch networking fabric. It’s the innovative technology behind the world’s first 2-petaFLOPS GPU accelerator with 2.4 TB/s of bisection bandwidth, delivering a 24X increase over prior generations.
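The 2.4 TB/s figure can be reproduced with simple arithmetic. Bisection bandwidth is the total bandwidth available across a cut that splits the 16 GPUs into two halves of eight. The sketch below assumes NVLink 2.0 rates (25 GB/s per link in each direction, 6 links per V100) and a non-blocking NVSwitch fabric in which every GPU on one side of the cut can drive all of its links at once; the constant and function names are hypothetical.

```python
NUM_GPUS = 16
LINKS_PER_GPU = 6                 # NVLink 2.0 links on each Tesla V100
GB_S_PER_LINK_PER_DIRECTION = 25  # NVLink 2.0 bandwidth per link, one direction (GB/s)


def bisection_bandwidth_gb_s(num_gpus, links_per_gpu, gb_s_per_direction):
    """Bandwidth across a cut splitting the GPUs in half, counting both directions."""
    gpus_per_half = num_gpus // 2
    per_gpu_bidirectional = links_per_gpu * gb_s_per_direction * 2  # 300 GB/s per V100
    return gpus_per_half * per_gpu_bidirectional


bw = bisection_bandwidth_gb_s(NUM_GPUS, LINKS_PER_GPU, GB_S_PER_LINK_PER_DIRECTION)
print(f"Bisection bandwidth: {bw / 1000:.1f} TB/s")  # 2.4 TB/s
```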

Software

But what is more interesting is the software package mentioned above, which is offered with NVIDIA machines. NVIDIA GPU Cloud (NGC) provides easy access to a comprehensive catalog of GPU-optimized software. It features performance-engineered containers with all the top deep learning frameworks, such as TensorFlow, PyTorch, MXNet, and more, tuned, tested, certified, and maintained by NVIDIA. It also includes third-party managed containers for HPC applications and NVIDIA containers for HPC visualization. According to NVIDIA, this optimized software stack delivers 30% more performance for machine learning applications than the same applications deployed on NVIDIA hardware alone. The main advantage of the pre-installed environment is deployment speed: the system can be up and running within a few hours.
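As a rough illustration of how these containers are used, the Python sketch below builds the image reference and a docker run command for a framework container from the NGC registry (nvcr.io). It assumes the NGC naming scheme nvcr.io/nvidia/<framework>:<yy.mm>-py3 and Docker 19.03 or later, whose --gpus all flag exposes the GPUs to the container. The release tag 19.10, the host path and the helper names are example values only, so check the NGC catalog for current tags. The command is printed, not executed.

```python
import shlex
from typing import List

NGC_REGISTRY = "nvcr.io/nvidia"  # NVIDIA's namespace in the NGC container registry


def ngc_image(framework: str, release: str, suffix: str = "py3") -> str:
    """Compose an NGC image reference, e.g. nvcr.io/nvidia/pytorch:19.10-py3."""
    return f"{NGC_REGISTRY}/{framework}:{release}-{suffix}"


def docker_run_argv(image: str, host_dir: str, container_dir: str = "/workspace") -> List[str]:
    """Argument list for an interactive, auto-removed container with all GPUs visible."""
    return [
        "docker", "run",
        "--gpus", "all",                      # Docker 19.03+ with the NVIDIA container toolkit
        "--rm", "-it",
        "-v", f"{host_dir}:{container_dir}",  # mount a host directory with code and data
        image,
    ]


image = ngc_image("pytorch", "19.10")            # example tag only
argv = docker_run_argv(image, "/raid/projects")  # hypothetical host path
print(shlex.join(argv))
```

Building an argument list rather than a single string keeps the command safe to pass to subprocess.run later without shell quoting issues.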

NVIDIA DGX systems SW stack


Support

A key strength of the NVIDIA solution is support for the entire system. Hardware support (in case any component fails) is a matter of course, but software support for the entire environment is critical when something does not work, and hundreds of developers are ready to help. Support is included in the NVIDIA DGX purchase. It is available for 1 or 3 years and can be extended after that period.

With its combination of tuned hardware, software and NVIDIA support, NVIDIA DGX delivers significantly higher performance in the training phase of machine learning applications.
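To translate a throughput claim such as the 30% figure quoted for NVIDIA's software stack into training time, note that time scales with the inverse of throughput, so a 30% throughput gain does not cut training time by 30%. The sketch below uses a hypothetical 100-hour baseline job; the function name is illustrative.

```python
def training_time_after_gain(baseline_hours: float, throughput_gain: float) -> float:
    """Training time when throughput rises by throughput_gain (0.30 means +30%)."""
    if throughput_gain <= -1.0:
        raise ValueError("throughput_gain must be greater than -1")
    return baseline_hours / (1.0 + throughput_gain)


baseline = 100.0  # hypothetical baseline training time in hours
new_time = training_time_after_gain(baseline, 0.30)
saved = 1.0 - new_time / baseline
print(f"{new_time:.1f} h instead of {baseline:.0f} h ({saved:.0%} less time)")
# 76.9 h instead of 100 h (23% less time)
```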

The difference between the fast, powerful DGX machine learning solution and a DIY (do-it-yourself) build is evident from the following video: