
NVIDIA DGX A100
NVIDIA's essential instrument of AI research, designed for the data center.
NVIDIA DGX Station
The first personal supercomputer for machine learning and artificial intelligence designed for your office.
Hardware
Let’s take a look at the NVIDIA DGX systems in detail, first from the hardware point of view.
| Parameter | NVIDIA DGX A100 | NVIDIA DGX Station |
|---|---|---|
| GPUs | 8× NVIDIA Ampere A100 Tensor Core GPUs | 4× NVIDIA Tesla V100 32 GB |
| Performance | 5 petaFLOPS AI, 10 petaOPS INT8 | 0.5 petaFLOPS |
| GPU memory | 320 GB in total | 128 GB in total |
| CPU | Dual AMD Rome 7742, 128 cores total, 2.25 GHz (base), 3.4 GHz (max boost) | Intel Xeon E5-2698 v4, 2.2 GHz (20 cores) |
| NVIDIA CUDA cores | 55,296 | 20,480 |
| NVIDIA Tensor cores | 3,456 | 2,560 |
| Multi-instance GPU | 56 instances | 4 instances |
| GPU interconnect | 6× NVIDIA NVSwitch, non-blocking, 4.8 TB/s | NVLink |
| RAM | 1 TB | 256 GB |
| Storage | OS: 2× 1.92 TB M.2 NVMe drives; internal storage: 15 TB (4× 3.84 TB) U.2 NVMe drives | 4× 1.92 TB SSD |
| Network | 8× single-port Mellanox ConnectX-6 VPI 200 Gb/s HDR InfiniBand, 1× dual-port Mellanox ConnectX-6 VPI 10/25/50/100/200 Gb/s Ethernet | 2× 10 GbE |
| Power consumption | 6,500 W | 1,500 W |
| Chassis | rack, 6U | tower, water cooling of GPUs and CPU |
All NVIDIA DGX systems feature the fastest accelerators available today: the DGX Station contains four NVIDIA Tesla V100 32GB cards, while the DGX A100 packs eight NVIDIA A100 Tensor Core GPUs. The main benefits of these accelerators are specialized Tensor Cores for accelerating machine learning workloads and large on-board memory (32 GB per V100, 40 GB per A100) protected by ECC. They are also equipped with NVLink, an interface for high-bandwidth GPU-to-GPU communication that reaches up to 600 GB/s per GPU on the A100. The NVIDIA DGX A100 additionally offers the powerful NVSwitch fabric, which connects the eight NVIDIA Ampere A100 GPUs with 4.8 TB/s bisection bandwidth in a non-blocking architecture.
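To give this a concrete flavour, here is a minimal sketch, assuming a Python environment with PyTorch and CUDA available, of how you can verify on a multi-GPU machine that the GPUs can reach each other directly over NVLink/NVSwitch (peer-to-peer access). It is only an illustration, not part of the DGX software stack itself:

```python
import torch

# Minimal sketch (assumes PyTorch with CUDA support is installed):
# check whether each pair of GPUs can access each other's memory
# directly (peer-to-peer over NVLink/NVSwitch). On DGX systems every
# pair should report "yes".
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```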
How to leverage Multi-GPU systems?
This is one of the questions we hear most often from our customers. There are a couple of techniques you can use, as described in our webinar on the multi-GPU topic and sketched below. You can also attend the Fundamentals of Deep Learning for Multi-GPUs workshop that we organize together with the NVIDIA Deep Learning Institute (DLI).
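As a small illustration of the simplest technique, data parallelism, here is a sketch in PyTorch (one of the frameworks pre-installed on DGX systems). The model and batch sizes are made up for the example, and nn.DataParallel is used only because it is the shortest path to using all GPUs from one process; for production training, DistributedDataParallel or a framework-level distribution strategy is usually preferred:

```python
import torch
import torch.nn as nn

# A small placeholder model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))

if torch.cuda.device_count() > 1:
    # nn.DataParallel splits each input batch across all visible GPUs
    # and gathers the outputs back on the default device.
    model = nn.DataParallel(model)

model = model.cuda()

# One forward pass on a dummy batch; the batch dimension (256 here)
# is what gets sharded across the GPUs.
inputs = torch.randn(256, 1024, device="cuda")
outputs = model(inputs)
print(outputs.shape)  # torch.Size([256, 10])
```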
Software
What is more interesting, however, is the already mentioned software stack that comes with NVIDIA DGX machines. All of them offer pre-installed, performance-tuned environments for machine learning (e.g. Caffe and Caffe2, Theano, TensorFlow, PyTorch, or MXNet) and an intuitive environment for data analysts (NVIDIA DIGITS). All of this is elegantly packaged in Docker containers. Such a tuned environment delivers up to 30% more performance for machine learning applications compared with the same frameworks deployed manually on plain NVIDIA hardware. The main advantage of the pre-installed environment is deployment speed: the system is ready for work within hours. The base DGX system image contains the Ubuntu operating system, NVIDIA GPU drivers and a Docker environment for application containers downloadable from the NVIDIA GPU Cloud (NGC). NVIDIA also supports running these Docker images under Singularity.
NVIDIA DGX systems SW stack
NVIDIA GPU Cloud
NVIDIA GPU Cloud (NGC) is a repository of the most widely used frameworks for machine learning and deep learning applications, HPC applications, and visualization accelerated by NVIDIA GPU cards. Deploying these applications is a matter of minutes: copy the link of the appropriate Docker image from the NGC repository, move it to the DGX system, and download and run the Docker container. The content of the Docker images, i.e. the versions of all libraries and frameworks and the environment settings, is updated and optimized by NVIDIA specialists for deployment on DGX systems. https://ngc.nvidia.com/
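As an illustration of how quickly such a container is ready for work, here is a minimal sanity check, assuming you have started one of the NGC framework images that ships with PyTorch, to confirm that all GPUs of the DGX system are visible inside the container:

```python
import torch

# Run inside an NGC framework container (assumed here: a PyTorch-based
# image) to confirm that the container sees all GPUs of the DGX system.
print("CUDA available:", torch.cuda.is_available())
print("GPUs visible:  ", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
```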
Support
A key strength of the NVIDIA solution is support for the entire system. Hardware support (in case any of the components fails) is a matter of course, and software support for the whole environment is critical when something does not work; the customer has hundreds of NVIDIA developers ready to help. Support is part of every NVIDIA DGX purchase. It is available for 1, 3 or 5 years and can be further extended after that time.
NVIDIA support includes:
- Access to NVIDIA GPU Cloud (NGC) Portal
- NVIDIA Cloud Management
- DGX Software Upgrades
- DGX Software Updates
- DGX Firmware Updates
- Hardware Support
- Hardware SLA (replacement parts within 1 day)
- Software support — DGX OS image and full AI software including ML frameworks
- Enterprise Support Portal
- 24×7 Phone Support
- Access to NVIDIA Knowledgebase
You can also take advantage of consultations with specialists from M Computers in Czech, Slovak and English.
Thanks to the combination of tuned hardware, a rich software stack and high-quality NVIDIA support covering both the DGX hardware and software, NVIDIA DGX systems deliver significantly higher performance for data analytics and for the training phase of AI algorithms.
The difference between a tuned DGX system solution for fast and powerful machine learning and a DIY (Do It Yourself) variant is evident from the following video:
We delivered our first NVIDIA DGX-2 system to the IT4Innovations Supercomputing Center (VSB) in Ostrava, Czech Republic. You can see the details of the installation in a short reference video.
Reference Architectures
NVIDIA DGX systems represent enormous computing power. When designing an architecture, it is necessary to consider how they are integrated into the overall IT infrastructure and how that infrastructure is tuned to achieve maximum performance. NVIDIA has introduced the NVIDIA DGX POD reference architecture, including networking and storage arrays. There you will find individual designs from the key storage vendors that describe the overall infrastructure solution for running ML and AI applications. We have the most experience with the ONTAP AI solution based on NetApp storage.
ONTAP AI reference architecture with NetApp storage
NVIDIA Deep Learning Institute
The NVIDIA Deep Learning Institute (DLI) offers both online and hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing. More info.
GTC 2020
In March 2020, the premier GPU Technology Conference (GTC) will take place in San Jose. There you will be able to get acquainted with NVIDIA DGX systems and examples of their deployment, and to listen to visionary lectures by the most important people in the fields of artificial intelligence and machine learning, including NVIDIA's charismatic CEO Jensen Huang.
NVIDIA offers special programs on DGX systems and Tesla accelerators for educational (EDU) organizations and start-up companies. We will be pleased to provide you with information about current promotions.
Testing
We have an NVIDIA DGX Station available for testing the performance and deployment speed of ML and AI applications. Thanks to the NVIDIA Tesla Test Drive program, we also have 2× NVIDIA Tesla V100 and 2× NVIDIA T4 cards available for testing. If you are interested, please fill out this form.

Kamila Jeřábková
M: 734 161 516
kamila.jerabkova@mcomputers.cz