NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning

TAIPEI | SEP. 21-22, 2016

Eric Kang 康勝閔, Sep. 21 2016

GPU Computing

NVIDIA

Computing for the Most Demanding Users

Computing Human Imagination

Computing Human Intelligence

DEEP LEARNING EVERYWHERE

INTERNET & CLOUD

Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation

MEDIA & ENTERTAINMENT

Video Captioning, Video Search, Real Time Translation

AUTONOMOUS MACHINES

Pedestrian Detection, Lane Tracking, Recognize Traffic Sign

SECURITY & DEFENSE

Face Detection, Video Surveillance, Satellite Imagery

MEDICINE & BIOLOGY

Cancer Cell Detection, Diabetic Grading, Drug Discovery

DEEP LEARNING APPROACH

[Figure: deep learning workflow in two phases, Train and Deploy. Recovered labels: Dog, Cat, Raccoon, Honey badger, Errors.]

“SUPERHUMAN” RESULTS SPARK HYPERSCALE ADOPTION

[Chart: ImageNet accuracy (%), 2010 to 2015, comparing hand-coded CV with deep learning. Data labels recovered from the slide: 72%, 74%, 74%, 76%.]

Cloud Services with AI Powered by NVIDIA

Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo Supermarket, Yandex, Yelp

Source: IDC Worldwide Big Data and Analytics 2016 Predictions, November 2015; IDC FutureScape: Worldwide Digital Strategy Consulting 2016 Predictions, November 2015.

“By 2020, 80% of Big Data and Analytics deployments will need distributed micro analytics and 40% of all business analytics software will incorporate prescriptive analytics built on cognitive computing functionality. Both of these trends require a dramatic increase in processing power that could be enabled by GPUs.”

— IDC

“By 2018, over 50% of developer teams will embed cognitive services in their apps (vs 1% today) providing U.S. enterprises with over $60 billion annual savings by 2020.”

— IDC

AI — THE NEXT TRILLION $ IT OPPORTUNITY

Deep Learning is a massive opportunity

Data Scientist productivity is vital

NVIDIA is the choice of the deep learning world

DGX-1 is fast, instantly productive

NVIDIA DGX-1

The Essential Tool of Deep Learning Scientists

TESLA P100 WITH NVLINK

New GPU Architecture to Enable the World’s Fastest Compute Node

Pascal Architecture: Highest Compute Performance

NVLink: GPU Interconnect for Maximum Scalability

CoWoS HBM2: Unifying Compute & Memory in Single Package

Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory

[Diagram labels: CPU, CPU, PCIe Switch, PCIe Switch, Tesla P100, Unified Memory]

NVIDIA DGX-1

WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER

Engineered for deep learning | 170 TF FP16 | 8x Tesla P100

NVLink hybrid cube mesh | Accelerates major AI frameworks
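The 170 TF FP16 figure lines up with eight Tesla P100 GPUs at their peak half-precision rate. The sketch below works through that arithmetic. It assumes 21.2 TFLOPS FP16 per GPU, which is NVIDIA's published peak for the NVLink version of the P100 and does not appear in this deck; the function and constant names are illustrative only.

```python
# Peak FP16 throughput of one Tesla P100 (NVLink version), in TFLOPS.
# Assumption: taken from NVIDIA's published P100 specifications, not from this deck.
P100_FP16_TFLOPS = 21.2
GPUS_PER_DGX1 = 8  # "8x Tesla P100" per the slide


def dgx1_peak_fp16_tflops(per_gpu=P100_FP16_TFLOPS, gpus=GPUS_PER_DGX1):
    """Aggregate theoretical peak: per-GPU peak times GPU count."""
    return per_gpu * gpus


if __name__ == "__main__":
    total = dgx1_peak_fp16_tflops()
    print(f"{total:.1f} TFLOPS FP16 peak, quoted on the slide as {round(total)} TF")
```

This is a theoretical peak; sustained throughput in a real training job would be lower.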

NVIDIA DEEP LEARNING SDK

High Performance GPU-Acceleration for Deep Learning

APPLICATIONS

COMPUTER VISION: Object Detection, Image Classification

SPEECH AND AUDIO: Voice Recognition, Translation

BEHAVIOR: Recommendation Engines, Sentiment Analysis

FRAMEWORKS

Mocha.jl

DEEP LEARNING SDK

DEEP LEARNING

MATH LIBRARIES: cuBLAS, cuSPARSE

MULTI-GPU

NVIDIA CUDNN

Building blocks for accelerating deep neural networks on GPUs

High performance deep neural network training and inference

Accelerates Caffe, CNTK, TensorFlow, Theano, Torch

Performance continues to improve over time

“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.”

— Evan Shelhamer, Lead Caffe Developer, UC Berkeley

developer.nvidia.com/cudnn

[Chart: AlexNet training throughput by year. 2014: K40 (cuDNN v1); 2015: M40 (cuDNN v3); 2016: Pascal (cuDNN v5).]

AlexNet training throughput based on 20 iterations. CPU: 1x E5-2680v3, 12 cores, 2.5 GHz.

NVIDIA DIGITS

Interactive Deep Learning GPU Training System

Process Data, Configure DNN, Monitor Progress, Visualize Layers

Test Image

developer.nvidia.com/digits

github.com/NVIDIA/DIGITS

DGX STACK

Fully integrated Deep Learning platform

Instant productivity: plug-and-play, supports every AI framework

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments, containerized

Direct access to NVIDIA experts

NVIDIA DOCKER ON GITHUB

NVIDIA IMAGES

Prebuilt and ready to use
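As a rough illustration of what using a prebuilt image looks like, the sketch below assembles a command line for `nvidia-docker run`, the wrapper published in the NVIDIA Docker project on GitHub. `nvidia/cuda` is the public CUDA base image. The helper names and the host path are hypothetical, and this is generic nvidia-docker usage, not the DGX-1 cloud-managed launch flow itself.

```python
import shutil
import subprocess


def build_run_command(image, command, volumes=None):
    """Build an argv list for `nvidia-docker run`.

    `volumes` maps host paths to container paths (Docker's -v flag).
    """
    argv = ["nvidia-docker", "run", "--rm"]
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", f"{host_path}:{container_path}"]
    argv.append(image)
    argv += list(command)
    return argv


def run_if_available(argv):
    """Run the command only when the executable is installed; else return None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical host path for a dataset directory mounted into the container.
    cmd = build_run_command("nvidia/cuda", ["nvidia-smi"], {"/mnt/nfs/datasets": "/data"})
    print(" ".join(cmd))
```

Keeping the command builder separate from the runner means the command can be inspected or logged on machines that have no GPU or Docker installation.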

DGX-1 CONTAINER LAUNCH FLOW

Customer data stays on premise

Web Browser

Node Management

User Authentication

Docker Image push/pull

Scheduler UI

HW/SW Metrics

LOCAL LAN

All Application Data

NFS Storage

DIGITS UI

Interactive Sessions

compute.nvidia.com

1. User schedules containers to run

3. User interacts with application

DIGITS FOR DGX-1

A complete GPU-accelerated deep learning workflow

MANAGE: MANAGE / AUGMENT

TRAIN: DIGITS, TRAIN, TEST, MODEL ZOO

DEPLOY: GPU INFERENCE ENGINE for DATA CENTER, AUTOMOTIVE, EMBEDDED

BUILT FOR THE DATA CENTER

24/7 Uptime: Maximize reliability

Data Center Ready: Simplify system operations

Scalable Performance: Boost data center throughput

END-TO-END DESIGN FOR SYSTEM UPTIME

Differentiated Engineering

• Low Operating Voltage for Long Term Reliability
• Large Guard-band for Guaranteed Quality
• Error Correction Code (ECC) for Data Integrity

Extensive Qualification & Testing

• Long Burn-in Testing
• Zero Error Tolerance at Aggressive Clocks
• Even with Differentiated Engineering, 5% of GPUs are screened out

Guaranteed Quality

• System Qual. Tests: Thermal, Stress, Airflow rate, Shock & Vibe
• System Monitoring and Management for Tesla only
• Dedicated Technical Staff for Failure Analysis

DYNAMIC PAGE RETIREMENT MAXIMIZES UPTIME

GPU MEMORY

Uncorrectable Data Error causes application to

GPU without Dynamic Page Retirement (DPR)

Weak memory is still active

1. Users lose productivity as jobs continue to crash

2. IT Managers need to physically open up the server and remove the bad GPU

3. Customer satisfaction risk with RMA process

Tesla GPU with Dynamic Page Retirement

Weak memory page is retired

1. Removes bad memory with simple reboot

2. No physical work required for IT

3. Negligible impact: <0.01% of memory is retired
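To put the "<0.01% of memory" figure in absolute terms, the sketch below computes the corresponding upper bound for one GPU. It assumes a 16 GiB GPU, in line with the 16 GB Tesla P100; the capacity and the helper name are assumptions for illustration, not figures from the slide.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def max_retired_bytes(total_bytes, retired_fraction=0.0001):
    """Upper bound on retired memory; 0.0001 is the slide's 0.01%."""
    return total_bytes * retired_fraction


if __name__ == "__main__":
    # Assumption: a 16 GiB GPU, matching the 16 GB Tesla P100.
    bound = max_retired_bytes(16 * GIB)
    print(f"At most about {bound / MIB:.2f} MiB of 16 GiB would be retired")
```

Under that assumption the bound works out to under 2 MiB per GPU.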

DATA CENTER QUALIFIED BY SERVER OEMS

Server with Tesla GPU

• Designed for max airflow through GPU
• Supports airflow front-to-back & back-to-front
• Lower power consumption
• GPU temperature running Linpack: 54 °C

Server with Unqualified GPU

• Works against server airflow
• Higher power consumption
• Lower reliability
• GPU temperature running Linpack: 71 °C

SCALE-OUT PERFORMANCE IN THE DATA CENTER

Application Performance at Scale with GPUDirect RDMA

GPUDirect RDMA

• Direct transfers between GPUs
• 67% Lower GPU-to-GPU Latency
• 5x Higher GPU-to-GPU MPI Bandwidth

Up to 2x Faster

[Chart: HOOMD-blue application, LJ Liquid benchmark, 256K particles; performance with and without RDMA versus number of nodes (8, 16, 32, 64, 96).]
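The latency and bandwidth claims affect messages of different sizes differently. A minimal sketch using a simple latency-plus-bandwidth cost model shows this. The 67% and 5x factors come from the slide; the baseline latency (10 microseconds) and bandwidth (1 GB/s) are hypothetical placeholders chosen only to make the model concrete, and the function names are illustrative.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Latency-plus-bandwidth cost model for one point-to-point message."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def rdma_speedup(message_bytes, base_latency_s=10e-6, base_bandwidth=1e9):
    """Speedup if latency drops 67% and bandwidth rises 5x (slide figures).

    The baseline latency and bandwidth defaults are hypothetical placeholders.
    """
    without = transfer_time(message_bytes, base_latency_s, base_bandwidth)
    with_rdma = transfer_time(
        message_bytes, base_latency_s * (1 - 0.67), base_bandwidth * 5
    )
    return without / with_rdma


if __name__ == "__main__":
    for size in (1, 64 * 1024, 1024 ** 3):
        print(f"{size:>12} bytes: {rdma_speedup(size):.2f}x")
```

Under this model, tiny messages gain roughly 3x (latency bound) and very large ones approach 5x (bandwidth bound). An application-level gain such as the slide's "up to 2x" would plausibly be smaller because communication is only part of total runtime.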

NVLINK DELIVERS SCALABLE PERFORMANCE

More than 45x Faster with 8x P100 Interconnected with NVLink

[Chart: application performance for Caffe/AlexNet, VASP, HOOMD-Blue, COSMO, MILC, Amber, and HACC. Configurations compared: 2x Haswell; 2x K80 (M40 for AlexNet); 2x P100; 4x P100; 8x P100.]

DATA CENTER GPU MANAGEMENT

Enterprise-Grade Management Tool for Operating the Data Center

All GPUs Supported

Device Management: Per GPU Configuration & Monitoring

• Device Identification
• Board Monitoring
• Clock Management

Data Center GPU Manager (Tesla GPUs Only)

Active Health Monitoring

• Runtime Health Checks
• Prologue Checks
• Epilogue Checks

Diagnostics & System Validation

• Deep HW Diagnostics
• System Validation Tests

Policy & Group Config Management

• Pre-configured policies
• Job level accounting
• Stateful configuration

Power & Clock Mgmt.

• Dynamic Power Capping
• Synchronous Clock Boost
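Per-GPU monitoring of the kind listed under Device Management can be approximated with `nvidia-smi`, which reports fields such as temperature and power draw in CSV form. The sketch below parses that output and applies a simple temperature policy. It uses `nvidia-smi` as a stand-in and does not use the Data Center GPU Manager itself. The sample output, the 70 °C limit, and the function names are hypothetical.

```python
import csv
import io

# Query command; field names are listed by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu,power.draw",
    "--format=csv,noheader,nounits",
]

# Hypothetical sample of what the query might print on a two-GPU server.
SAMPLE_OUTPUT = """0, Tesla P100-SXM2-16GB, 54, 160.20
1, Tesla P100-SXM2-16GB, 71, 212.75
"""


def parse_gpu_rows(text):
    """Turn CSV text from the query into a list of per-GPU dictionaries.

    Assumes every field is numeric where expected; real output may contain
    placeholders such as "[Not Supported]", which this sketch does not handle.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:
            continue  # skip blank lines
        index, name, temp_c, power_w = fields
        rows.append(
            {
                "index": int(index),
                "name": name,
                "temp_c": int(temp_c),
                "power_w": float(power_w),
            }
        )
    return rows


def overheated(rows, limit_c=70):
    """Indices of GPUs above a temperature limit (a hypothetical policy)."""
    return [row["index"] for row in rows if row["temp_c"] > limit_c]


if __name__ == "__main__":
    gpus = parse_gpu_rows(SAMPLE_OUTPUT)
    print("GPUs over limit:", overheated(gpus))
```

On a real system, the text passed to `parse_gpu_rows` would come from running `QUERY` with `subprocess.run(QUERY, capture_output=True, text=True)`. A check like this could serve as a job prologue or epilogue step in a scheduler.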

DATA CENTER GPU MANAGER

Integrated into Leading Industry Tools for HPC

3rd Party Software

• Moab Cluster Suite
• TORQUE
• PBS Professional
• IBM Platform HPC
• IBM Platform LSF
• Bright Cluster Manager
• StackIQ Boss for HPC with CUDA Pallet
• Grid Engine


THANK YOU
