NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning
TAIPEI | SEP. 21-22, 2016
Eric Kang 康勝閔, Sep. 21 2016
NVIDIA DGX-1 Supercomputer, Artificial Intelligence, and Deep Learning
2
GPU Computing
NVIDIA
Computing for the Most Demanding Users
Computing Human Imagination
Computing Human Intelligence
3
DEEP LEARNING EVERYWHERE
INTERNET & CLOUD
• Image Classification
• Speech Recognition
• Language Translation
• Language Processing
• Sentiment Analysis
• Recommendation
MEDIA & ENTERTAINMENT
• Video Captioning
• Video Search
• Real Time Translation
AUTONOMOUS MACHINES
• Pedestrian Detection
• Lane Tracking
• Recognize Traffic Sign
SECURITY & DEFENSE
• Face Detection
• Video Surveillance
• Satellite Imagery
MEDICINE & BIOLOGY
• Cancer Cell Detection
• Diabetic Grading
• Drug Discovery
4
DEEP LEARNING APPROACH
Train: DNN
Deploy: DNN
[Figure labels: Dog, Cat, Honey badger, Raccoon, Errors]
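The train/deploy split on this slide can be sketched in a few lines of Python. This is only a toy illustration: a single-layer perceptron stands in for the DNN, and the two-number feature vectors, the Dog/Cat label set, and all function names are hypothetical, not taken from the talk.

```python
# Toy train/deploy sketch. A single-layer perceptron stands in for a DNN;
# the features, labels, and names below are hypothetical.

def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train(samples, epochs=20, learning_rate=0.1):
    """Train phase: adjust weights whenever the prediction is in error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                errors += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias

LABELS = {0: "Cat", 1: "Dog"}
TRAINING_SET = [
    ([1.0, 0.2], 1),
    ([0.9, 0.1], 1),
    ([0.2, 0.9], 0),
    ([0.1, 1.0], 0),
]

weights, bias = train(TRAINING_SET)

def classify(features):
    """Deploy phase: the trained weights are frozen and only used to predict."""
    return LABELS[predict(weights, bias, features)]

print(classify([0.8, 0.3]))
print(classify([0.3, 0.8]))
```

The expensive, iterative part is `train`; `classify` is a single forward pass, which is why training and deployment can run on different hardware.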
5
“SUPERHUMAN” RESULTS SPARK HYPERSCALE ADOPTION
ImageNet: Accuracy %
2010: 72%
2011: 74%
2012: 84%
2013: 88%
2014: 93%
2015: 96%
Chart labels: Hand-coded CV, Deep Learning, Human (additional values shown: 74%, 76%)
Cloud Services with AI Powered by NVIDIA
Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo Supermarket, Yandex, Yelp
6
Source: IDC Worldwide Big Data and Analytics 2016 Predictions, November 2015; IDC FutureScape: Worldwide Digital Strategy Consulting 2016 Predictions, November 2015.
“By 2020, 80% of Big Data and Analytics deployments will need distributed micro analytics and 40% of all business analytics software will incorporate prescriptive analytics built on cognitive computing functionality. Both of these trends require a dramatic increase in processing power that could be enabled by GPUs.”
— IDC
“By 2018, over 50% of developer teams will embed cognitive services in their apps (vs 1% today) providing U.S. enterprises with over $60 billion annual savings by 2020.”
— IDC
AI — THE NEXT TRILLION $ IT OPPORTUNITY
7
Deep Learning is a massive opportunity
Data Scientist productivity is vital
NVIDIA is the choice of the deep learning world
DGX-1 is fast, instantly productive
NVIDIA DGX-1
The Essential Tool of Deep Learning Scientists
170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U
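As a quick sanity check on the headline figures, the per-GPU share of the quoted throughput and the aggregate GPU memory follow directly from the numbers on the slide. The short calculation below uses only those figures; the 170 TFLOPS number is the FP16 peak quoted later in the deck, and the variable names are illustrative.

```python
# Figures quoted on the slides: 170 TFLOPS (FP16) total, 8 GPUs, 16 GB per GPU.
TOTAL_FP16_TFLOPS = 170
GPU_COUNT = 8
MEMORY_PER_GPU_GB = 16

per_gpu_tflops = TOTAL_FP16_TFLOPS / GPU_COUNT
total_gpu_memory_gb = GPU_COUNT * MEMORY_PER_GPU_GB

print(f"FP16 throughput per GPU: {per_gpu_tflops:.2f} TFLOPS")  # 21.25 TFLOPS
print(f"Aggregate GPU memory: {total_gpu_memory_gb} GB")        # 128 GB
```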
8
TESLA P100 WITH NVLINK
New GPU Architecture to Enable the World’s Fastest Compute Node
• Pascal Architecture: Highest Compute Performance
• NVLink: GPU Interconnect for Maximum Scalability
• CoWoS HBM2: Unifying Compute & Memory in Single Package
• Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory
[Diagram labels: PCIe Switch, CPU, Tesla P100, Unified Memory]
9
Engineered for deep learning | 170TF FP16 | 8x Tesla P100
NVLink hybrid cube mesh | Accelerates major AI frameworks
NVIDIA DGX-1
WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
10
NVIDIA DEEP LEARNING SDK
High Performance GPU-Acceleration for Deep Learning
APPLICATIONS
• COMPUTER VISION: Object Detection, Image Classification
• SPEECH AND AUDIO: Voice Recognition, Translation
• BEHAVIOR: Recommendation Engines, Sentiment Analysis
FRAMEWORKS
• Mocha.jl
DEEP LEARNING SDK
• DEEP LEARNING: cuDNN
• MATH LIBRARIES: cuBLAS, cuSPARSE, cuFFT
• MULTI-GPU: NCCL
11
NVIDIA CUDNN
Building blocks for accelerating deep neural networks on GPUs
High performance deep neural network training and inference
Accelerates Caffe, CNTK, TensorFlow, Theano, Torch
Performance continues to improve over time
“NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time.”
— Evan Shelhamer, Lead Caffe Developer, UC Berkeley
developer.nvidia.com/cudnn
[Chart: speed-up axis from 0x to 12x for K40 (cuDNN v1, 2014), M40 (cuDNN v3, 2015), and Pascal (cuDNN v5, 2016).]
AlexNet training throughput based on 20 iterations, CPU: 1x E5-2680v3 12 Core 2.5GHz.
12
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
• Process Data
• Configure DNN
• Monitor Progress
• Visualize Layers
• Test Image
developer.nvidia.com/digits
github.com/NVIDIA/DIGITS
13
Instant productivity — plug-and-play, supports every AI framework
Performance optimized across the entire stack
Always up-to-date via the cloud
Mixed framework environments (containerized)
Direct access to NVIDIA experts
DGX STACK
Fully integrated Deep Learning platform
14
NVIDIA DOCKER ON GITHUB
15
NVIDIA IMAGES
Prebuilt and ready to use
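To give a feel for how a prebuilt image is started, the sketch below assembles the command line for the `nvidia-docker` wrapper from the GitHub project. The slides do not name a specific image or command, so the `nvidia/cuda` image, the `nvidia-smi` check, and the helper name `nvidia_docker_run` are assumptions chosen for illustration. The function only builds the argument list and does not launch anything.

```python
import shlex

def nvidia_docker_run(image, command, remove_on_exit=True):
    """Build (but do not execute) an nvidia-docker command line.

    Hypothetical helper: the image and command are illustrative assumptions.
    """
    argv = ["nvidia-docker", "run"]
    if remove_on_exit:
        argv.append("--rm")  # delete the container when the command exits
    argv.append(image)
    argv.extend(shlex.split(command))
    return argv

argv = nvidia_docker_run("nvidia/cuda", "nvidia-smi")
print(" ".join(shlex.quote(arg) for arg in argv))
```

On a machine with the wrapper installed, the resulting list could be handed to `subprocess.run(argv)`. It is kept as plain data here so the example runs anywhere.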
16
DGX-1 CONTAINER LAUNCH FLOW
Customer data stays on premise
compute.nvidia.com (Web Browser)
• Node Management
• User Authentication
• Docker Image push/pull
• Scheduler UI
• HW/SW Metrics
LOCAL LAN
• All Application Data
• NFS Storage
• DIGITS UI
• Interactive Sessions
1. User schedules containers to run
3. User interacts with application
17
DIGITS FOR DGX-1
A complete GPU-accelerated deep learning workflow
MANAGE | TRAIN | DEPLOY
• MANAGE / AUGMENT
• DIGITS: TRAIN, TEST
• MODEL ZOO
• GPU INFERENCE ENGINE: DATA CENTER, AUTOMOTIVE, EMBEDDED
18
BUILT FOR THE DATA CENTER
• 24/7 Uptime: Maximize reliability
• Scalable Performance: Boost data center throughput
• Data Center Ready: Simplify system operations
19
END-TO-END DESIGN FOR SYSTEM UPTIME
Differentiated Engineering
• Low Operating Voltage for Long Term Reliability
• Large Guard-band for Guaranteed Quality
• Error Correction Code (ECC) for Data Integrity
Extensive Qualification & Testing
• Long Burn-in Testing
• Zero Error Tolerance at Aggressive Clocks
• Even with Differentiated Engineering, 5% of GPUs are screened out
Guaranteed Quality
• System Qual. Tests: Thermal, Stress, Airflow rate, Shock & Vibe
• System Monitoring and Management for Tesla only
• Dedicated Technical Staff for Failure Analysis
20
DYNAMIC PAGE RETIREMENT MAXIMIZES UPTIME
GPU without Dynamic Page Retirement (DPR)
Weak memory is still active; an Uncorrectable Data Error causes the application to crash
1. Users lose productivity as jobs continue to crash
2. IT Managers need to physically open up the server and remove the bad GPU
3. Customer satisfaction risk with RMA process
Tesla GPU with Dynamic Page Retirement
Weak memory page is retired
1. Removes bad memory with simple reboot
2. No physical work required for IT
3. Negligible impact: <0.01% of memory is retired
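To put the "<0.01% of memory" figure in concrete terms, the calculation below applies it to a 16 GB card. The 16 GB capacity is an assumption borrowed from the Tesla P100 16GB configuration quoted for DGX-1; this slide itself does not name a memory size.

```python
# Assumption: a 16 GB GPU, as in the Tesla P100 16GB quoted for DGX-1.
GPU_MEMORY_GB = 16
RETIRED_FRACTION_LIMIT = 0.01 / 100  # "<0.01% of memory is retired"

# Upper bound on retired memory, in MB (1 GB taken as 1024 MB).
retired_mb_upper_bound = GPU_MEMORY_GB * 1024 * RETIRED_FRACTION_LIMIT

print(f"At most about {retired_mb_upper_bound:.2f} MB of {GPU_MEMORY_GB} GB is retired")
```

Under that assumption the retired memory stays below roughly 1.64 MB, which is why the slide calls the impact negligible.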
21
DATA CENTER QUALIFIED BY SERVER OEMS
Server with Tesla GPU
• Designed for max airflow through GPU
• Supports airflow front-to-back & back-to-front
• Lower power consumption
• GPU Temp Running Linpack: 54 °C
Server with Unqualified GPU
• Works against server airflow
• Higher power consumption
• Lower reliability
• GPU Temp Running Linpack: 71 °C
22
SCALE-OUT PERFORMANCE IN THE DATA CENTER
Application Performance at Scale with GPUDirect RDMA
GPUDirect RDMA
• Direct transfers between GPUs
• 67% Lower GPU-to-GPU Latency
• 5x Higher GPU-to-GPU MPI Bandwidth
[Chart: HOOMD-Blue Application, LJ Liquid Benchmark, 256K Particles. Time-steps per second (axis 0 to 2000) against number of nodes (8, 16, 32, 64, 96), without RDMA and with RDMA. Up to 2x Faster.]
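The latency and bandwidth figures above can be combined in a simple latency-plus-bandwidth transfer model to see how message size changes the benefit. This is a rough sketch under stated assumptions: the model is a standard simplification rather than anything shown in the talk, and the baseline latency and bandwidth values are hypothetical placeholders. Only the 67% and 5x factors come from the slide.

```python
# Factors from the slide.
LATENCY_REDUCTION = 0.67   # 67% lower GPU-to-GPU latency
BANDWIDTH_GAIN = 5.0       # 5x higher GPU-to-GPU MPI bandwidth

# Hypothetical baseline values, chosen only to make the model concrete.
BASE_LATENCY_S = 10e-6          # 10 microseconds
BASE_BANDWIDTH_BPS = 1e9        # 1 GB/s

def transfer_time(message_bytes, latency_s, bandwidth_bps):
    """Simplified model: fixed latency plus size divided by bandwidth."""
    return latency_s + message_bytes / bandwidth_bps

def rdma_speedup(message_bytes):
    """Ratio of baseline transfer time to transfer time with the slide's factors."""
    base = transfer_time(message_bytes, BASE_LATENCY_S, BASE_BANDWIDTH_BPS)
    rdma = transfer_time(message_bytes,
                         BASE_LATENCY_S * (1.0 - LATENCY_REDUCTION),
                         BASE_BANDWIDTH_BPS * BANDWIDTH_GAIN)
    return base / rdma

small_message_speedup = rdma_speedup(1024)          # latency-dominated
large_message_speedup = rdma_speedup(100_000_000)   # bandwidth-dominated

print(f"1 KB message:   {small_message_speedup:.2f}x faster")
print(f"100 MB message: {large_message_speedup:.2f}x faster")
```

In this model the per-transfer speed-up always lies between about 3x (latency only) and 5x (bandwidth only). Application-level gains, such as the "Up to 2x" in the chart, are smaller because communication is only part of the total run time.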
23
NVLINK DELIVERS SCALABLE PERFORMANCE
More than 45x Faster with 8x P100 Interconnected with NVLink
[Chart: speed-up vs Dual Socket Haswell (2x Haswell CPU), axis 0x to 50x, for Caffe/AlexNet, VASP, HOOMD-Blue, COSMO, MILC, Amber, HACC. Series: 2x K80 (M40 for AlexNet), 2x P100, 4x P100, 8x P100.]
24
DATA CENTER GPU MANAGEMENT
Enterprise-Grade Management Tool for Operating the Data Center
All GPUs Supported
Device Management (Per GPU Configuration & Monitoring)
• Device Identification
• Board Monitoring
• Clock Management
Data Center GPU Manager (Tesla GPUs Only)
Active Health Monitoring
• Runtime Health Checks
• Prologue Checks
• Epilogue Checks
Diagnostics & System Validation
• Deep HW Diagnostics
• System Validation Tests
Policy & Group Config Management
• Pre-configured policies
• Job level accounting
• Stateful configuration
Power & Clock Mgmt.
• Dynamic Power Capping
• Synchronous Clock Boost
25
DATA CENTER GPU MANAGER
Integrated into Leading Industry Tools for HPC
3rd Party Software
• Moab Cluster Suite
• TORQUE
• PBS Professional
• IBM Platform HPC
• IBM Platform LSF
• Bright Cluster Manager
• StackIQ Boss for HPC with CUDA Pallet
• Grid Engine
THANK YOU