NVIDIA Korea PSG - GIST · NVIDIA Korea PSG 이주석: jslee@ ... Maxwell equation solver Ring...
NVIDIA Confidential
ENTERPRISE GROUP: Visualization, Accelerated Computing & Virtualization
TESLA: Accelerating Momentum in HPC and Big Data Analytics
QUADRO: Revolutionizing Design & Visualization
GRID: Enabling End-to-End Enterprise Virtualization
CUDA: World’s Most Pervasive Parallel Programming Model
700+ University Courses in 62 Countries
14,000 Institutions with CUDA Developers
2,000,000 CUDA Downloads
487,000,000 CUDA GPUs Shipped
GPUs Power World’s 10 Greenest Supercomputers
Green500
Rank MFLOPS/W Site
1 4,503.17 GSIC Center, Tokyo Tech
2 3,631.86 Cambridge University
3 3,517.84 University of Tsukuba
4 3,185.91 Swiss National Supercomputing (CSCS)
5 3,130.95 ROMEO HPC Center
6 3,068.71 GSIC Center, Tokyo Tech
7 2,702.16 University of Arizona
8 2,629.10 Max-Planck
9 2,629.10 (Financial Institution)
10 2,358.69 CSIRO
37 1,959.90 Intel Endeavor (top Xeon Phi cluster)
49 1,247.57 Météo France (top CPU cluster)
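The gap between the rank 1 GPU system and the best non-GPU entries in the table can be put into numbers. A small sketch that uses only the MFLOPS/W values listed above; the variable names are illustrative:

```python
# Energy efficiency in MFLOPS/W, copied from the Green500 rows above.
top_gpu_system = 4503.17   # rank 1, GSIC Center, Tokyo Tech
top_xeon_phi = 1959.90     # rank 37, Intel Endeavor (top Xeon Phi cluster)
top_cpu_only = 1247.57     # rank 49, Meteo France (top CPU cluster)

def efficiency_ratio(a, b):
    """Return how many times more MFLOPS/W system a delivers than system b."""
    return a / b

ratio_vs_phi = efficiency_ratio(top_gpu_system, top_xeon_phi)
ratio_vs_cpu = efficiency_ratio(top_gpu_system, top_cpu_only)
print(f"rank 1 vs top Xeon Phi cluster: {ratio_vs_phi:.2f}x")
print(f"rank 1 vs top CPU cluster:      {ratio_vs_cpu:.2f}x")
```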
Developer/Compute, HPC/Big Data, Graphics, Life Science, Oil & Gas, Finance, Manufacturing, Media & Entertainment, Graphics Virtualization, Mobile App & Game Development, PC Game Development, In-car Infotainment
WHERE ART MEETS SCIENCE MEETS ENGINEERING MEETS BUSINESS
4 Days
500+ Sessions
170+ Research Posters
5 Co-located Summits
48 Countries
3,438 registrants
www.nvidia.com/gtc
March 24-27, 2014 | San Jose, California
Accelerated Computing Growing Fast
Hundreds of GPU Accelerated Apps: 113 (2011), 182 (2012), 242 (2013)
Rapid Adoption of Accelerators: % of HPC Customers with Accelerators, 44% and 77% (chart axis 2010 to 2013)
NVIDIA GPU is Accelerator of Choice: NVIDIA GPUs 85%, Intel Phi 4%, Others 11%
Sources: Intersect360 Research HPC User Site Census: Systems, July 2013; IDC HPC End-User MSC Study, 2013
Performance gap continues to grow
[Chart: Peak Double Precision FLOPS in GFLOPS (axis 0 to 2500), 2008 to 2014. NVIDIA GPU: GT200, Fermi, K20X, GK210-Duo. x86 CPU: Nehalem, Sandy Bridge, Haswell.]
[Chart: Peak Memory Bandwidth in GB/s (axis 0 to 600), 2008 to 2014. NVIDIA GPU: GT200, Fermi, K20X, GK210-Duo. x86 CPU: Nehalem, Sandy Bridge, Haswell.]
Hybrid GPU Solution
Application Code = Accelerator + CPU
Accelerator: only critical functions, parallelized using the CUDA programming model
CPU: rest of the sequential CPU code
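How much the whole application gains from moving only the critical functions to the accelerator is bounded by Amdahl's law. A minimal sketch; the 90% fraction and the 20x kernel speedup are hypothetical illustration values, not figures from the slide:

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / kernel_speedup)

# Hypothetical workload: 90% of runtime in critical functions, 20x faster on the GPU.
overall = amdahl_speedup(0.90, 20.0)
print(f"overall speedup: {overall:.2f}x")
```

The sequential CPU remainder caps the result: in this example the application speeds up by less than 7x even though the offloaded part runs 20x faster.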
GPU Acceleration Across All Platforms
x86
POWER ARM64
NVIDIA GPU
NEW
Connecting with CPUs via NVLink
Roadmap: Kepler (2014 to 2015), then Pascal (2016)
Kepler platforms: x86 | ARM64 | Power8
Pascal platforms over NVLink: ARM64 | Power8+
PCIe: 16 GB/s
NVLink: 80 GB/s
Arm + GPU
SECO Hardware Development Kit
CUDA GPU Tegra ARM CPU
http://www.secoqseven.com/en/item/secocq7-mxm/
GPUs Propel 64-Bit ARM into HPC
ARM64: Power Efficiency, System Configurability, Large, Open Ecosystem
GPU: Ultra-Fast Compute Perf, Hundreds of CUDA Apps, Large HPC Ecosystem
GPUs make ARM64 Competitive in HPC from Day One
GPU-Accelerated ARM64 Development Platforms Now Available
RM1905D Development Platform
1U Rackmount Server
2x ARM64 CPUs + 2x Tesla K20 GPUs
Availability of ARM64 Platforms
Partner & System | System Description | Availability | Contact
CirraScale RM1905D | 1U Rackmount, 2x ARM64 CPUs + 2x Tesla K20 GPUs | Order now; start shipping end of July | Al Lucarelli
E4 Computer Erka | 3U Rackmount, 2x ARM64 CPUs + 2x Tesla K20 GPUs | Order now; start shipping end of July | Piero Altoè
Eurotech Aurora | High density, liquid cooled, 8x ARM64 CPUs + 32 Tesla K20 GPUs in 3U space | Est. availability Q4 2014 | Giovanbattista Mattiussi [email protected]
Production server systems will be available in Q4 2014
IBM Partners with NVIDIA to Build Next-Generation Supercomputers
POWER8 CPU + Tesla GPU
GPU-Accelerated POWER-Based Systems Available in 2014
Introducing NVLINK and Stacked Memory
NVLINK: GPU high-speed interconnect, 80-200 GB/s, planned support for POWER CPUs
Stacked Memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit
Introducing NVLink
• Differential with embedded clock
• PCIe programming model (w/ DMA+)
• Unified Memory
• Cache coherency in Gen2.0
• 5 to 12X PCIe
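The "5 to 12X PCIe" bullet is consistent with the bandwidth figures quoted elsewhere in this deck (PCIe at 16 GB/s, NVLink at 80-200 GB/s). A quick check; the 8 GB payload is an assumed example size, and the transfer times ignore latency and protocol overhead:

```python
PCIE_GBPS = 16.0          # PCIe bandwidth quoted on the roadmap slide
NVLINK_LOW_GBPS = 80.0    # lower end of the quoted NVLink range
NVLINK_HIGH_GBPS = 200.0  # upper end of the quoted NVLink range

def transfer_seconds(size_gb, bandwidth_gbps):
    """Idealized transfer time: payload size divided by peak bandwidth."""
    return size_gb / bandwidth_gbps

low_ratio = NVLINK_LOW_GBPS / PCIE_GBPS
high_ratio = NVLINK_HIGH_GBPS / PCIE_GBPS
print(f"NVLink is {low_ratio:.1f}x to {high_ratio:.1f}x PCIe")

payload_gb = 8.0  # hypothetical data set size
print(f"PCIe:   {transfer_seconds(payload_gb, PCIE_GBPS):.2f} s")
print(f"NVLink: {transfer_seconds(payload_gb, NVLINK_LOW_GBPS):.2f} s")
```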
NVLink Enables Data Transfer at Speed of CPU Memory
Tesla GPU with stacked memory (HBM): 1 Terabyte/s
CPU with DDR memory (DDR4): 50-75 GB/s
NVLink between GPU and CPU: 80 GB/s
Unified Memory Dramatically Lowers Developer Effort
Developer view today: System Memory + GPU Memory
Developer view with Unified Memory: Unified Memory
HPC over Cloud
GeForce GRID
[Diagram: cloud rendering pipeline. SERVER: Render, Capture, Encode. IP Network (CPU, NIC). CLIENT: Decode, Render, Kybd/Mse. Latency labels: GeForce GRID 60 ms (4 frames); Network 30 ms (2 frames); GeForce GRID 30-60 ms (2 frames).]
GPU virtualization technologies
[Diagram: three approaches. Direct-assigned GPU: VM (OS, NVIDIA Driver) on a Hypervisor with an NVIDIA GPU. API intercept: OS API Intercept in the VM; Translation, Execution, Readback through the NVIDIA Driver. NVIDIA Virtual GPU: VMs (OS, NVIDIA Driver) with a Control Path through the Hypervisor and a Fast Path to the NVIDIA GPU. Label: NMOS enabled.]
Physics + CUDA programming + Visualization
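Design + Aerodynamics Simulation
The harder the air presses the car body down, the greater the friction between the tires and the road. Engine and brake force is then transferred to the road more efficiently, which improves acceleration and deceleration.
D = (1/2)·ρ·V²·A·Cd, where D = aerodynamic drag, ρ = air density, V = vehicle speed, A = projected frontal area, and Cd = drag coefficient. Making the projected frontal area as small as possible reduces drag.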
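The drag formula can be evaluated directly. A minimal sketch; the vehicle speed, frontal area, and drag coefficient are hypothetical example values, and 1.225 kg/m³ is the standard sea-level air density:

```python
def drag_force(air_density, speed, frontal_area, drag_coefficient):
    """Aerodynamic drag D = 1/2 * rho * V^2 * A * Cd, in newtons for SI inputs."""
    return 0.5 * air_density * speed ** 2 * frontal_area * drag_coefficient

rho = 1.225   # kg/m^3, standard sea-level air density
v = 30.0      # m/s (108 km/h), hypothetical vehicle speed
area = 2.2    # m^2, hypothetical projected frontal area
cd = 0.30     # hypothetical drag coefficient

d_full = drag_force(rho, v, area, cd)
d_small = drag_force(rho, v, area * 0.9, cd)  # frontal area reduced by 10%
print(f"drag: {d_full:.1f} N, with 10% less frontal area: {d_small:.1f} N")
```

Drag is linear in A and Cd but quadratic in V, so doubling the speed quadruples the force.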
Growing Demand Across Industries
Finance: Risk Analytics, Monte Carlo, Options Pricing, Insurance
Government: Signal Processing, Satellite Imaging, Video Analytics
Edu/Research: Astrophysics, Molecular Dynamics, Weather / Climate
Oil and gas: Seismic Processing, Reservoir Sim
Life Sciences: Bio-chemistry, Bio-informatics, Material Science, Genomics
Manufacturing: Structural Mechanics, Computational Fluid Dynamics, Electromagnetics
Image Processing
What is Machine Vision?
“I’m seeing more and more machine vision companies use GPUs. Heuristic searches and training tasks that might have been impractical on a single processor; you might now be able to do things that weren’t possible, leading to a whole new class of algorithms”
Perry West, President of Automated Vision Systems Inc
http://www.machinevisiononline.org/vision-resources-details.cfm?content_id=411
1. Capture images in manufacturing line
2. Process images and make decision on product quality
3. Take action on target product
Machine Vision Solutions Available Now
ISV | GPU Solution | Description
MVTec | HALCON 10 | Leading Machine Vision ISV with customers worldwide
Dalsa | Sapera Nitrous | Another leading Machine Vision ISV with 8-10% market share in a fragmented market
Libraries to build custom solution
Libraries | GPU Solution | Description
CUDA Vision Workbench | Computer Vision Workbench | Application used primarily for demonstration, benchmarking and development of vision primitives implemented in CUDA
NVIDIA Library | NVIDIA Performance Primitives | Library of functions for performing CUDA accelerated processing, with focus on imaging and video processing
MVTec Halcon 10: 10x - 30x Faster
Speed-up: Tesla C2050 GPU vs Quad-core Intel Nehalem
[Chart: speed-up axis, 0x to 40x]
Defect Analysis: a 1-Hour Job Finished in 7 Minutes
CT Scan & Reconstruction of Solder Ball Failure using CUDA (courtesy North Star Imaging)
Reconstruction time: 2 GPUs, 7 minutes; 1 GPU, 18 minutes; Dual Core2 Duo, 1 hr 20 mins
[Chart: "x Faster" axis, 0 to 25]
xViewCT with 1536x1920 X-ray detector
1.2B voxels, 8GB raw data set
Case study of BGA ball defect analysis on GPUs: uses accelerated image-processing algorithms for X-ray or camera images
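The "x Faster" chart follows from the three timings. A small sketch, assuming the extracted labels pair up in order (7 minutes with 2 GPUs, 18 minutes with 1 GPU, 1 hr 20 mins on the Dual Core2 Duo):

```python
baseline_minutes = 80.0                        # "1 hr 20 mins" on the Dual Core2 Duo
gpu_minutes = {"1 GPU": 18.0, "2 GPUs": 7.0}   # assumed pairing of labels and timings

# Speedup relative to the CPU baseline for each GPU configuration.
speedups = {name: baseline_minutes / minutes for name, minutes in gpu_minutes.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than the CPU baseline")

# Scaling from one GPU to two.
gpu_scaling = gpu_minutes["1 GPU"] / gpu_minutes["2 GPUs"]
print(f"2 GPUs vs 1 GPU: {gpu_scaling:.2f}x")
```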
Target Industries for Machine Vision
Product Quality
Operation Efficiency
Return on Investment
Higher Revenue
Textile
Steel Security Semiconductors
Food Electronic Manuf. Paper
Flat Panel
Intelligent Video
Surveillance
Facial Recognition
Video and Imagery
Search and Analysis
Computer Vision
Video Enhancement
Signal Processing
10x-100x Faster
3D Reconstruction from CT Images
Enabling New Computation Solutions
[Diagram: GPU block with 192 CUDA Cores, Shared Mem, L1 TEX Cache, L2 Cache, Texture Engine, Tessellation Engine, Primitive Engine]
Face Recognition
Head Tracking
Object Recognition
Recognition
Gesture
Recognition
3D Reconstruction
Augmented Reality
Perfect architecture for parallel algorithms
Signal Processing
Shock-wave Simulation + Visualization
Geological and oil-field exploration: shock waves are used to analyze geological and seabed structures with CUDA, and QuadroPlex renders the result as 8K high-resolution imagery
QuadroPlex #1: Card #1&2 (SLI), Output #1: 0.1
QuadroPlex #2: Card #1, Output #1: 0.1
QuadroPlex #2: Card #2, Output #1: 0.2
Electromagnetics Simulation + Design
FDTD Acceleration using GPUs (Source: Acceleware)
Cell Phone Model Simulation, simulation size: 80 Mcells
Speed (Mcells/s): Intel Xeon (2.6 GHz), 9.9; 4 GPUs (Tesla 8-series), 500.0
[Chart: speed axis, 0 to 600 Mcells/s]
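The two throughput figures translate into a speedup and a time per full sweep of the grid. A short sketch using only the numbers on the slide; reading "one sweep" as one update of all 80 Mcells is an assumption:

```python
cpu_mcells_per_s = 9.9     # Intel Xeon (2.6 GHz)
gpu_mcells_per_s = 500.0   # 4 GPUs (Tesla 8-series)
grid_mcells = 80.0         # cell phone model simulation size

speedup = gpu_mcells_per_s / cpu_mcells_per_s
cpu_sweep_s = grid_mcells / cpu_mcells_per_s   # seconds to update every cell once
gpu_sweep_s = grid_mcells / gpu_mcells_per_s

print(f"speedup: {speedup:.1f}x")
print(f"one sweep: {cpu_sweep_s:.2f} s on the CPU, {gpu_sweep_s:.2f} s on 4 GPUs")
```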
FDTD Solvers
Acceleware
EM Photonics
Ongoing work
Maxwell equation solver
Ring Oscillator (FDTD)
Particle beam dynamics simulator
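For readers unfamiliar with FDTD, the method named in the list above, its core is a leapfrog update of staggered electric and magnetic fields. A minimal 1D sketch in plain Python under assumed normalized units; the grid size, Courant number, and Gaussian source are illustrative choices, and this is not the Acceleware or EM Photonics implementation:

```python
import math

def fdtd_1d(num_cells=200, num_steps=150, source_cell=100, courant=0.5):
    """Leapfrog (Yee) update of Ez and Hy on a 1D grid in normalized units."""
    ez = [0.0] * num_cells         # electric field at integer grid points
    hy = [0.0] * (num_cells - 1)   # magnetic field, staggered by half a cell

    for step in range(num_steps):
        # Update H from the spatial difference of E.
        for k in range(num_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update interior E from the spatial difference of H.
        # The two end cells are never written, so they act as conducting walls.
        for k in range(1, num_cells - 1):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d()
peak = max(abs(value) for value in field)
print(f"peak |Ez| after the run: {peak:.3f}")
```

Within one sweep every cell update reads only its neighbors from the other field, so the inner loops are independent per cell, which is the property a GPU implementation would exploit.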
simulation
Visualization, Satellite Image Analysis
GPU Use Cases in CWO
3.5-km GEOS-5 Simulated Clouds
dx = 2 km
dx = 1 km
Reality
NWP Acceleration
ASUCA and NWP Achievement: 145 TFLOPS
ASUCA and NWP Simulation on Tsubame 2.0, TiTech Supercomputer:
Dr. Takayuki Aoki, GSIC, Tokyo Institute of Technology, Tokyo Japan
Tsubame 2.0, Tokyo Institute of Technology: 1.19 Petaflops, 4,224 Tesla M2050 GPUs
Run on 3,990 Tesla M2050s: 145.0 Tflops SP, 76.1 Tflops DP
GPU Status legend: Available Today, Product in 2013, Product Evaluation, Research Evaluation
Columns: Structural Mechanics, Fluid Dynamics, Electromagnetics
ANSYS Mechanical
Abaqus/Standard
MSC Nastran
Marc
AFEA
AMLS, FastFRS
NX Nastran
HyperWorks OptiStruct
PAM-CRASH implicit
LS-DYNA implicit
RecurDyn
Adventure Cluster
ANSYS CFD (FLUENT)
Moldflow
Culises (OpenFOAM)
Particleworks
SpeedIT (OpenFOAM)
AcuSolve
Abaqus/CFD
LS-DYNA CFD
CFD++
FloEFD
STAR-CCM+
XFlow
LS-DYNA
Abaqus/Explicit
RADIOSS
PAM-CRASH
EMPro
CST MWS
XFdtd
SEMCAD X
FEKO
Nexxim
JMAG
CFD-ACE+
GPU Progress – Commercial CAE Software
Xpatch
HFSS
SCSK
ANSYS Mechanical 14.5 GPU Acceleration
Results for Distributed ANSYS 14.5 with 8-Core CPUs and single GPUs
[Chart: ANSYS Mechanical number of jobs per day, higher is better, CPU only vs CPU + GPU. Bar values: 164, 210, 341, 395. Axis 0 to 500.]
Westmere: Xeon X5690 3.47 GHz 8 Cores + Tesla C2075; C2075 = 2.1x Acceleration
Sandy Bridge: Xeon E5-2687W 3.10 GHz 8 Cores + Tesla K20; K20 = 1.9x Acceleration
V14sp-5 Model: Turbine geometry; 2,100,000 DOF; SOLID187 FEs; Static, nonlinear; One iteration (final solution requires 25)
Distributed ANSYS 14.5, Direct sparse solver
Results from Supermicro X9DR3-F, 64GB memory
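The quoted acceleration factors can be recovered from the four bar values. A small check; the pairing of bars with configurations (164 and 341 for Westmere, 210 and 395 for Sandy Bridge) is inferred from the 2.1x and 1.9x factors and is not stated in the extracted text:

```python
# Jobs per day, (CPU only, CPU + GPU); the pairing is an inference, see above.
jobs_per_day = {
    "Westmere X5690 + Tesla C2075": (164, 341),
    "Sandy Bridge E5-2687W + Tesla K20": (210, 395),
}

acceleration = {
    config: with_gpu / cpu_only
    for config, (cpu_only, with_gpu) in jobs_per_day.items()
}
for config, factor in acceleration.items():
    print(f"{config}: {factor:.1f}x")
```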
Multi-GPU Acceleration of 16-Core ANSYS Fluent 15.0 (Preview), External Aero
16-Core Server Node: 2 x 8 cores with 4 GPUs (G1 to G4)
Xeon E5-2667 + 4 x Tesla K20X GPUs
2.9X Solver Speedup
ANSYS Fluent Solver Times for Sedan – 4 GPUs: CPU Configuration vs CPU + GPU Configuration
3.6 M Mixed cells; Steady, k-e turbulence; Coupled PBNS, DP; AMG F-cycle on CPU; AMG V-cycle on GPU
MSC Nastran 2013 and GPU Performance: SMP + GPU acceleration of SOL101 and SOL103
Server node: Sandy Bridge E5-2670 (2.6GHz), Tesla K20X GPU, 128 GB memory
Speedup (higher is better):
SOL101, 2.4M rows, 42K front: serial 1X, 4c 2.7X, 4c+1g 6X
SOL103, 2.6M rows, 18K front: serial 1X, 4c 1.9X, 4c+1g 2.8X
Lanczos solver (SOL 103): Sparse matrix factorization; Iterate on a block of vectors (solve); Orthogonalization of vectors
Rolls Royce: Abaqus 3.5x Speedup with 5M DOF
Sandy Bridge + Tesla K20X, Single Server
[Chart: elapsed time in seconds (axis 0 to 20000) and speedup relative to 8 cores (1x) for 8c, 8c + 1g, 8c + 2g, 16c, 16c + 2g. Labeled speedups: 2.11x, 2.42x.]
Server with 2x E5-2670, 2.6GHz CPUs, 128GB memory, 2x Tesla K20X, Linux RHEL 6.2, Abaqus/Standard 6.12-2
• 4.71M DOF (equations); ~77 TFLOPs • Nonlinear Static (6 Steps) • Direct Sparse solver, 100GB memory
Bio Informatics
Computation: 3rd Pillar of Scientific Research
Experimental: Description of natural phenomena
Experimental: Experimental methods and quantification
Theoretical: Formulation of Newton’s laws, Maxwell’s equations …
Computational: Simulation of complex phenomena
Data: Distributed communities unifying theory, experiment and simulation with massive data sets from multiple sources and disciplines
Timeline: 1,000 years ago, Last 500 years, Last 50 years, Today
Computer graphics require billions to trillions of parallel computations per second.
Scientific simulations can require quadrillions of parallel computations per second.
Gene Sequencing
Sequence Analysis
Molecular Modeling
Diagnostic Imaging
GPUs Accelerate Life Sciences Pipeline
BGI (Beijing) Crunches Through Genomics Data Deluge with GPUs
Petabytes of data
Equal 15,000 human genomes / year
Understand disease treatments
Study how individuals respond to bacteria, virus, drugs
Personalized Medicine
A key path to drug discovery is determining the similarity of one molecule to another. OpenEye software uses Tesla GPUs to accelerate the process, enabling millions of molecules to be compared in seconds, rather than hours or days.
UCSD team uses Tesla GPUs for CT scans: reduces radiation dosage by up to 70 times
Up to 28,000 Americans each year develop cancer due to radiation from CT scans
Drug Discovery Process in “Wet Labs”
High Throughput Screening (robot-assisted screening): Millions of Compounds to 1000s of Drug Leads
Synthesize new Chemical Compounds
Testing for Efficacy, Side Effects, Safety
Clinical Trials
FDA Approval Process
Trial and Error: ~5 years
Computation-based Drug Discovery
High Throughput Screening (robot-assisted screening): Millions of Compounds to 1000s of Compounds
Virtual Screening: Check if compounds bind to target proteins
Computational Chemistry: Synthesize compounds based on similarity
Lead Optimization: Modify chemicals to improve efficacy
Synthesize new Chemical Compounds
Testing for Efficacy, Side Effects, Safety
Clinical Trials
FDA Approval Process
Example of using Computational Methods
Predicting new molecular targets for known drugs, Keiser et al, Nature, 2009
Inputs: 878 FDA-Approved Drugs, 2,787 Pharmaceutical Compounds, 246 Targets (Proteins, etc) from MDDR database
Similarity Ensemble Approach (SEA): 6,928 Similar Pairs
Remove known associations: 3,832 Remaining Predictions
Tested 30 Predictions
23 New Drug-Target Associations
Confirmed one in animal
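The funnel on this slide can be summarized with two derived numbers. A small sketch using only the counts above; treating the difference between similar pairs and remaining predictions as the known associations is an interpretation of the slide:

```python
similar_pairs = 6928          # pairs flagged by the Similarity Ensemble Approach
remaining_predictions = 3832  # after removing known associations
tested = 30                   # predictions tested
confirmed = 23                # new drug-target associations

known_associations_removed = similar_pairs - remaining_predictions
hit_rate = confirmed / tested

print(f"known associations removed: {known_associations_removed}")
print(f"hit rate among tested predictions: {hit_rate:.0%}")
```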
Why throughput capacity matters
2008 TeraGrid Usage By Discipline: Molecular Biosciences 29%, Physics 21%, Chemistry 13%, Astronomical Sciences 12%, Materials Research 6%, Advanced Scientific Computing 5%, Earth Sciences 5%, Chemical, Thermal Systems 4%, All 15 Others 3%, Atmospheric Sciences 2%
Same chart with Molecular Biosciences reduced to 4%: Excess Capacity 25%, all other disciplines unchanged
What’s the value of adding 25% capacity?
Equivalent to reducing Molecular Bioscience usage by 7x
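The "7x" figure follows from the two Molecular Biosciences shares shown in the charts. A short check using only the slide's percentages:

```python
usage_before = 29.0   # Molecular Biosciences share of TeraGrid usage, percent
usage_after = 4.0     # share in the second chart, percent

freed_capacity = usage_before - usage_after    # percentage points freed
reduction_factor = usage_before / usage_after  # how much faster the workload must run

print(f"freed capacity: {freed_capacity:.0f}% of the system")
print(f"required reduction in usage: {reduction_factor:.2f}x (about 7x)")
```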
Big Data
Big Data ?
Big Data Market Size, 2015 ($ Billion): Big Data Compute 6.7, Enterprise Search 2.4
Source: Wikibon and Frost & Sullivan
Big Data Market Size by Segment (’13-’17), $ Billion
Year: 2013, 2014, 2015, 2016, 2017
Compute: 4, 5, 7, 8, 8
Application: 2, 3, 5, 6, 7
Everything Else: 13, 19, 26, 30, 33
Source: Wikibon, Wikibon.org
Note: For data on other segments, see the Appendix
Analyzing Big Data through Accelerated Pattern Matching
Analyzing Twitter
Shazam: Searching Audio
Image-based Search
Real-time Video Delivery
“Now You Can Build Google’s $1M Artificial Brain on the Cheap” -Wired
Artificial Neural Network at a Fraction of the Cost with GPUs
GOOGLE BRAIN: 1,000 CPU Servers, 2,000 CPUs • 16,000 cores, 600 kWatts, $5,000,000
STANFORD AI LAB: 3 GPU-Accelerated Servers, 12 GPUs • 18,432 cores, 4 kWatts, $33,000
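The comparison reduces to a few ratios. A small sketch using only the slide's numbers; the dictionary keys are illustrative names:

```python
google_brain = {"servers": 1000, "processors": 2000, "cores": 16000, "kw": 600.0, "usd": 5_000_000}
stanford_lab = {"servers": 3, "processors": 12, "cores": 18432, "kw": 4.0, "usd": 33_000}

cost_ratio = google_brain["usd"] / stanford_lab["usd"]
power_ratio = google_brain["kw"] / stanford_lab["kw"]
cores_per_gpu = stanford_lab["cores"] // stanford_lab["processors"]
cores_per_cpu = google_brain["cores"] // google_brain["processors"]

print(f"cost:  {cost_ratio:.0f}x lower with GPUs")
print(f"power: {power_ratio:.0f}x lower with GPUs")
print(f"cores per GPU: {cores_per_gpu}, cores per CPU: {cores_per_cpu}")
```

CPU cores and CUDA cores are not equivalent units, so the core counts show scale rather than a like-for-like comparison.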
Fast Growing GTC topics
Hadoop Framework
Sample Customers: NSA, JPMC, Chevron, Facebook, MTV Network, eBay
Customer Applications: Machine Learning, Search, Data Mining, SQL, Graph Analytics, Scientific Computing, Indexing
Tools and Applications: Mahout, Hive, Solr & Lucene, Giraph, Hama, Storm
Basic Platform: HDFS, MapReduce
Key Algorithms for Applications
Scientific Computing
Matrix Multiplication
Giraph
N-Body Simulation
Data Warehouse Graphic Analytics
Page Rank
Hama Hive
Gzip
Bzip2
Mahout
Naïve Bayes
Classifier
Fuzzy K-Means
Machine Learning
Recommenders
K-Means
Canopy
Decision Forests
Linear Regression
Frequent Itemset
Mining
Collocations
Solr & Lucene
Similarity Score
String Match
Word Count
Search
Apriori
Bellman-Ford
Depth-first Search
Sparse Matrix-Vector Multiplication
Snappy
Legend: Not Fit with GPU, Not Sure, Computing Intensive
Page View
Rank/Count
Inverted Index
Relational Algebra
Nearest neighbor
Shared connections
Personalization-based Popularity
Priority-queue based traversals
KISTI-NVIDIA Joint Laboratory
Education & Training center
OpenACC & CUDA Projects
GPU optimized ISVs
Future Architecture
Expand HPC users to Industry in MV & ML
CUDA everywhere
2007 2008 2009 2010 2011 2012 2013 2014
CUDA
tour
CUDA
workshop
CUDA
contest
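Seoul National University
Yonsei University
Korea University
Kyungpook National University
KAIST
GIST
POSTECH
Yonsei University
KAIST, Korea University
Dong-Eui University
KAIST
GIST
POSTECH
Round Table meeting @Yangjae
Seoul National University
Korea University
Kyungpook National University
KAIST
GIST
Pukyong National University
POSTECH
Kyungpook National University
Inje University
UNIST
GIST
Hanyang University
University of Seoul
Chungnam National University
GIST
Kyungpook National University
Tongmyong University
Venues: Gangchon, Anmyeondo, Deoksan, Seoul, Gonjiam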
CUDA
trainings
KISTI
http://nvidiakoreapsc.com
Thank you