Introduction to Modern GPU Hardware
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science
National Chiao Tung University Hsinchu, Taiwan
Fall, 2018
The following content is extracted from the material in the references on the
last page. If any citation is wrong or any reference is missing, please contact
[email protected]. I will correct the error as soon as possible.
This material is for use in this course only; please do NOT broadcast it. Thank you.
Outline
GPU Pipeline
GPU Hardware History
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
GPU Applications
Summary
GPU Fundamentals: Graphics Pipeline
• A simplified graphics pipeline
– Note that pipe widths vary
– Many caches, FIFOs, and so on not shown
[Figure: simplified graphics pipeline. CPU: Application. GPU: Transform & Light → Assemble Primitives → Rasterize → Shade → Video Memory (Textures). Data between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth). Also shown: Graphics State, Render-to-texture.]
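The stages of the simplified pipeline can be sketched in software. This is a minimal illustration, not how GPU hardware is implemented: the function names, the camera placement (at the origin, looking down +z), and the constant white shading are all assumptions made for the example.

```python
# Toy software model of the simplified pipeline (all names are illustrative).

def transform(vertices_3d, width, height):
    """Transform stage: project 3D vertices to 2D screen coordinates.
    Assumes a camera at the origin looking down +z (an assumption)."""
    out = []
    for x, y, z in vertices_3d:
        sx = (x / z * 0.5 + 0.5) * width
        sy = (y / z * 0.5 + 0.5) * height
        out.append((sx, sy, z))
    return out

def assemble_primitives(vertices_2d):
    """Group every three consecutive vertices into one triangle."""
    return [tuple(vertices_2d[i:i + 3])
            for i in range(0, len(vertices_2d) - 2, 3)]

def edge(a, b, p):
    """Signed area term telling which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(triangle, width, height):
    """Emit one fragment (x, y, depth) per pixel center inside the triangle.
    Depth is interpolated linearly in screen space (not perspective-correct)."""
    a, b, c = triangle
    area = edge(a, b, c)
    if area == 0:
        return []  # degenerate triangle covers no pixels
    fragments = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
            if all(w * area >= 0 for w in (w0, w1, w2)):
                depth = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                fragments.append((x, y, depth))
    return fragments

def shade(fragment):
    """Shade stage: assign a color to a fragment (constant white here)."""
    x, y, depth = fragment
    return x, y, (255, 255, 255), depth

def render(vertices_3d, width, height):
    """Run all stages and keep the nearest fragment per pixel (depth test)."""
    framebuffer = {}
    for tri in assemble_primitives(transform(vertices_3d, width, height)):
        for frag in rasterize(tri, width, height):
            x, y, color, depth = shade(frag)
            if (x, y) not in framebuffer or depth < framebuffer[(x, y)][1]:
                framebuffer[(x, y)] = (color, depth)
    return framebuffer

# One triangle at depth 1 rendered into a 4x4 image.
framebuffer = render([(-1, -1, 1), (1, -1, 1), (-1, 1, 1)], 4, 4)
```

Each function corresponds to one box in the figure, and the values passed between them correspond to the labeled data (3D vertices, 2D vertices, triangles, fragments, final pixels).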
GPU Fundamentals: Modern Graphics Pipeline
• Programmable vertex processor!
• Programmable pixel processor!
[Figure: modern graphics pipeline. CPU: Application. GPU: Vertex Processor → Rasterize → Fragment Processor → Video Memory (Textures). Data between stages: Vertices (3D) → Xformed, Lit Vertices (2D) → Screenspace triangles (2D) → Fragments (pre-pixels) → Final Pixels (Color, Depth). Also shown: Graphics State, Render-to-texture.]
GPU Fundamentals: Modern Graphics Pipeline
• Programmable primitive assembly!
• More flexible memory access!
[Figure: the same pipeline with a Geometry Processor at the Assemble Primitives stage.]
History of Graphics Hardware (1/3)
… - mid ’90s
SGI mainframes and workstations
PC: only 2D graphics hardware
mid ’90s
Consumer 3D graphics hardware (PC)
- 3dfx, NVIDIA, Matrox, ATI, …
Triangle rasterization (only)
Cheap: pushed by game industry
1999
PC-card with TnL (Transform and Lighting)
- NVIDIA GeForce: Graphics Processing Unit (GPU)
PC-card more powerful than specialized workstations
3DFX Voodoo graphics 4MB - 1997
History of Graphics Hardware (3/3)
Modern graphics hardware
Graphics pipeline partly programmable
Leaders: AMD(ATI) and NVIDIA
- “AMD Radeon HD 6990” and “NVIDIA GeForce GTX 590”
Game consoles use similar GPUs (e.g., Xbox)
Computational Power (1/2)
• GPUs are fast…
– 3.0 GHz Intel Core2 Duo (Woodcrest Xeon 5160):
• Computation: 48 GFLOPS peak
• Memory bandwidth: 21 GB/s peak
• Price: $874 (chip)
– NVIDIA GeForce 8800 GTX:
• Computation: 330 GFLOPS observed
• Memory bandwidth: 55.2 GB/s observed
• Price: $599 (board)
• GPUs are getting faster, faster
– CPUs: 1.4× annual growth
– GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
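The figures above can be turned into ratios with a few lines of arithmetic. This is only a sketch: the dictionary keys and the helper `compound_growth` are names chosen for this example. Note also that the CPU numbers are peak values while the GPU numbers are observed, so the comparison is rough.

```python
# Numbers copied from the slide; keys are illustrative names.
cpu = {"gflops": 48.0, "bandwidth_gbs": 21.0, "price_usd": 874}   # peak
gpu = {"gflops": 330.0, "bandwidth_gbs": 55.2, "price_usd": 599}  # observed

compute_ratio = gpu["gflops"] / cpu["gflops"]
bandwidth_ratio = gpu["bandwidth_gbs"] / cpu["bandwidth_gbs"]
cpu_gflops_per_dollar = cpu["gflops"] / cpu["price_usd"]
gpu_gflops_per_dollar = gpu["gflops"] / gpu["price_usd"]

def compound_growth(annual_rate, years):
    """Total growth factor after compounding an annual rate for some years."""
    return annual_rate ** years

# Growth after 5 years at the annual rates quoted on the slide.
cpu_5y = compound_growth(1.4, 5)
gpu_pixels_5y = compound_growth(1.7, 5)
gpu_vertices_5y = compound_growth(2.3, 5)

print(f"compute: {compute_ratio:.2f}x, bandwidth: {bandwidth_ratio:.2f}x")
print(f"5-year growth: CPU {cpu_5y:.1f}x, GPU {gpu_pixels_5y:.1f}x to {gpu_vertices_5y:.1f}x")
```

The compounding is what matters: a modest difference in annual rate becomes a large gap after a few years.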
Motivation
• Why are GPUs getting faster so fast?
– Arithmetic intensity
• the specialized nature of GPUs makes it easier to use additional transistors for computation
– Economics
• multi-billion dollar video game market is a pressure cooker that drives innovation to exploit this property
Flexible and Precise
• Modern GPUs are deeply programmable
– Programmable pixel, vertex, and geometry engines
– Solid high-level language support
• Modern GPUs support “real” precision
– 32-bit/64-bit floating point throughout the pipeline
• High enough for many applications
– DX10-class GPUs add 32-bit integers
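What 32-bit precision means in practice can be checked on the CPU. The sketch below uses Python's standard `struct` module to round a 64-bit float to 32 bits; the helper name `to_float32` and the constants are chosen for this example and say nothing about any particular GPU.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

FLOAT32_EXACT_INT_LIMIT = 2 ** 24   # every integer up to here is exact in float32
INT32_MAX = 2 ** 31 - 1             # largest signed 32-bit integer

big = float(FLOAT32_EXACT_INT_LIMIT + 1)   # representable in 64-bit floating point
as_f32 = to_float32(big)                   # no longer representable in 32-bit
error_f32 = abs(to_float32(0.1) - 0.1)     # rounding error of 0.1 in float32

print(big, as_f32, error_f32, INT32_MAX)
```

A 32-bit float has a 24-bit significand, so it cannot hold every 32-bit integer exactly; this is one reason native 32-bit integer support is useful alongside floating point.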
Graphics Hardware Consideration (1/2)
• GPU = Graphics Processing Unit
– Vector processor
– Operates on 4 tuples
• Position ( x, y, z, w )
• Color ( red, green, blue, alpha )
• Texture Coordinates ( s, t, r, q )
– 4 tuple ops, 1 clock cycle
• SIMD [ Single Instruction Multiple Data ]
– ADD, MUL, SUB, DIV, MADD, …
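The 4-tuple operations listed above can be sketched as component-wise functions. This is a plain-Python illustration of the semantics only; on the hardware described, all four components are processed by one instruction, whereas the loop here is sequential. The names `add`, `mul`, and `madd` are chosen for this example.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

def add(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise ADD on a 4-tuple."""
    return tuple(x + y for x, y in zip(a, b))

def mul(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise MUL on a 4-tuple."""
    return tuple(x * y for x, y in zip(a, b))

def madd(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """Component-wise multiply-add: a * b + c."""
    return add(mul(a, b), c)

# Scale a position ( x, y, z, w ) and then offset it.
position = (1.0, 2.0, 3.0, 1.0)
scale = (2.0, 2.0, 2.0, 1.0)
offset = (0.5, 0.0, 0.0, 0.0)
moved = madd(position, scale, offset)
```

The same three functions apply unchanged to colors ( red, green, blue, alpha ) and texture coordinates ( s, t, r, q ), which is why a 4-wide vector unit suits graphics workloads.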
Graphics Hardware Consideration (2/2)
• Pipelining
– Number of stages
• Parallelism
– Number of parallel processes
• Parallelism + pipelining
– Number of parallel pipelines
[Figure: blocks labeled 1, 2, 3 illustrating pipelining, parallelism, and parallel pipelines.]
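The effect of pipelining and parallelism on throughput can be estimated with an idealized cycle count. The sketch assumes every stage takes exactly one clock cycle and there are no stalls, which real hardware does not guarantee; the function names are chosen for this example.

```python
import math

def cycles_sequential(n_items: int, n_stages: int) -> int:
    """One unit, no overlap: each item passes all stages before the next starts."""
    return n_items * n_stages

def cycles_pipelined(n_items: int, n_stages: int) -> int:
    """One pipeline: after it fills, one item completes per cycle."""
    return n_stages + n_items - 1 if n_items > 0 else 0

def cycles_parallel(n_items: int, n_stages: int, n_units: int) -> int:
    """Several non-pipelined units working on different items at once."""
    return math.ceil(n_items / n_units) * n_stages

def cycles_parallel_pipelines(n_items: int, n_stages: int, n_pipes: int) -> int:
    """Several pipelines, each handling an equal share of the items."""
    if n_items == 0:
        return 0
    return n_stages + math.ceil(n_items / n_pipes) - 1

# 12 items through 3 stages under each organization.
seq = cycles_sequential(12, 3)
pipe = cycles_pipelined(12, 3)
par = cycles_parallel(12, 3, 4)
both = cycles_parallel_pipelines(12, 3, 4)
```

In this model the number of stages sets the fill latency, while the number of parallel pipelines divides the steady-state work.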
Outline
GPU Pipeline
History of GPU Hardware
GPU Hardware Consideration
Modern GPU Hardware Architecture
NVIDIA GeForce
AMD (ATI) Radeon
IMG PowerVR
ARM Mali
Summary
Growth of NVIDIA GPU
• Performance metrics
– Since 2000, the amount of horsepower applied to processing 3D vertices and fragments has been growing at a remarkable rate.
Nvidia Graphics Card Architecture
• GeForce-8 Series
– 12,288 concurrent threads, hardware managed
– 128 Thread Processor cores at 1.35 GHz == 518 GFLOPS peak
[Figure: GeForce 8 block diagram. Host CPU and Work Distribution feed eight blocks, each containing two groups of SP / Shared Memory / IU plus TF and TEX L1, backed by six L2 / Memory partitions.]
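The 518 GFLOPS peak figure can be reproduced from the core count and clock. The sketch assumes 3 floating-point operations per core per clock (a multiply-add counted as 2, plus a multiply), which is the usual basis for this number but is not stated on the slide; the variable names are chosen for this example.

```python
CORES = 128                    # Thread Processor cores (from the slide)
CLOCK_GHZ = 1.35               # core clock (from the slide)
FLOPS_PER_CORE_PER_CLOCK = 3   # assumption: multiply-add (2) + multiply (1)
CONCURRENT_THREADS = 12_288    # hardware-managed threads (from the slide)

peak_gflops = CORES * CLOCK_GHZ * FLOPS_PER_CORE_PER_CLOCK
threads_per_core = CONCURRENT_THREADS // CORES

print(f"peak: {peak_gflops:.1f} GFLOPS, threads per core: {threads_per_core}")
```

Keeping many more threads resident than there are cores is what lets the hardware switch to other work while some threads wait on memory.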
FERMI: Streaming Multiprocessor (SM)
• Each SM contains
– 32 Cores
– 16 Load/Store units
– 32,768 registers
• Newer FP representation
– IEEE 754-2008
• Two units
– Floating point
– Integer
Maxwell: Core Architecture
http://www.weistang.com/article-941-1.html
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
Kepler vs Maxwell Comparison
http://www.coolaler.com/showthread.php/313295-%E5%8F%B2%E4%B8%8A%E6%9C%80%E9%AB%98%E6%95%88GPU%EF%BC%9ANVIDIA-Maxwell%E6%9E%B6%E6%A7%8B
[Figure: Kepler (2012) vs Maxwell (2014)]
Mobile Roadmap
09/02/11
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
ATI Radeon X1900 XTX
• Features of ATI Radeon X1900 XTX
– Core speed 650 MHz
– 48 pixel shader processors
– 8 vertex shader processors
– 51 GB/s memory bandwidth
– 512 MB memory
http://product.pcpop.com/000024721/Index.html
ATI Radeon X1900 XTX
• High Memory Bandwidth
[Figure: system diagram. Processor Chip: CPU 3 GHz, Cache ½ MB. Main memory 1 GB. AGP memory ½ GB. AGP bus 2 GB/s. Graphics Card: GPU 650 MHz (parallel processes), Graphics memory ½ GB, Output. Bandwidth labels: 51 GB/s (high bandwidth), 77 GB/s (high bandwidth), 3 GB/s.]
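To see why the on-card bandwidth matters, compare how long it takes to move the whole ½ GB of graphics memory at the two rates shown. This is a back-of-the-envelope sketch that ignores latency and protocol overhead; the function and variable names are chosen for this example.

```python
GRAPHICS_MEMORY_GBS = 51.0   # GPU <-> graphics memory bandwidth (from the slide)
AGP_BUS_GBS = 2.0            # AGP bus bandwidth (from the slide)
MEMORY_SIZE_GB = 0.5         # graphics memory size, ½ GB (from the slide)

def transfer_ms(size_gb: float, bandwidth_gbs: float) -> float:
    """Idealized time in milliseconds to move size_gb at bandwidth_gbs."""
    return size_gb / bandwidth_gbs * 1000.0

local_ms = transfer_ms(MEMORY_SIZE_GB, GRAPHICS_MEMORY_GBS)
agp_ms = transfer_ms(MEMORY_SIZE_GB, AGP_BUS_GBS)
speedup = agp_ms / local_ms

print(f"on-card: {local_ms:.1f} ms, over AGP: {agp_ms:.1f} ms, ratio: {speedup:.1f}x")
```

Under these assumptions, data that stays in graphics memory can be streamed many times per frame, while anything fetched over the bus cannot.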
IMG PowerVR Series5XT (SGXMP)
• Shader-driven Tile-Based Deferred Rendering (TBDR) architecture
• Fully programmable GPU using unique USSE architecture
• All SGX cores support OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX 9/10.1
IMG PowerVR Series6 (Rogue)
• Supports OpenGL ES 3.0, OpenGL ES 2.0, OpenGL 3.x/4.x, OpenCL 1.x and DirectX 10, with certain family members extending to full WHQL-compliant DirectX 11.1 functionality
ARM Mali-T604
• GPGPU (support OpenCL 1.1)
• Tri-pipe architecture
• The first GPU based on the Midgard architecture
• True IEEE double-precision floating-point math in hardware for Full Profile
• The Job Manager within Mali-T600 Series GPUs offloads task management from the CPU to the GPU
• 5x performance improvement over previous Mali graphics processors.
Applications (1/7)
• GPUs are used in many applications
– Ray-tracer
– Image segmentation
– FFT/Linear Algebra
http://graphics.stanford.edu/data/3Dscanrep/stanford-bunny-cebal-ssh.jpg
http://f.fwallpapers.com/images/3d-bunny.jpg
Applications (2/7)
http://www.techbang.com/posts/19899-nvidia-shield-rebirths-carrying-kepler-into-the-tablet-market-discarded-palm-machine-changes-to-core-login-table-drawing-tablet?page=2
http://wechatinchina.com/thread-461154-1-1.html
Applications (6/7)
AR and VR Applications
Applications (7/7)
http://www.naipo.com/Portals/1/web_tw/Knowledge_Center/Industry_Economy/publish-482.htm
Summary
Understand the GPU pipeline in depth
Understand the motivation of GPU hardware
Understand modern GPU hardware architecture and specifications
Understand GPU/GPGPU applications and key problems
Reference
GPU Architecture & CG, Mark Colbert, 2006
Introduction to Graphics Hardware and GPUs, Yannick Francken, Tom Mertens
GPU Tutorial, Yiyunjin, 2007
Evolution of GPU and Graphics Pipelining, Weijun Xiao
Commercial product websites (NVIDIA, ATI, IMG, ARM)
SIGGRAPH 2005 Course Notes, David Luebke
Adapted from: David Luebke (University of Virginia) and NVIDIA
Jan Verschelde, MCS 572 Lecture 27, Introduction to Supercomputing, 17 March 2014
Acknowledgement:
Thanks for the TA’s help in preparing the material.