PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri Shomroni
Transcript of PT-4053, Advanced OpenCL - Debugging and Profiling Using AMD CodeXL, by Uri Shomroni
ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL
BUDI PURNOMO, URI SHOMRONI, GNANABASKARAN MUTHUMANI
ADVANCED OPENCL™ DEBUGGING AND PROFILING USING CODEXL | NOVEMBER 13, 2013
ABOUT OpenCL™
OpenCL™ is FUN!
• Parallel compute programming language
• Exposes the massively multithreaded GPU
• A lot of horsepower, optimized for parallel computing
• Order-of-magnitude performance improvement!
OpenCL™ DEBUGGING AND PROFILING CHALLENGES
However,
• Debugging and profiling parallel processing applications is hard
• On-time delivery of robust (bug-free) OpenCL™ applications is challenging
• It is almost impossible to optimize an OpenCL™-based application to fully utilize the available parallel processing system resources
OpenCL™ DEBUGGING AND PROFILING CHALLENGES
OpenCL™ is a “Black Box”
• The application enqueues OpenCL™ commands
• The OpenCL™ runtime executes the commands
• Using a host profiler and debugger, the developer cannot
  ‒ Debug and profile the OpenCL™ kernels
  ‒ See the execution details
  ‒ View runtime loads
Application
OpenCL™
AMD CodeXL
• APU and GPU Debugging Functionality
  ‒ OpenCL™ and OpenGL API-Level
  ‒ OpenCL™ Kernel Source Code
• APU, CPU and GPU Profiling
• OpenCL™ Static Kernel Analysis
• Provides the information a developer needs to help find bugs and optimize the application’s performance
• Integrated into Microsoft® Visual Studio®
• Standalone application for Windows® and Linux®
GPU Debugging
GPU DEBUGGING WITH AMD CodeXL | DEMO
• Sample provided with CodeXL tools suite
  ‒ API-level debugging
  ‒ Pinpointing OpenCL™ errors
  ‒ Entering kernel debugging
  ‒ Locals and Watch views
  ‒ Kernel source breakpoints
  ‒ Finding problematic work items
  ‒ OpenCL™ Kernel Multiwatch view
AMDTTEAPOT
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Displays OpenCL™ and OpenGL API calls
  ‒ Supports function calls from OpenCL™ up to version 1.2 and OpenGL up to version 4.3
  ‒ Function parameters
  ‒ Object links in properties
  ‒ API calls are divided per Compute / Render context
  ‒ Calls history recording to an HTML log file
API CALLS HISTORY VIEW
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Displays allocated OpenCL™ and OpenGL objects
  ‒ Object hierarchy and counts
  ‒ Object properties
  ‒ For objects with data / sources, double-click to open a main view
  ‒ Displays detected memory leaks if "Break on Memory Leaks" is selected
CODEXL EXPLORER
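To make the memory-leak idea concrete, here is a minimal sketch of how leaked objects could be found from a recorded list of API calls. It is not CodeXL's implementation. The log format (a list of `(call_name, handle)` tuples) and the function `find_leaks` are assumptions made for illustration; only the `clCreate*` / `clRetain*` / `clRelease*` naming pattern comes from the OpenCL™ API itself.

```python
def find_leaks(call_log):
    """Return {handle: creating_call} for objects whose reference count never reached zero.

    call_log is a hypothetical list of (call_name, handle) tuples in call order.
    """
    refcount = {}
    creator = {}
    for name, handle in call_log:
        if name.startswith("clCreate"):
            # A newly created OpenCL object starts with a reference count of 1.
            refcount[handle] = 1
            creator[handle] = name
        elif name.startswith("clRetain"):
            refcount[handle] = refcount.get(handle, 0) + 1
        elif name.startswith("clRelease"):
            refcount[handle] = refcount.get(handle, 0) - 1
    return {h: creator.get(h, "unknown") for h, n in refcount.items() if n > 0}


# Illustrative log: buf0 is retained once but released only once, so it leaks.
example_log = [
    ("clCreateContext", "ctx0"),
    ("clCreateBuffer", "buf0"),
    ("clCreateBuffer", "buf1"),
    ("clRetainMemObject", "buf0"),
    ("clReleaseMemObject", "buf0"),
    ("clReleaseMemObject", "buf1"),
    ("clReleaseContext", "ctx0"),
]
leaks = find_leaks(example_log)
```

With this log, `leaks` contains only `buf0`, attributed to `clCreateBuffer`.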
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Displays host code, OpenCL™ kernel source, and OpenGL shader source
  ‒ Set source-level breakpoints in OpenCL™ kernels
  ‒ Display host thread and OpenCL™ kernel wavefront call stacks
  ‒ Visual Studio® integration
SOURCE AND CALL STACK VIEWS
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Displays image, buffer and texture data
  ‒ Image view for OpenCL™ images and OpenGL textures and render buffers
    ‒ 3D image support with layer selection slider
    ‒ Non-RGB images mapped to grayscale range, with selection of minimum and maximum values clearly displaying out-of-range values
  ‒ Data view for all objects
    ‒ Channel order / type selection for buffer data
    ‒ Connection to image view for objects that support it
OBJECT VIEWS
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Display OpenCL™ kernel variables
  ‒ Structure and vector types support
  ‒ Global and Private memory array dereferencing
    ‒ Local and Constant memory support planned for future releases
  ‒ Visual Studio® integration
LOCALS AND WATCH VIEWS
GPU DEBUGGING WITH AMD CodeXL | VIEWS
• Display a single OpenCL™ kernel variable value across the current work items
  ‒ Image and Data visualization
  ‒ Range slider, like Object image view
  ‒ Current work item is highlighted and can be changed by double-clicking the data view
MULTIWATCH VIEWS
GPU DEBUGGING WITH AMD CodeXL | FEATURES
• Remote debugging
  ‒ Debug capabilities on a remote machine
    ‒ API-level debugging
    ‒ Kernel debugging
  ‒ Requires a CodeXL agent running on the target machine
  ‒ The agent is included as an option in the CodeXL installer
  ‒ Same agent for remote GPU debugging and remote GPU profiling
  ‒ Currently only supports Windows-to-Windows and Linux-to-Linux debugging
• OpenCL™ API support increased up to OpenCL™ 1.2
  ‒ New API functions
  ‒ New deprecated functions and behaviors
• OpenGL API support increased up to OpenGL 4.3
  ‒ New API functions and tokens
  ‒ New shader types
NEW IN CODEXL 1.3
GPU DEBUGGING WITH AMD CodeXL | FEATURES
• Hardware-based kernel debugging
  ‒ Current implementation retrieves hardware values but performs kernel playback for breakpoint implementation
    ‒ Display data for the entire grid
    ‒ Optimized for small- and medium-sized kernels
    ‒ Does not support debugging kernels that can't be replayed consistently (such as kernels using atomics)
  ‒ New implementation will use hardware breakpoints
    ‒ Display data according to the wavefronts executed in the actual hardware
    ‒ Faster for large kernels
    ‒ Stop and resume wavefront execution
    ‒ Can break a running kernel
    ‒ Can support debugging persistent kernels (attach to kernel)
    ‒ Will allow data breakpoints
  ‒ Working development build in the demo area!
UPCOMING RELEASES
GPU Profiling
GPU PROFILING WITH AMD CodeXL
• Analyze and profile OpenCL™ host and device code
  ‒ Collect application trace mode
  ‒ Collect GPU performance counter mode
• Views:
  ‒ API trace: View API calls with inputs and outputs
  ‒ Timeline visualization: View host and device synchronization issues
  ‒ Summary pages: Find the top bottleneck
  ‒ Warnings/Errors: View performance suggestions
  ‒ Kernel occupancy: Find the kernel resource bottleneck
  ‒ Performance counter: View the kernel performance bottleneck
• Does not require source or project modifications to the application
• Does not even require the application source code
KEY FEATURES
GPU PROFILING WITH AMD CodeXL | Views
• Analyze and profile OpenCL™ applications
  ‒ View API input arguments and output results
  ‒ Find API hotspots
  ‒ Determine top ten data transfer and kernel execution operations
  ‒ Identify failed API calls, resource leaks and best practices
API TRACE
GPU PROFILING WITH AMD CodeXL | Views
• Visualize host and device execution in a timeline chart
  ‒ View number of OpenCL™ contexts and command queues created and the relationships between these items
  ‒ View data transfer operations and kernel executions on the device
  ‒ Determine proper synchronization and load balancing
TIMELINE VISUALIZATION
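As an illustration of the data behind such a timeline, the sketch below works from the four per-command timestamps that OpenCL™ event profiling exposes (queued, submit, start, end, in nanoseconds). It is a minimal sketch, not CodeXL's code: the `CommandTiming` class, the `device_idle_gaps` function and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CommandTiming:
    """Timestamps in nanoseconds for one enqueued command (hypothetical record type)."""
    name: str
    queued_ns: int
    submit_ns: int
    start_ns: int
    end_ns: int

    @property
    def wait_ns(self):
        # Time between the host enqueueing the command and the device starting it.
        return self.start_ns - self.queued_ns

    @property
    def exec_ns(self):
        # Time the command actually spent executing on the device.
        return self.end_ns - self.start_ns


def device_idle_gaps(timings):
    """Return (previous, next, gap_ns) for idle gaps between consecutive commands."""
    ordered = sorted(timings, key=lambda t: t.start_ns)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start_ns - prev.end_ns
        if gap > 0:
            gaps.append((prev.name, nxt.name, gap))
    return gaps


# Made-up numbers: the read-back starts 600 ns after the kernel finishes.
timeline = [
    CommandTiming("write", 0, 10, 100, 300),
    CommandTiming("kernel", 20, 30, 300, 900),
    CommandTiming("read", 950, 960, 1500, 1700),
]
gaps = device_idle_gaps(timeline)
```

A long wait before a command or a large idle gap between commands is the kind of host and device synchronization issue a timeline chart makes visible.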
GPU PROFILING WITH AMD CodeXL | Views
• Find top bottlenecks
  ‒ I/O bound
  ‒ Compute bound
SUMMARY PAGES
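One simple way to think about the I/O-bound versus compute-bound distinction is to compare total data-transfer time with total kernel-execution time. The sketch below does only that; the `classify_bottleneck` function, the `(kind, duration_ns)` record format and the "larger share wins" rule are assumptions for illustration, not the criteria CodeXL applies.

```python
def classify_bottleneck(commands):
    """Classify a run from (kind, duration_ns) records, kind being "transfer" or "kernel".

    Returns (label, share), where share is the fraction of device time
    spent in the dominant category.
    """
    transfer = sum(d for kind, d in commands if kind == "transfer")
    kernel = sum(d for kind, d in commands if kind == "kernel")
    total = transfer + kernel
    if total == 0:
        return "idle", 0.0
    if transfer > kernel:
        return "I/O bound", transfer / total
    return "compute bound", kernel / total


# Made-up durations: transfers take 700 ns in total, kernels 100 ns.
label, share = classify_bottleneck([
    ("transfer", 400),
    ("kernel", 100),
    ("transfer", 300),
])
```

Here the run is labeled I/O bound because transfers account for most of the device time.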
GPU PROFILING WITH AMD CodeXL | Views
• Provide performance improvement suggestions
• Detect errors in an OpenCL™ application
WARNING AND ERROR MESSAGES
GPU PROFILING WITH AMD CodeXL | Views
• Analyze the OpenCL™ kernel execution for AMD APUs and GPUs
  ‒ Collect GPU Performance Counters
    ‒ The number of ALU, global and local memory instructions executed
    ‒ GPU utilization and memory access characteristics
  ‒ Show the kernel resource usages
  ‒ View the AMD intermediate language (AMD IL) and hardware disassembly (ISA)
PERFORMANCE COUNTER
GPU PROFILING WITH AMD CodeXL | Views
• Estimate OpenCL™ kernel occupancy for AMD APUs and GPUs
  ‒ Visual indication of the limiting kernel resources for number of wavefronts in flight
  ‒ View the maximum number of wavefronts in flight limited by
    ‒ Work group size
    ‒ Number of allocated scalar or vector registers
    ‒ Amount of allocated LDS
  ‒ View the maximum resource limit for the GPU device
KERNEL OCCUPANCY
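The occupancy estimate can be pictured as taking the minimum of several per-resource limits on wavefronts in flight. The sketch below is a deliberately simplified model, not the calculation CodeXL performs. The function `occupancy_limits` is hypothetical, and every hardware default (wavefront size, SIMDs per compute unit, register file sizes, LDS size) is an illustrative assumption that should be replaced with the figures for the actual device.

```python
import math


def occupancy_limits(work_group_size, vgprs_per_item, sgprs_per_wave, lds_bytes_per_group,
                     wavefront_size=64, simds_per_cu=4, max_waves_per_simd=10,
                     vgprs_per_simd=256, sgprs_per_simd=512, lds_bytes_per_cu=64 * 1024):
    """Return (limits, limiting_resource, occupancy) for one compute unit (CU).

    limits maps each resource to the maximum wavefronts in flight it allows.
    All hardware defaults are illustrative assumptions, not values from the slides.
    """
    waves_per_group = math.ceil(work_group_size / wavefront_size)
    hw_max = max_waves_per_simd * simds_per_cu  # device ceiling per CU

    def by_registers(available, used):
        # Registers are allocated per wavefront on each SIMD.
        if used == 0:
            return hw_max
        return min(max_waves_per_simd, available // used) * simds_per_cu

    if lds_bytes_per_group == 0:
        lds_limit = hw_max
    else:
        groups_that_fit = lds_bytes_per_cu // lds_bytes_per_group
        lds_limit = min(hw_max, groups_that_fit * waves_per_group)

    limits = {
        # Only whole work groups can be resident.
        "work group size": (hw_max // waves_per_group) * waves_per_group,
        "vector registers": by_registers(vgprs_per_simd, vgprs_per_item),
        "scalar registers": by_registers(sgprs_per_simd, sgprs_per_wave),
        "LDS": lds_limit,
    }
    limiting = min(limits, key=limits.get)
    return limits, limiting, limits[limiting] / hw_max


# Made-up kernel: 256 work items per group, 64 VGPRs, 32 SGPRs, 8 KB of LDS per group.
limits, limiting, occupancy = occupancy_limits(256, 64, 32, 8192)
```

Under these assumed numbers the vector registers are the limiting resource, which is the kind of conclusion the Kernel Occupancy view presents graphically.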
GPU PROFILING WITH AMD CodeXL | DEMO
Optimizing AMD teapot application
• Finding and fixing non-optimized kernel launch parameters
  ‒ API Trace and Warning and Error Messages View
• Visualizing host device synchronization
  ‒ Timeline Visualization
• Navigating to find the top bottleneck
  ‒ Summary Pages View
• Optimizing the kernel
  ‒ Kernel Occupancy and GPU Performance Counter View
Static Kernel Analysis
STATIC KERNEL ANALYSIS WITH AMD CodeXL
• Compile, analyze and disassemble an OpenCL™ kernel for AMD APUs, GPUs and CPUs
  ‒ View AMD IL and hardware disassembly (ISA)
  ‒ View compilation warning and error messages
• Generate offline compilation of OpenCL™ kernel binary
• View compiler statistics and estimate performance
• Only requires the OpenCL™ kernel source code as an input
• Does not require a GPU in the system
KEY FEATURES
STATIC KERNEL ANALYSIS WITH AMD CodeXL | FEATURES
• Integrated into AMD CodeXL standalone and Visual Studio® extension
• Brand new user experience
  ‒ View OpenCL™ kernel source, IL and ISA simultaneously
  ‒ View overview
  ‒ Generate analysis for SI and CI families of GPUs
    ‒ Estimated cycle count with ISA branch execution classification
  ‒ Navigate compilation and analysis results in tree view
• Support compilation for the latest AMD APUs, GPUs and CPUs
NEW IN CODEXL 1.3
CPU Profiling
CPU PROFILING WITH AMD CodeXL
• Identify and investigate CPU performance hot-spots
• Profiles C, C++, FORTRAN, Java, .NET, OpenCL™ applications
• Profiles software components
  ‒ Applications, Libraries, Dynamically loaded modules
  ‒ OS Kernel modules
• Profile modes
  ‒ Per Process (target application and its children)
  ‒ System Wide Profiling
• Uses HW Performance Monitoring counters
  ‒ Low overhead
• No change to source code required
  ‒ Symbolic information required to attribute the performance data at function/source level
CPU PROFILING WITH AMD CodeXL
• Profiling Types
  ‒ Time-based profiling
  ‒ Event-based profiling
  ‒ Instruction Based Sampling (IBS)
  ‒ Cache Line Utilization
  ‒ Call Graph
• Pre-defined profile configuration of HW performance events
  ‒ Assess Performance
  ‒ Investigate Data Access
  ‒ Investigate Branching
CPU PROFILING WITH AMD CodeXL
• Performance data are displayed in configurable views
  ‒ Samples attributed at Process and Modules level
  ‒ Drill down to Functions, Source code and Instructions level
CPU PROFILING WITH AMD CodeXL
• Call Graph view displays the parents and children of hottest function calls
CPU PROFILING WITH AMD CodeXL
• Identify Hotspots
  ‒ Where the application spends its time
  ‒ Source level/algorithm related performance issues
  ‒ Use Time-based profiling
• Identify the cause
  ‒ How well the application is using the CPU and Memory resources
  ‒ Performance bottlenecks due to the micro-architectural constraints
  ‒ Use Event-based profiling or Instruction Based Sampling
• Precise instruction level profiling
  ‒ Use Instruction Based Sampling
• Cache-Line Utilization: Data access pattern
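The hotspot step of this workflow amounts to counting where periodic samples land. The sketch below shows that aggregation in its simplest form. It assumes each sample has already been resolved to a `(module, function)` pair, which is the step that needs symbolic information. The `hotspots` function and the sample data are hypothetical and do not represent CodeXL's data format.

```python
from collections import Counter


def hotspots(samples, top=3):
    """Rank functions by sample count.

    samples is an iterable of (module, function) tuples, one per timer tick.
    Returns a list of (module, function, count, percent_of_samples).
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return [
        (module, function, n, 100.0 * n / total)
        for (module, function), n in counts.most_common(top)
    ]


# Made-up samples: 6 ticks in render, 3 in sin, 1 in main.
samples = [("app", "render")] * 6 + [("libm", "sin")] * 3 + [("app", "main")]
ranking = hotspots(samples, top=2)
```

A function that collects most of the samples is where the application spends its time; event-based profiling or IBS would then be used to find out why.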
CPU Profiling Demo
AMD CodeXL SUMMARY
• Powerful APU and GPU Debugging
  ‒ OpenCL™ API Level
  ‒ OpenCL™ Kernel Source Code
• APU/GPU and CPU Profiling
  ‒ Identify “hot spots” with inefficient code
• Static Kernel Analysis
  ‒ Compile, analyze and disassemble OpenCL™ kernel
  ‒ Generate offline compilation of OpenCL™ kernel binary
• Integrated into Microsoft® Visual Studio®
• Standalone application for Windows® and Linux®
• Free download at http://developer.amd.com
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, the AMD Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. Microsoft, Windows and Visual Studio are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.