perc.nersc



Transcript of perc.nersc

Page 1: perc.nersc

http://perc.nersc.gov

Performance Science and Engineering

Measuring memory hierarchy performance

Analytic Performance Bounds for a PETSc Kernel

[Figure: bar chart of Mflop/s (axis from 0 to 900) for five machines: SP, Origin, T3E, Pentium, Ultra II. Series: Theoretical Peak, Mem BW Peak, Oper. Issue Peak, Observed.]

Bounding performance based on fundamental application characteristics
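The chart's series suggest how such a bound can be assembled: each peak is an upper limit on the achievable rate, so the tightest bound is the smallest of them. The sketch below is an illustration only, not the poster's actual model. The function names and all input numbers are hypothetical, and it assumes the memory-bandwidth limit is the sustained bandwidth divided by the kernel's bytes moved per floating-point operation.

```python
def memory_bandwidth_bound(bandwidth_mb_per_s, bytes_per_flop):
    """Mflop/s limit imposed by memory traffic (assumed model, not from the poster).

    MB/s divided by bytes/flop gives Mflop/s.
    """
    if bytes_per_flop <= 0:
        raise ValueError("bytes_per_flop must be positive")
    return bandwidth_mb_per_s / bytes_per_flop


def achievable_bound(theoretical_peak, mem_bw_peak, issue_peak):
    """The tightest upper bound is the smallest of the individual limits."""
    return min(theoretical_peak, mem_bw_peak, issue_peak)


if __name__ == "__main__":
    # Hypothetical machine and kernel values, chosen only for illustration.
    mem_limit = memory_bandwidth_bound(bandwidth_mb_per_s=800.0, bytes_per_flop=10.0)
    bound = achievable_bound(theoretical_peak=900.0,
                             mem_bw_peak=mem_limit,
                             issue_peak=300.0)
    print(f"memory-bandwidth limit: {mem_limit:.1f} Mflop/s")
    print(f"achievable bound:       {bound:.1f} Mflop/s")
```

With inputs like these the memory-bandwidth limit is the binding one, which is the typical situation for kernels that move many bytes per floating-point operation.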

Block #  Procedure Name  Memory Ref.  Mem. Ref. %  L1 hit Rate  L2 hit Rate  Ratio Random  Memory Bandwidth  Weighted Bandwidth
180155   dgemv_n         4.82E+09     0.9198       93.47        93.48        0.07          4166.0            3831.7
180159   dgemv_n         1.42E+08     0.0271       90.33        90.39        0.00          1809.2            49.1
180160   dgemv_n         1.22E+08     0.0232       94.81        99.89        0.00          5561.3            129.3
5885     MatSetValues    6.56E+07     0.0125       77.32        90.00        0.20          1522.6            19.0
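The Weighted Bandwidth column appears to be the product of each block's share of memory references (Mem. Ref. %, given as a fraction) and its Memory Bandwidth. The poster does not state this formula, so treat it as an inference from the numbers. The short check below uses values copied from the table; the names `weighted_bandwidth` and `check_rows` and the 0.5 tolerance (to absorb rounding in the printed fractions) are assumptions made for this sketch.

```python
# Each row: (block, procedure, mem_ref_fraction, memory_bandwidth, reported_weighted)
ROWS = [
    ("180155", "dgemv_n", 0.9198, 4166.0, 3831.7),
    ("180159", "dgemv_n", 0.0271, 1809.2, 49.1),
    ("180160", "dgemv_n", 0.0232, 5561.3, 129.3),
    ("5885", "MatSetValues", 0.0125, 1522.6, 19.0),
]


def weighted_bandwidth(mem_ref_fraction, memory_bandwidth):
    """Assumed formula: the block's reference share times its bandwidth."""
    return mem_ref_fraction * memory_bandwidth


def check_rows(rows, tolerance=0.5):
    """Compare the assumed formula against the reported column, row by row."""
    results = []
    for block, _proc, fraction, bandwidth, reported in rows:
        computed = weighted_bandwidth(fraction, bandwidth)
        matches = abs(computed - reported) <= tolerance
        results.append((block, computed, reported, matches))
    return results


if __name__ == "__main__":
    for block, computed, reported, matches in check_rows(ROWS):
        print(f"{block:>7}  computed={computed:8.1f}  reported={reported:8.1f}  ok={matches}")
```

Under this reading, block 180155 dominates the total because it accounts for roughly 92% of all memory references.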

MAPS for TCSini for random and non-random loads

Block #  Bandwidth
180155   2831.7
180159   49.1
180160   129.3
5885     19.0

Convolving application & machine to predict performance

Compiler framework to optimize high-level abstractions

Infrastructure for accessing hardware performance monitors

Infrastructure for dynamic instrumentation

Tools for measuring & understanding application performance

ENABLING TECHNOLOGIES

CONVOLUTIONS

Enhanced Simulations & Experiments

Application Signatures

Machine Signatures

Bound Models

Phase Models

PAPI, Sigma++, DynInst

Scientific Simulations & Experiments

ROSE, SvPablo, Tau

Cache Simulator

Prediction Tool

Memory Ref Tool

dumpMap

.addr

source files

.lst files

trace files

Program Execution

Instrumented binary

Sigma

Compile/Link

Infrastructure for capturing & analyzing memory accesses

Primary participants:

Lawrence Livermore National Laboratory: Dan Quinlan, Bronis de Supinski

University of Maryland: Jeff Hollingsworth

Oak Ridge National Laboratory: Patrick Worley, Jeffrey Vetter

San Diego Supercomputing Center: Allan Snavely

University of North Carolina: Dan Reed

Argonne National Laboratory: Paul Hovland, Boyana Norris

University of Tennessee: Jack Dongarra

Lawrence Berkeley National Laboratory: David Bailey, Erich Strohmaier

Supplementary participants:

Technical University of Catalonia: Jesús Labarta

Los Alamos National Laboratory: Adolfy Hoisie, Harvey Wasserman

Portland State University: Karen Karavanic

University of Oregon: Allen Malony

Rice University: J. Mellor-Crummey

University of Wisconsin: Barton P. Miller

IBM Research (Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598): Luiz DeRose