Slide #1 February 11, 1997

EECS 598 ---- Alpha Microprocessor

Jerry Huang

Alpha 21164 Microprocessor

The World’s Highest Performance Microprocessor

Zhihui Huang (Jerry)

University of Michigan

Slide #2

Historical Perspective

CISC and Digital VAX (~1980)
Serious exploration of RISC at Digital (1982)
Fragmented efforts on RISC (1983~1984)
– SAFE, HR-32, CASCADE projects
First draft of the PRISM architecture (1985)
Cancellation of PRISM (1988)
First RISC workstation based on MIPS R2000 (1989)
PRISM renamed to Alpha (1990)
First generation Alpha 21064 (1992)
Second generation Alpha 21164 (1994)

Slide #3

Alpha Microprocessor Roadmap

[Figure: roadmap chart. Horizontal axis: 1992 to 1997. Vertical axis: 5 to 20. Parts plotted: 21064 at 150 MHz and 200 MHz; 21064A at 275 MHz; 21164 at 300, 333, 366, 400, 433 and 500 MHz. A marker on the chart reads "Here We are".]

Slide #4

The 21164 Architecture

64-bit load and store RISC architecture
Byte addressable
– 43-bit virtual address, 40-bit physical address
Integer types: Byte, Word, Longword, Quadword
Floating-point data types
– Longword integer format in floating-point unit
– Quadword integer format in floating-point unit
– IEEE and VAX floating-point formats
CALL_PAL instruction
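As a rough illustration of what the address widths above imply, the following C sketch computes the size of the virtual and physical address spaces and records the Alpha integer type widths. The function and constant names are invented for this example and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Address widths quoted on the slide. */
enum { VA_BITS = 43, PA_BITS = 40 };

/* Alpha integer data type widths in bits. */
enum { BYTE_BITS = 8, WORD_BITS = 16, LONGWORD_BITS = 32, QUADWORD_BITS = 64 };

/* Number of addressable bytes for an address of the given width. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);            /* a 64-bit shift would be undefined */
    return UINT64_C(1) << bits;
}

/* Same quantity expressed in gibibytes (2^30 bytes). */
uint64_t address_space_gib(unsigned bits)
{
    assert(bits >= 30);
    return address_space_bytes(bits) >> 30;
}
```

With these widths the virtual address space comes to 8192 GB (8 TB) and the physical address space to 1024 GB (1 TB).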

Slide #5

21164 Characteristics

[...] µm CMOS technology
– 4 layers of metallization
– 9.66 million transistors
– 14.4 mm x 14.5 mm die size (209 mm²)
Package and power
– 499-pin PGA, 291 signal pins
– 3.3 V external, 2.2 V internal
– 37 W at 433 MHz
Clock frequency 300 MHz ~ 500 MHz
– SPECint95 11.3 ~ 15.4 respectively
– SPECfp95 14.5 ~ 21.1 respectively

Slide #6

On-chip Cache Organization

An on-chip, 8 KB primary instruction cache
– direct mapped, 32-byte block (8 instructions)
– virtual, 7-bit ASN (MAX_ASN = 127), 1-bit PALcode
An on-chip, 8 KB primary data cache
– dual-read-ported, single-write-ported
– virtually indexed, physically tagged
– write-through, read-allocate, direct mapped, 32-byte block
Large on-chip L2 cache
– 96 KB, 3-way set associative, physical
– write-back, write-allocate, byte-accessible
– 32-byte (256-bit) or 64-byte (512-bit) block
– mixed data and instruction cache
– pipelined (16 bytes per CPU cycle)
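The cache geometries above determine how an address is split into block offset and index. The following C sketch works that out for the 8 KB direct-mapped L1 caches and the 96 KB 3-way L2. All names are hypothetical, and the plain modulo indexing is the textbook scheme, assumed here rather than taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the slide: 8 KB direct-mapped L1, 32-byte blocks. */
enum { L1_SIZE = 8 * 1024, L1_BLOCK = 32, L1_LINES = L1_SIZE / L1_BLOCK };

/* 96 KB, 3-way set-associative L2; block size is 32 or 64 bytes. */
enum { L2_SIZE = 96 * 1024, L2_WAYS = 3 };

/* Byte offset of an address within its 32-byte L1 block. */
unsigned l1_offset(uint64_t addr) { return (unsigned)(addr % L1_BLOCK); }

/* Line selected in a direct-mapped L1: block number modulo line count. */
unsigned l1_index(uint64_t addr)
{
    return (unsigned)((addr / L1_BLOCK) % L1_LINES);
}

/* Number of sets in the L2 for a given block size (32 or 64 bytes). */
unsigned l2_sets(unsigned block_bytes)
{
    assert(block_bytes == 32 || block_bytes == 64);
    return L2_SIZE / (L2_WAYS * block_bytes);
}

/* Set selected in the L2 for a physical address (assumed modulo indexing). */
unsigned l2_set_index(uint64_t paddr, unsigned block_bytes)
{
    return (unsigned)((paddr / block_bytes) % l2_sets(block_bytes));
}
```

Each L1 cache has 256 lines, so its index and offset use address bits 0 to 12. Those bits lie inside the offset of an 8 KB page, which is why a virtually indexed, physically tagged data cache of this size sees the same index before and after translation.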

Slide #7

TLB Organization

Instruction Translation Buffer
– 48-entry, fully associative
– not-last-used replacement algorithm
– 8 KB to 4 MB page
– 2 superpages only in privileged mode
Data Translation Buffer
– 64-entry, fully associative
– dual-read-ported
– not-last-used replacement algorithm
– superpage
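To put the entry counts and page sizes above in perspective, this C sketch computes how much memory each translation buffer can map at once. The names are made up for the example. The rule that page sizes step by a factor of 8 (8 KB, 64 KB, 512 KB, 4 MB) is an assumption based on the Alpha architecture's granularity hints; the slide itself only gives the 8 KB to 4 MB range.

```c
#include <assert.h>
#include <stdint.h>

/* Entry counts from the slide. */
enum { ITB_ENTRIES = 48, DTB_ENTRIES = 64 };
enum { BASE_PAGE_BYTES = 8 * 1024 };

/* Page size for granularity hint gh in 0..3, assumed to be 8 KB * 8^gh. */
uint64_t page_bytes(unsigned gh)
{
    assert(gh <= 3);
    return (uint64_t)BASE_PAGE_BYTES << (3 * gh);
}

/* Bytes mapped when every entry holds a page of the given size. */
uint64_t tlb_reach_bytes(unsigned entries, uint64_t page)
{
    return (uint64_t)entries * page;
}
```

With 8 KB pages the 64-entry DTB covers 512 KB, which is one reason larger pages matter for workloads with big data sets.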

Slide #8

External Interface

[Figure: system block diagram. Blocks shown: Alpha 21164 with L1 cache and L2 cache; external Bcache; support logic (oscillator, serial ROM); DECchip 21172 core logic chipset (DECchip 21172-CIA and DECchip 21172-BA); DRAM SIMM sockets (x8); PCI bus with PCI slots and a PCI-to-ISA bridge; ISA bus with flash ROM, time-of-year clock, keyboard/mouse, 2 IDE devices, diskette, parallel port and 2 serial ports. Path labels on the diagram: 128-bit data bus, 256-bit, 64-bit, 19-bit index, 10-bit tag, 37-bit address, control, address/control.]

Slide #9

Alpha 21164 Block Diagram

[Figure: chip block diagram. Blocks shown:
– Instruction Fetch/Decode Unit (Ibox): prefetcher, instruction buffer, instruction slotter, instruction issue, branch prediction
– Integer Execution Unit (Ebox): one pipeline with multiplier, adder, shifter and logic box; one pipeline with adder, logic box and branch/jump
– Floating-Point Execution Unit (Fbox): adder, multiplier, divider
– Memory Address Translation Unit (Mbox): Dcache access control, miss address file, write buffer, dual-read translation buffer
– Cache Control and Bus Interface Unit (Cbox): Scache access control, external Bcache control, bus interface unit
– Instruction Cache (Icache): branch history table, tag, data
– Data Cache (Dcache): tag, data
– Scache (L2 cache): sets 1 to 3, each with tag and data
– Integer Register File (IRF), Floating-Point Register File (FRF)
– Clocks, system clock, SROM interface]

Slide #10

Microarchitecture Function Units

Instruction fetch and decode unit (Ibox)
– Instruction prefetcher and instruction decoder
– Branch prediction
– Instruction translation buffer (ITB)
– Interrupt support
Integer execution unit (Ebox)
Floating-point execution unit (Fbox)
Memory address translation unit (Mbox)
– Data translation buffer (DTB)
– Miss address file (MAF)
– Write buffer
Cache control and bus interface unit (Cbox)

Slide #11

Instruction Issue Pipeline Organization

[Figure: issue pipeline stages. S0: instruction cache; S1: instruction buffer; S2: instruction slot; S3: instruction issue. Blocks drawn: instruction cache (8 KB), prefetch buffer, branch predict, next PC, issue conflict.]

Slide #12

Execution Pipeline Organization

[Figure: execution pipelines across stages S3 S4 S5 S6 S7 S8.]
Integer registers
– Integer pipeline 0: arith, logical, shift, load/store
– Integer pipeline 1: arith, logical, branch/jmp, load
– Int mult
Floating-point registers
– FP pipeline 0: add, subtract, compare, FP branch
– FP pipeline 1: multiply
– FP divide

Slide #13

Memory Access Pipeline

[Figure: memory access pipeline across stages S4 to S12. Labeled steps: Dcache access, Scache tag access, Scache data access, Dcache fill and register write. Register files shown: integer registers (twice) and FP registers. Two further stage sequences are drawn alongside, labeled S1 S2 S3 and S0 S1 S2 S3 S4.]

Slide #14

Instruction Latency

Instructions                    21164   21064
Most integer operations           1       2
CMOV rdes,rsrc,rtest              2       3
Integer 32-bit multiply           8      19
Integer 64-bit multiply          16      23
Most FP operations                4       6
FP single-precision divide       19      34
FP double-precision divide       31      63
Loads hit in L1 cache             2      ---
Loads hit in L2 cache             8      ---

Special Case 0
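The latencies above are in clock cycles, so comparing them across parts or clock rates takes a little arithmetic. The C sketch below converts a cycle count to wall-clock time and forms a cycle-count ratio. The helper names are invented for this example, and the ratio compares cycles only; it ignores the difference in clock frequency between the two chips.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of `cycles` clock cycles at `mhz` MHz, in picoseconds
   (rounded down). */
uint64_t cycles_to_ps(unsigned cycles, unsigned mhz)
{
    assert(mhz > 0);
    return (uint64_t)cycles * 1000000u / mhz;
}

/* Ratio of 21064 to 21164 latency in cycles, scaled by 100
   (for example, 200 means 2.00x), rounded down. */
unsigned cycle_ratio_x100(unsigned cycles_21064, unsigned cycles_21164)
{
    assert(cycles_21164 > 0);
    return cycles_21064 * 100u / cycles_21164;
}
```

For instance, a 31-cycle double-precision divide at 500 MHz takes 62 ns, and the 21064 needs about twice as many cycles for the same operation.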

Slide #15

Instruction Fetch/Issue Unit

Branch and jump prediction
– 2K-entry Branch History Table (BHT)
• 2-bit saturating counter
• built into Icache
• not initialized on Icache fill
– Does not limit the number of branch predictions
– 12-entry subroutine return stack
• stores Icache index
• PALmode and user mode prediction
– Mispredict trap
• 4 ~ 5 cycle penalty on branch mispredict
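The 2-bit saturating counter scheme mentioned above can be sketched in C as follows. This is a generic textbook model, not the 21164's actual logic: the type and function names are hypothetical, the table is indexed by low PC bits purely for illustration, and the convention that counter values 2 and 3 mean "predict taken" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { BHT_ENTRIES = 2048 };   /* 2K entries, as on the slide */

/* One 2-bit counter per entry: 0,1 predict not taken; 2,3 predict taken. */
typedef struct { uint8_t counter[BHT_ENTRIES]; } bht_t;

/* Hypothetical index function: low bits of the instruction word address. */
unsigned bht_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) % BHT_ENTRIES);
}

/* Predict taken when the counter is in its upper half. */
bool bht_predict(const bht_t *t, uint64_t pc)
{
    return t->counter[bht_index(pc)] >= 2;
}

/* Step the counter toward the actual outcome, saturating at 0 and 3. */
void bht_update(bht_t *t, uint64_t pc, bool taken)
{
    uint8_t *c = &t->counter[bht_index(pc)];
    assert(*c <= 3);
    if (taken) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

The point of the two bits is hysteresis: a branch that is almost always taken keeps its "taken" prediction after a single not-taken outcome, such as a loop exit.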

Slide #16

Instruction Prefetch

[Figure: instruction prefetch data path. Blocks shown: L3 cache, Cbox, L2 cache (96 KB), Dcache, Mbox, integer pipeline 0, integer pipeline 1, FP adder, FP multiplier, 4-way issue unit, Icache, prefetch buffer.]

Slide #17

Instruction Decode/Issue

Decode up to 4 instructions in parallel
Check for structural hazards and data hazards
Issue only the instructions without hazards
Issue instructions IN ORDER
Handle only NATURALLY ALIGNED groups of 4 instructions
Does not advance to the next group until all 4 instructions have issued
The no-op instruction is an important instruction
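The naturally aligned grouping rule above can be made concrete with a little address arithmetic. In this C sketch, which uses invented names and assumes 4-byte instructions in 16-byte aligned groups, a branch target that lands in the middle of a group leaves fewer instructions available to issue together in that cycle.

```c
#include <assert.h>
#include <stdint.h>

enum {
    INSTR_BYTES  = 4,
    GROUP_INSTRS = 4,
    GROUP_BYTES  = INSTR_BYTES * GROUP_INSTRS
};

/* Address of the naturally aligned 16-byte group containing pc. */
uint64_t group_base(uint64_t pc)
{
    assert(pc % INSTR_BYTES == 0);   /* instructions are 4-byte aligned */
    return pc & ~(uint64_t)(GROUP_BYTES - 1);
}

/* Position (0..3) of the instruction within its group. */
unsigned group_slot(uint64_t pc)
{
    assert(pc % INSTR_BYTES == 0);
    return (unsigned)((pc % GROUP_BYTES) / INSTR_BYTES);
}

/* Instructions from pc to the end of its group: the most that can
   issue together when execution enters the group at pc. */
unsigned issuable_from(uint64_t pc)
{
    return GROUP_INSTRS - group_slot(pc);
}
```

Under this model a branch to address 0x4c enters the group at 0x40 in its last slot, so only one instruction of that group remains, while a branch to 0x34 leaves three.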

Slide #18

No-op Instructions

Integer no-op
– NOP (BIS R31,R31,R31)
Floating-point no-op
– FNOP (CPYS F31,F31,F31)
Universal no-op
– LDQ_U R31,...

Code example showing bad ordering
(a) LDL R2, 0(R1)
(b) ADDL R2,R3,R4
(c) ADDL R2,R5,R6

Code example showing good ordering
(a) LDL R2, 0(R1)
(b) NOP
(c) ADDL R2,R3,R4
(d) ADDL R2,R5,R6

Slide #19

Code Analysis

#define N 10

int main(void)
{
    int i, j;
    float temp;   /* same type as a[], so the swap does not truncate */
    float a[N] = {1.0,3.0,5.0,2.0,9.0,0.0,4.0,8.0,7.0,6.0};

    /* after pass i, a[i] holds the largest of a[i..N-1] */
    for (i = 0; i < (N-1); i++)
        for (j = i+1; j <= (N-1); j++)
            if (a[i] < a[j]) {
                temp = a[i];
                a[i] = a[j];
                a[j] = temp;
            }
    return 0;
}

Bubble Sort

Compiler option: cc -newc -O4 -c -o bubble.o bubble.c

Slide #20

Assembly Code in Groups (1)

1st Group (0x0), pipeline states over cycles t1 t2 t3 t4 t5 t6 t7
– ldah gp, 1(t12)          s3 s4 s5 s6
– lda gp, -32528(gp)       s3 s4 s5 s6
– lda sp, -48(sp)          s3 s4 s5 s6
– cpys $f31,$f31,$f31      -- -- -- --

2nd Group (0x10), pipeline states over cycles t1 t2 t3 t4 t5 t6 t7
– ldq a2, -32752(gp)       s3 s4 s5 s6
– bis zero,ra,t11          s3 s4 s5 s6
– ldq t12, -32744(gp)      s3 s4 s5 s6
– bis zero, sp, a0         s3 s4 s5 s6

Slide #21

Assembly Code in Groups (2)

3rd Group (0x20), pipeline states over cycles t5 t6 t7 t8 t9
– stq zero, 0(sp)          s3 s4 s5 s6
– bis zero, 0x28, a1       s3 s4 s5 s6
– bis zero, zero, t0       s3 s4 s5 s6
– jsr ra, (t12), _Ots      s3 s4 s5 s6

4th Group (0x30), pipeline states over cycles t30 t31 t32 t33 t34 t35
– bis zero, 0x1, t1        s3 s4 s5 s6
– subq t1, 0xa, t3         s3 s4 s5 s6
– bge t3, 0x78             s3 s4 s5 s6
– bis zero, t1, t2         s3 s4 s5 s6

Slide #22

Assembly Code in Groups (3)

5th Group (0x40), pipeline states over cycles t33 t34 t35 t36 t37 t38
– s4addq t1, sp, t5        s3 s4 s5 s6
– lda t6, 36(sp)           s3 s4 s5 s6
– s4addq t0, sp, t4        s3 s4 s5 s6
– lds $f0, 0(t4)           s3 s4 s5 s6 CV

6th Group (0x50), pipeline states over cycles 35 36 37 38 39 40 41 42 43 44
– lds $f1, 0(t5)           s3 s4 s5 s6
– cmptlt $f0, $f1, $f10    s3 s4 s5 s6 s7 s8
– fbeq $f10, 0x64          s3 s4
– sts $f1,0(t4)            s3 s4

Slide #23

Assembly Code in Groups (4)

7th Group (0x60), pipeline states over cycles 44 45 46 47 48
– sts $f0, 0(t5)           s3 s4 s5 s6
– lda t5, 4(t5)            s3 s4 s5 s6
– cmpule t5,t6,t9          s3 s4 s5 s6
– cpys $f31,$f31,$f31      -- -- -- --

8th Group (0x70), pipeline states over cycles 46 47 48 49 50 51 52
– addl t2, 0x1, t2         s3 s4 s5 s6
– bne t9, 0x4c             s3 s4 s5 s6
– addl t1, 0x1, t1         s0 s1 s2 s3 s4 s5 s6
– subq t1, 0xa, t10        s0 s1 s2 s3 s4 s5

Slide #24

Assembly Code in Groups (5)

9th Group (0x80), pipeline states over cycles 52 53 54 55 56 57 58
– addl t0, 0x1, t0         s3 s4 s5 s6
– blt t10, 0x34            s3 s4 s5 s6
– bis zero,t11,ra          s0 s1 s2 s3 s4
– bis zero,zero,v0         s0 s1 s2 s3 s4
– lda sp, 48(sp)           s0 s1 s2 s3
– ret zero, (ra), 1        s0 s1 s2 s3

Slide #25

I-box Good and Bad

Good
– instruction prefetch
– low latency and high clock rate
Bad
– high branch mispredict penalty
– in-order issue
– naturally aligned issue
– no stall after stage 4; replays whenever a stall is needed

Slide #26

E-box Good and Bad

Good
– low execution latency and high clock rate
– supports various floating-point formats
Bad
– LOAD/STORE multiplexed into the integer unit
– one more stage for the floating-point pipeline
What else?

Slide #27

Memory Unit Overview

Two-level data cache and a 64-entry DTB
Memory unit (Mbox)
– Load instructions and Miss Address File (MAF)
• LDBU, LDWU, LDL, LDQ, LDL_L, LDQ_L, LDS, LDT
– Store instructions and Write Buffer (WB)
• STB, STW, STL, STQ, STL_C, STQ_C, STS, STT
– Memory Barrier (MB)
– Write Memory Barrier (WMB)
Data hazards and replay traps
– Load after store, store after store
– MAF full and WB full

Slide #28

Miss Address File

Holds load misses in 6 entries
– physical address
– destination register
– instruction type
• integer/floating-point
• 4-byte/8-byte/IEEE-S-Type/VAX-G-Type, etc.
Holds instruction fetch addresses in 4 entries
– physical address

Slide #29

Miss Address File Details

One-to-one mapping? LDL R2,0(R1) and LDL R3,0(R1)
Same size? LDL R2,0(R1) and LDQ R3,8(R1)
Even with even, odd with odd (LDL instruction only)? LDL R2,0(R1) and LDL R3,12(R1)
Integer with integer, FP with FP? LDL R2,0(R1) and LDS FR2,8(R1)

MAF entry layout (6 entries, 32 bytes per entry; one destination register field per pair of longword offsets):

Entry      Offsets 0,4   Offsets 8,12   Offsets 16,20   Offsets 24,28   Format
Address1   Rn            Rn             Rn              Rn              Format
Address2   Rn            Rn             Rn              Rn              Format
Address3   Rn            Rn             Rn              Rn              Format
Address4   Rn            Rn             Rn              Rn              Format
Address5   Rn            Rn             Rn              Rn              Format
Address6   Rn            Rn             Rn              Rn              Format
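One way to read the entry layout above is that each MAF entry tracks one 32-byte block and has one destination-register field per 8 bytes of that block. The C sketch below encodes only that reading. All names are hypothetical, and the sharing test is a necessary condition under this interpretation, not a statement of the chip's full merging rules (size, integer versus FP, and so on are not modeled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MAF_BLOCK_BYTES = 32, MAF_SLOTS = 4, MAF_SLOT_BYTES = 8 };

/* 32-byte-aligned block address tracked by one MAF entry. */
uint64_t maf_block(uint64_t paddr)
{
    return paddr & ~(uint64_t)(MAF_BLOCK_BYTES - 1);
}

/* Register field for an address:
   offsets 0,4 -> 0; 8,12 -> 1; 16,20 -> 2; 24,28 -> 3. */
unsigned maf_slot(uint64_t paddr)
{
    unsigned slot = (unsigned)((paddr % MAF_BLOCK_BYTES) / MAF_SLOT_BYTES);
    assert(slot < MAF_SLOTS);
    return slot;
}

/* Assumed necessary condition for two load misses to share an entry:
   same 32-byte block, different register fields. */
bool maf_may_share_entry(uint64_t a, uint64_t b)
{
    return maf_block(a) == maf_block(b) && maf_slot(a) != maf_slot(b);
}
```

Under this reading, loads from offsets 0 and 8 of the same block could occupy one entry, while two loads from offset 0 would compete for the same register field.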

Slide #30

Data Hazard

Load after Store
– (1 cycle later) Replay trap (7-cycle penalty)
– (2 cycles later) Issue stalled
– (Compiler scheduled 3 cycles later) OK
Store after Load
– Bits are set in each conflicting MAF entry to prevent its fill from being placed in the Dcache when it arrives, and to prevent subsequent loads from merging.
– Conflict bits are set with the store instruction in the write buffer to prevent the store from being issued until all conflicting load instructions have been issued to the Cbox.
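The three load-after-store cases above amount to a small decision rule on how many cycles separate the store from the conflicting load. A minimal C sketch, with invented type and function names, could classify them like this. The length of the issue stall in the 2-cycle case is not given on the slide, so only the category is returned.

```c
#include <assert.h>

/* Possible outcomes for a load that conflicts with an earlier store. */
typedef enum { LAS_REPLAY_TRAP, LAS_ISSUE_STALL, LAS_OK } las_outcome;

/* Penalty quoted on the slide for the replay-trap case. */
enum { REPLAY_TRAP_PENALTY_CYCLES = 7 };

/* Outcome for a conflicting load issued `distance` cycles after the store. */
las_outcome load_after_store(unsigned distance)
{
    assert(distance >= 1);
    if (distance == 1) return LAS_REPLAY_TRAP;   /* 7-cycle penalty */
    if (distance == 2) return LAS_ISSUE_STALL;   /* issue is held */
    return LAS_OK;                               /* 3 or more cycles apart */
}
```

A compiler scheduler could use a rule of this shape to decide whether to move independent instructions, or no-ops, between a store and a load from the same location.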

Slide #31

M-box Good and Bad

Good
– non-blocking
– 2-level cache and large cache
– merging for both loads and stores, which reduces traffic
– in-order issue to the C-box and out-of-order completion
Bad
– Replays every time the buffers are full, at a high penalty
What else?

Slide #32

Performance Characterization

[Figure: bar chart, "Percentage of time in PALcode". Horizontal axis: percent of all cycles, 0 to 3. Benchmarks: espresso, li, eqntott, compress, sc, gcc, spice, doduc, mdljdp2, wave5, tomcat, ora, alvinn, ear, mdljsp2, swm256, su2corr, hydro2d, nasa7, fppp.]

Slide #33

Performance Characterization

[Figure: stacked bar chart, "Distribution of issue cycles for the Alpha 21164". Horizontal axis: 0 to 100. Series: single, dual, triple, quad. Benchmarks labeled: espresso, eqntott, sc, spice, mdljdp2, tomcat, alvinn, mdljsp2, su2corr, nasa7.]

Slide #34

Performance Characterization

[Figure: bar chart, "Branch mispredictions". Horizontal axis: % branches mispredicted, 0 to 14. Benchmarks: espresso, li, eqntott, compress, sc, gcc, spice, doduc, mdljdp2, wave5, tomcat, ora, alvinn, ear, mdljsp2, swm256, su2corr, hydro2d, nasa7, fppp.]

Slide #35

Performance Characterization

[Figure: bar chart, "Cache misses per thousand instructions on the Alpha 21164". Horizontal axis: 0 to 160. Series: Scache, Dcache, Icache. Benchmarks: espresso, li, eqntott, compress, sc, gcc, spice, doduc, mdljdp2, wave5, tomcat, ora, alvinn, ear, mdljsp2, swm256, su2corr, hydro2d, nasa7, fppp.]

Slide #36

Reference

Hardware Reference Manual
– Digital Semiconductor 21164 Alpha Microprocessor (order number: EC-QP99A-TE)
Alpha AXP Architecture Handbook
– Digital Semiconductor (order number: EC-QD2KA-TE)
Alpha Implementations and Architecture
– D. Bhandarkar, Digital Press, QA 76.8 .A176 B471
Related materials for these slides
– http://umaxp1.physics.lsa.umich.edu/~zhihuang