Speculative Execution CS510 Computer Architectures Lecture 11 - 1

Lecture 11: Trace Scheduling, Conditional Execution, Speculation, Limits of ILP

Trace Scheduling

• Parallelism across IF branches vs. LOOP branches
  – Trace scheduling works when the behavior of the branches is fairly predictable at compile time

• Two steps:
  – Trace Selection
    • Find a likely sequence of basic blocks (trace) that forms a long, statically predicted sequence of straight-line code
  – Trace Compaction
    • Squeeze the trace into few VLIW instructions
    • Need bookkeeping code in case the prediction is wrong
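The trace-selection step can be sketched as a greedy walk over a control-flow graph: start at the entry block and keep following the successor that the static prediction considers most likely. This is only a minimal sketch. The function name `select_trace`, the graph encoding, the block names, and the probabilities are all hypothetical, and real trace schedulers apply further rules that are omitted here.

```python
def select_trace(cfg, entry):
    """Greedily follow the most likely successor, starting at `entry`.

    cfg maps a block name to a list of (successor, probability) pairs.
    The walk stops at a block with no successors, or when the best
    successor is already on the trace (so a loop back edge ends it).
    """
    trace = [entry]
    seen = {entry}
    block = entry
    while cfg.get(block):
        # Pick the successor the static prediction considers most likely.
        succ, _prob = max(cfg[block], key=lambda edge: edge[1])
        if succ in seen:
            break
        trace.append(succ)
        seen.add(succ)
        block = succ
    return trace


# Hypothetical flow graph: block A ends in a branch that goes to B 90% of
# the time and to X 10% of the time; both paths rejoin at C.
example_cfg = {
    "A": [("B", 0.9), ("X", 0.1)],
    "B": [("C", 1.0)],
    "X": [("C", 1.0)],
    "C": [],
}

if __name__ == "__main__":
    print(select_trace(example_cfg, "A"))  # prints ['A', 'B', 'C']
```

Block X is left off the trace, which is why bookkeeping code is needed on the path that enters or leaves the trace when the prediction is wrong.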

Trace Scheduling

Trace compaction by speculation
  – Move the code associated with B and C to make VLIW word(s) before the branch
  – This may cause exceptions when executed

[Figure: flow graph. The block "A[i] = A[i] + B[i]" is followed by the test "A[i] = 0" with T and F outcomes. Select the trace through the true branch if it is taken more frequently.]

Speculation should not introduce any new exception*

* See the kinds of exceptions on page 179

HW Support for More ILP: Conditional Instructions

• Avoid branch prediction by turning branches into conditionally executed instructions:
      if (x) then A = B op C else NOP
  – If false, then it neither stores a result nor causes an exception*
  – Expanded ISAs of Alpha, MIPS, and SPARC have conditional move; PA-RISC can annul any following instr.

• Drawbacks to conditional instructions
  – Still takes a clock even if "annulled"
  – Stall if condition is evaluated late
  – Complex conditions reduce effectiveness; condition becomes known late in pipeline

* See the kinds of exceptions on page 179

HW Support for More ILP: Conditional Instructions

Example: a conditional move replaces a branch around a move

  With a branch:            With a conditional move:
  BNEZ R1,L                 CMOVZ R2,R3,R1
  MOV  R2,R3

Two-issue superscalar: each cycle issues a combination of one memory reference and one ALU (or branch) operation

  First instruction slot    Second instruction slot
  LW   R1,40(R2)            ADD R3,R4,R5
                            ADD R6,R3,R7
  BEQZ R10,L
  LW   R8,20(R10)
  LW   R9,0(R8)

The first slot of the second cycle is wasted (green on the slide), and LW R9,0(R8) is data dependent on LW R8,20(R10) (red on the slide).

With a conditional load (LWC):

  First instruction slot    Second instruction slot
  LW   R1,40(R2)            ADD R3,R4,R5
  LWC  R8,20(R10),R10       ADD R6,R3,R7
  BEQZ R10,L
  LW   R9,0(R8)

The load executes only when [R10] ≠ 0, i.e., LWC is the same as LW unless its 3rd operand is 0. LWC must have no effect if the condition is not satisfied: it can neither write its result nor cause any exception.
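To make the LWC semantics concrete, here is a small Python model (not DLX code) of a conditional load. The function name `lwc`, the dictionary-based register file and memory, and the use of `KeyError` to stand in for an invalid address are all assumptions made for illustration.

```python
def lwc(regs, mem, dest, offset, base, cond):
    """Model of a conditional load: LWC dest, offset(base), cond.

    Acts like LW when regs[cond] != 0. When regs[cond] == 0 it does
    nothing at all: no register write and no memory access, so it
    cannot raise an exception.
    """
    if regs[cond] == 0:
        return  # annulled: no effect
    regs[dest] = mem[regs[base] + offset]  # may raise KeyError (bad address)


if __name__ == "__main__":
    mem = {120: 7}

    # R10 holds a valid pointer: the load happens.
    regs = {"R8": 0, "R10": 100}
    lwc(regs, mem, "R8", 20, "R10", "R10")
    print(regs["R8"])  # prints 7

    # R10 is 0 (null pointer): the load is annulled, R8 is unchanged and
    # the invalid address 20 is never touched.
    regs = {"R8": 55, "R10": 0}
    lwc(regs, mem, "R8", 20, "R10", "R10")
    print(regs["R8"])  # prints 55
```

Because the annulled case touches neither the register file nor memory, the load can safely be hoisted above the BEQZ that originally guarded it.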

HW Support for More ILP: Speculation

• Speculation: allow an instruction that depends on a branch (predicted to be taken) to issue without any consequences (including exceptions) if the branch is not actually taken ("HW undo")
  – Allows the execution of an instruction before the processor knows that the instruction should execute (i.e., it avoids control dependence stalls)

• Often combined with dynamic scheduling

• Tomasulo: separate speculative bypassing of results from real bypassing of results
  – When an instruction is no longer speculative, write its results (instruction commit)
  – Execute out of order but commit in order

Compiler Speculation with HW Support: (1) HW-SW Cooperation for Speculation

• HW undo for misprediction
  – simply handle all resumable exceptions when the exception occurs
  – simply return an undefined value for any exception that would cause termination

Source code:

      if (A==0) A = B; else A = A + 4;

Compiled code:

      LW   R1, 0(R3)    ; load A
      BNEZ R1, L1       ; test A
      LW   R1, 0(R2)    ; if clause
      J    L2           ; skip else
  L1: ADD  R1, R1, 4    ; else clause
  L2: SW   0(R3), R1    ; store A

Compiled code using compiler-based speculation*:

      LW   R1, 0(R3)    ; load A
      LW   R14, 0(R2)   ; speculative load B
      BEQZ R1, L3       ; other branch of the if
      ADD  R14, R1, 4   ; the else clause
  L3: SW   0(R3), R14   ; nonspeculative store

* Assume the then clause is almost always executed. Register renaming: an extra register (R14) is needed.

Compiler Speculation with HW Support: (2) Speculation with Poison Bits

• Speculation with Poison Bits
  – allows compiler speculation with less change to the exception behavior
  – a poison bit is added to every register
  – another bit is added to every instruction to indicate whether the instruction is speculative

      LW   R1, 0(R3)    ; load A
      LW*  R14, 0(R2)   ; speculative load B
      BEQZ R1, L3       ; other branch of the if
      ADD  R14, R1, 4   ; the else clause
  L3: SW   0(R3), R14   ; nonspeculative store

If the speculative LW* generates a terminating exception, the poison bit of R14 will be set. When the nonspeculative SW instruction occurs, it will raise an exception if the poison bit for R14 is on.
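The poison-bit behavior described above can be simulated in a few lines of Python. This is a sketch under stated assumptions: the `Machine` class, its method names, and the dictionary memory (where a missing address stands in for a terminating fault) are hypothetical, and the rule that a normal register write clears the poison bit is assumed rather than stated on the slide.

```python
class TerminatingException(Exception):
    """Stands in for a fault such as a memory protection violation."""


class Machine:
    def __init__(self, memory):
        self.memory = memory  # address -> value; a missing address faults
        self.regs = {}        # register name -> value
        self.poison = set()   # registers whose poison bit is set

    def _check(self, reg):
        # A nonspeculative instruction faults on a poisoned source.
        if reg in self.poison:
            raise TerminatingException(f"poisoned register {reg} used")

    def load(self, dest, addr, speculative=False):
        if addr not in self.memory:
            if speculative:
                # Defer the fault: just set the destination's poison bit.
                self.regs[dest] = None
                self.poison.add(dest)
                return
            raise TerminatingException(f"bad load address {addr}")
        self.regs[dest] = self.memory[addr]
        self.poison.discard(dest)  # assumption: a normal write clears the bit

    def add_imm(self, dest, src, imm):
        self._check(src)
        self.regs[dest] = self.regs[src] + imm
        self.poison.discard(dest)  # assumption: a normal write clears the bit

    def store(self, addr, src):
        self._check(src)
        self.memory[addr] = self.regs[src]


def run_example(memory, addr_a, addr_b):
    """if (A==0) A = B; else A = A + 4; with a speculative load of B."""
    m = Machine(memory)
    m.load("R1", addr_a)                     # LW   R1, 0(R3)
    m.load("R14", addr_b, speculative=True)  # LW*  R14, 0(R2)
    if m.regs["R1"] != 0:                    # BEQZ R1, L3
        m.add_imm("R14", "R1", 4)            # ADD  R14, R1, 4
    m.store(addr_a, "R14")                   # L3: SW 0(R3), R14
    return m.memory[addr_a]


if __name__ == "__main__":
    print(run_example({0: 0, 8: 42}, 0, 8))  # A == 0, so A = B
    print(run_example({0: 5}, 0, 8))         # B faults, but its value is unused
```

In the second call the speculative load faults, yet no exception is raised because the else clause overwrites R14 before the store reads it. Calling `run_example({0: 0}, 0, 8)` raises `TerminatingException` at the store, which is the point where the original program would have faulted.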

Compiler Speculation with HW Support

• The main disadvantages of the two previous schemes
  – the need to introduce copies to deal with register renaming
  – the possibility of exhausting the registers

• Speculative Instructions with Renaming (Boosting)
  – flagging the instructions which are moved past branches as speculative
  – providing renaming and buffering in the HW

(3) Speculative Instructions with Renaming

• Extra register is no longer necessary
• Result of the boosted instruction is not written into R1 until after the branch
• Other boosted instructions could use the results of the boosted load

      LW   R1, 0(R3)    ; load A
      LW+  R1, 0(R2)    ; boosted load B
      BEQZ R1, L3       ; other branch of the if
      ADD  R1, R1, 4    ; the else clause
  L3: SW   0(R3), R1    ; nonspeculative store

Branch taken: the boosted result is written to R1. Branch not taken: the boosted result is never written to R1.

Hardware-based Speculation

• Hardware-based speculation combines
  – dynamic branch prediction
  – speculation to allow the execution of instructions before the control dependencies are resolved
  – dynamic scheduling to deal with the scheduling of different combinations of basic blocks

• Advantages
  – dynamic runtime disambiguation of memory addresses
  – hardware-based branch prediction
  – a completely precise exception model
  – does not require compensation or bookkeeping code
  – does not require different code sequences to achieve good performance for different implementations

HW-based Speculation

Need HW buffer for results of uncommitted instructions: reorder buffer

– Reorder buffer can be operand source

– Once operand commits, result is found in register

– 3 fields: instr. type, destination, value

– Use reorder buffer number instead of reservation station

– Instructions commit in order

– As a result, it is easy to undo speculated instructions on mispredicted branches or on exceptions

[Figure: block diagram with the labels Reorder Buffer, FP Regs, Res Stations, FP Adder, and From M (LD).]

4 Steps of Speculative Tomasulo Algorithm

1. Issue: Get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue the instr & send operands & reorder buffer no. to the RS

2. Execution: Operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute

3. Write result: Finish execution (WB)
   Write on the Common Data Bus to all awaiting FUs & the reorder buffer; mark the reservation station available.

4. Commit: Update register with reorder result
   When an instruction is at the head of the reorder buffer & its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer.
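The commit step is what makes speculation easy to undo, and a short Python sketch can show why. It models only the reorder buffer: reservation stations, the CDB, and operand forwarding are left out, and the names `RobEntry` and `ReorderBuffer`, the `mispredicted` flag, and the flush-everything recovery policy are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    kind: str                    # instruction type: "alu", "store" or "branch"
    dest: Optional[object]       # register name or memory address
    value: Optional[int] = None  # result, filled in at write-result
    done: bool = False           # True once the result is present
    mispredicted: bool = False   # only meaningful for branches


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # program order, head on the left
        self.regs = {}
        self.memory = {}

    def issue(self, kind, dest=None):
        """Allocate a slot at the tail, in program order."""
        entry = RobEntry(kind, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        """Record a finished result; this may happen out of order."""
        entry.value = value
        entry.done = True
        entry.mispredicted = mispredicted

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        while self.entries and self.entries[0].done:
            head = self.entries.popleft()
            if head.kind == "branch":
                if head.mispredicted:
                    self.entries.clear()  # undo: drop all younger instructions
            elif head.kind == "store":
                self.memory[head.dest] = head.value
            else:
                self.regs[head.dest] = head.value


if __name__ == "__main__":
    rob = ReorderBuffer()
    a = rob.issue("alu", "F0")
    b = rob.issue("branch")
    c = rob.issue("alu", "F2")   # speculative: issued past the branch

    rob.write_result(c, 99)      # finishes first, out of order
    rob.commit()
    print(rob.regs)              # prints {} because the head is not done

    rob.write_result(a, 1)
    rob.write_result(b, mispredicted=True)
    rob.commit()
    print(rob.regs)              # prints {'F0': 1}; F2 was never written
```

The speculative result 99 sat in the buffer the whole time, so discarding it required no repair of the register file.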

Limits to ILP

Conflicting studies of the amount of parallelism available in the late 1980s and early 1990s. Different assumptions about:

– Benchmarks (vectorized Fortran FP vs. integer C programs)

– Hardware sophistication

– Compiler sophistication

Limits to ILP

HW Model for ultimate issue performance; MIPS compilers

1. Register renaming: infinite virtual registers, so all WAW & WAR hazards are avoided

2. Branch prediction: perfect; no mispredictions

3. Jump prediction: all jumps perfectly predicted
   => machine with perfect speculation and an unbounded buffer of instructions available

4. Memory-address alias analysis: addresses are known, and a store can be moved before a load provided the addresses are not equal

1 cycle latency for all instructions

Upper Limit to ILP

[Figure: bar chart of instruction issues per cycle for the integer programs (gcc, espresso, li) and the floating point programs (fpppp, doduc, tomcatv). Surviving data labels: 54.8, 62.6, 150.1.]

Limitations on Window Size and Maximum Issue Count

• Window: the set of instructions examined for simultaneous execution
  – n instructions: comparisons needed to determine whether they have any register dependencies among them:
        (2n - 2) + (2n - 4) + ... + 2 = n^2 - n
    • 2000 instructions -- 4 million comparisons
    • 50 instructions -- 2450 comparisons
  – current technology: window size 4 to 32
    • requires about 900 comparisons

• Multiple issues -- lengthen the clock cycle
  – typically have clock cycles that are 1.5 to 3 times longer
  – typically have CPIs that are 2 to 3 times lower
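The comparison counts for the window can be checked numerically. The short Python sketch below sums the series term by term and compares it with the closed form; the function name `register_comparisons` is made up for this example.

```python
def register_comparisons(n):
    """Comparisons needed to check n instructions for register dependencies.

    Sums the series (2n - 2) + (2n - 4) + ... + 2 term by term.
    """
    return sum(2 * k for k in range(1, n))


if __name__ == "__main__":
    for n in (4, 32, 50, 2000):
        total = register_comparisons(n)
        assert total == n * n - n  # closed form n^2 - n
        print(n, total)
```

For n = 50 this gives 2450, and for n = 2000 it gives 3,998,000, i.e. roughly 4 million. The quadratic growth is what keeps practical windows small.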

Window Size Impact

[Figure: bar chart of instruction issues per cycle for each program (integer programs: gcc, espresso, li; FP programs: fpppp, doduc, tomcatv) as the window shrinks from infinite. Data labels survive only in part; the two complete series read 4, 4, 4, 5, 4, 6 and 3, 3, 3, 3, 3, 3.]

More Realistic HW: Branch Impact

Window of 2000 and maximum issue of 64 instructions/clock cycle

[Figure: bar chart of instruction issues per cycle for each program under five branch prediction schemes: Perfect, Selective predictor, Standard 2-bit BHT (512), Static (profile), and None. Surviving data labels: 61, 58, 60.]

Selective History Predictor

[Figure: the branch address indexes an 8192 x 2-bit non-correlating predictor, a 2048 x 4 x 2-bit correlating predictor, and an 8K x 2-bit selector. Two bits of global history (00, 01, 10, 11) pick one of the four correlating entries. The selector chooses the correlator or the non-correlator, whose counter gives Taken/Not Taken.]

2-bit counter encoding: 11 and 10 = Taken; 01 and 00 = Not Taken
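The selective predictor can be sketched in Python using the table sizes from the slide. Several details are assumptions, not taken from the slide: indexing the tables with the branch address modulo the table size, initializing every counter to 01, and training the selector only when exactly one of the two component predictors was right. The names `bump` and `SelectivePredictor` are hypothetical.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter (0..3) one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class SelectivePredictor:
    def __init__(self):
        self.non_corr = [1] * 8192                   # 8192 x 2 bits
        self.corr = [[1] * 4 for _ in range(2048)]   # 2048 x 4 x 2 bits
        self.selector = [1] * 8192                   # 8K x 2 bits
        self.history = 0                             # last two branch outcomes

    def predict(self, addr):
        """Return True for Taken (counter value 10 or 11)."""
        non_corr_taken = self.non_corr[addr % 8192] >= 2
        corr_taken = self.corr[addr % 2048][self.history] >= 2
        # Assumption: selector values 10/11 mean "choose correlator".
        use_corr = self.selector[addr % 8192] >= 2
        return corr_taken if use_corr else non_corr_taken

    def update(self, addr, taken):
        i, j = addr % 8192, addr % 2048
        non_corr_ok = (self.non_corr[i] >= 2) == taken
        corr_ok = (self.corr[j][self.history] >= 2) == taken
        if corr_ok != non_corr_ok:
            # Train the selector toward whichever component was right.
            self.selector[i] = bump(self.selector[i], corr_ok)
        self.non_corr[i] = bump(self.non_corr[i], taken)
        self.corr[j][self.history] = bump(self.corr[j][self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & 0b11


if __name__ == "__main__":
    # A branch that alternates Taken / Not Taken defeats a plain 2-bit
    # counter, but two bits of global history capture the pattern.
    predictor = SelectivePredictor()
    hits = 0
    for step in range(20):
        taken = step % 2 == 0
        hits += predictor.predict(0x40) == taken
        predictor.update(0x40, taken)
    print(hits, "of 20 predictions correct")
```

After a short warm-up the selector saturates toward the correlating predictor for this branch, so the later predictions are all correct.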

More Realistic HW: Register Impact

2000 instr window, 64 instr issue, 8K 2-level Prediction

[Figure: bar chart of instruction issues per cycle for each program versus the number of renaming registers: Infinite, 256, 128, 64, 32, None*. Surviving data labels: 9, 10, 11 and 5, 5, 6, 5, 5, 7.]

* DLX: 31 Integer Registers/16 FP Registers

More Realistic HW: Alias Impact

2000 instr window, 64 instr issue, 8K 2-level Prediction, 256 renaming registers

[Figure: bar chart of instruction issues per cycle for each program under four alias analysis models: Perfect, Global/stack Perfect+, Inspection#, None*. Surviving data labels: 45, 4, 4 and 5, 3, 3, 4, 4.]

* All memory accesses are assumed to conflict
+ Ongoing research
# Most commercial compilers

Realistic HW for 90s: Window Impact

Realistic HW in 90s:

Perfect disambiguation (HW), 1K Selective Prediction, 16 entry return, 64 registers, issue as many as window

[Figure: bar chart of instruction issues per cycle for gcc, espresso, li, fpppp, doduc, and tomcatv at window sizes Infinite, 256, 128, 64, 32, 16, 8, and 4. Data labels survive only in part; the two complete series read 4, 4, 4, 5, 4, 6 and 3, 2, 3, 3, 3, 3.]

Fallacies and Pitfalls

Fallacy: Processors with lower CPIs will always be faster.
  – sophisticated pipelines typically have slower clock rates than processors with simple pipelines
  – example:
    • IBM Power-2 (low CPI): two FP and two load-store, clock rate 71.5 MHz (slower clock rate)
    • DEC Alpha 21064 (high CPI): dual-issue with one load-store and one FP, 200 MHz (faster clock rate)

Brainiac vs. Speed Demon

6-scalar IBM Power-2 @ 71.5 MHz (5 stage pipe) vs. 2-scalar Alpha @ 200 MHz (7 stage pipe)

[Figure: per-benchmark comparison of the two processors. Benchmark labels survive only in part (including sc and gcc).]

Recent High Performance Processors

Year = year shipped in systems; Clock = initial clock rate (MHz); Max, Load-store, Int ALU, FP, and Branch = issue capability; SPEC = measure or estimate.

Processor         Year  Clock  Issue structure  Scheduling  Max  Load-store  Int ALU  FP  Branch  SPEC
IBM Power-2       1994  67     Dynamic          Static      6    2           2        2   2       95 int, 270 FP
Intel Pentium     1994  66     Dynamic          Static      2    2           2        1   1       65 int, 65 FP
DEC Alpha 21164   1995  300    Static           Static      4    2           2        2   1       330 int, 500 FP
Sun UltraSPARC    1995  167    Dynamic          Static      4    1           1        1   1       275 int, 305 FP
Intel P6          1995  150    Dynamic          Dynamic     3    1           2        1   1       >200 int
PowerPC 620       1995  133    Dynamic          Dynamic     4    1           1        1   2       25 int, 300 FP
MIPS R10000       1996  200    Dynamic          Dynamic     4    1           2        2   1       300 int, 600 FP
HP 8000           1996  200    Dynamic          Static      4    2           2        2   1       >360 int, >550 FP
