Post on 31-Jan-2016
Power and Frequency Analysis for Data and Control Independence in
Embedded Processors
Farzad Samie (Sharif University of Technology)
Amirali Baniasadi (University of Victoria)
This Work
Goal
• Power and frequency analysis for control independent and data independent instructions in embedded processors
Motivation
• Embedded processors are becoming more complex
• Modern embedded processors use speculation
• Mis-speculation incurs performance and power penalties
• Power is a major concern in embedded processors
• Save power and gain performance
This Work (cont.)
Our Approach
• Reduce the energy and time wasted on mispredictions.
How?
• Identify and bypass Control Independent (CI) and Data Independent (DI) instructions.
• CIs: instructions that execute regardless of the branch outcome.
• CI-DI: CI instructions whose operands are the same regardless of the branch outcome.
Key Result
• Up to 12% processor energy reduction.
Background
Branch Prediction
[Figure: branch predictor block diagram. Inputs: branch history, program counter. Outputs: predicted direction, predicted target address.]
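As a rough illustration of how a predictor can combine the branch history and the program counter into a predicted direction, here is a minimal sketch. The slides do not say which predictor was simulated; the gshare-style indexing, the 2-bit counters, the class name `GsharePredictor`, and the `index_bits` parameter are all assumptions made for this example. Target address prediction (normally a separate branch target buffer) is left out.

```python
class GsharePredictor:
    """Hypothetical gshare-style direction predictor (illustrative only)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # One 2-bit saturating counter per entry, initialised to "weakly not taken".
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global branch history register

    def _index(self, pc):
        # XOR the word-aligned program counter with the global history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor(index_bits=4)
for _ in range(20):
    predictor.update(0x40, True)  # train on a branch that is always taken
print(predictor.predict(0x40))
```

Whenever `predict` disagrees with the eventual outcome, the instructions fetched in between form the wrong path that the rest of the talk is concerned with.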
Background (cont.)
[Figure: instruction stream I1 to I12 around a branch instruction with its not-taken and taken paths, marking the misprediction detection point, the wrong path (squashed), the right path, and the Control Independent Instructions (CIs).]
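A simple way to picture control independence is to compare the instructions fetched down the wrong path with those on the right path and look for the point where the two streams meet again. The sketch below is an assumption-laden illustration, not the mechanism used in this work: the function name `split_at_reconvergence` and the instruction labels are hypothetical, and it assumes straight-line paths that stay merged once they reconverge.

```python
def split_at_reconvergence(wrong_path, right_path):
    """Split wrong-path instructions into (squashed-only, control independent).

    Instructions are identified by label (or PC). Everything from the first
    wrong-path instruction that also appears on the right path onwards is
    treated as control independent.
    """
    right_set = set(right_path)
    for k, inst in enumerate(wrong_path):
        if inst in right_set:
            return wrong_path[:k], wrong_path[k:]
    # The paths never meet within the fetched window: nothing is reusable.
    return wrong_path, []


# Hypothetical streams: the wrong path fetched I5..I9, the right path starts at I7.
squashed, ci = split_at_reconvergence(
    ["I5", "I6", "I7", "I8", "I9"],
    ["I7", "I8", "I9", "I10"],
)
print(squashed, ci)
```

In this toy case only I5 and I6 have to be thrown away; I7 to I9 are candidates for bypassing because they are fetched regardless of the branch outcome.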
Background (cont.)
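[Figure: example code around the branch "If (R4=0)" with its not-taken and taken paths; the instructions after the paths rejoin are labeled Data Independent (CI-DI) or Data Dependent (CI-DD). Instructions as extracted from the figure:]
R1←R1+R2
R4←R1
If (R4=0)
R2←R4-R1
R5←R2-R3
R3←0
R5←R4+1
R1←R1-1
R3←0
R4←R6+R4
R1←R4+R1
R5←R5-2
R3←R3-R4
[Instructions highlighted by the CI-DI / CI-DD labels: R1←R1-1, R5←R2-R3, R5←R4+1]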
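The CI-DI / CI-DD distinction can be sketched as a small register-tracking pass. This is an illustrative reading of the definitions, not the authors' implementation: the `Inst` type, the `classify_ci` function, and the example instructions are hypothetical. The assumption is that a register written on either the wrong path or the right path may hold a different value depending on the branch outcome, and that this dependence propagates through later CI instructions.

```python
from typing import NamedTuple, Tuple


class Inst(NamedTuple):
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def classify_ci(wrong_path, right_path, ci_insts):
    """Label each control independent instruction as CI-DI or CI-DD."""
    # Registers whose value may differ between the two paths.
    tainted = {i.dest for i in wrong_path} | {i.dest for i in right_path}
    labels = []
    for inst in ci_insts:
        if any(src in tainted for src in inst.srcs):
            labels.append("CI-DD")
            tainted.add(inst.dest)      # result depends on the path taken
        else:
            labels.append("CI-DI")
            tainted.discard(inst.dest)  # overwritten with a path-independent value
    return labels


# Hypothetical example: one path writes R2, the other writes R3.
wrong = [Inst("R2", ("R4", "R1"))]
right = [Inst("R3", ())]
ci = [
    Inst("R1", ("R1",)),        # R1 <- R1 - 1
    Inst("R5", ("R2", "R3")),   # R5 <- R2 - R3
    Inst("R5", ("R4",)),        # R5 <- R4 + 1
]
print(classify_ci(wrong, right, ci))  # ['CI-DI', 'CI-DD', 'CI-DI']
```

Under these assumptions, a CI-DI result computed on the wrong path is still valid after recovery, whereas a CI-DD instruction has to read its operands and execute again.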
CI-DI vs. CI-DD
• Bypassing CI-DIs saves more energy
  • No need to read operands or execute again
• Bypassing CI-DIs provides higher performance
  • No time is wasted reading operands or executing again
[Figure: pipeline stages Fetch, Issue, Dispatch, Execute, WriteBack, with the CI-DD and CI-DI cases marked.]
Methodology
• Modified SimpleScalar
• Wattch for power measurement
• MiBench: Embedded Benchmark Suite
Distribution
Wrong Path: 12%, CI: 5%, CI-DI: 2%
CI Power Reduction in Different Units
Max: branch predictor unit, Min: instruction cache
CI Power Reduction in Stages
Rijndael: low misprediction rate, few wrong-path instructions, few CIs
Power Sensitivity to RUU size
[Figure: two panels, CI and CI-DI]
Higher power dissipation for bigger RUU sizes
Power Sensitivity to Execution Bandwidth
[Figure: two panels, CI and CI-DI]
Higher power dissipation for wider execution bandwidth
Power Sensitivity to Branch Predictor Size
Little sensitivity to branch predictor size
Related Work
• Rotenberg et al.: studied control independence in superscalar processors, HPCA99.
• Collins et al.: proposed a mechanism to predict the re-convergent point, Micro04.
• Lam and Wilson: studied the impact of CIs on instruction-level parallelism, ISCA92.
• Gandhi et al.: recovery from selected branch mispredictions, HPCA04.
Conclusion
• Categorized CIs into CI-DI and CI-DD
• Potential power saving of up to 12% from bypassing CI and CI-DI instructions
• High sensitivity to RUU size
• High sensitivity to execution bandwidth
• Little sensitivity to branch predictor size
Question
Thank you