VLSI Digital Signal Processing Chapter 6 Folding
VLSI Digital Signal Processing Systems
Folding
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science
National Chiao Tung University
Taiwan, R.O.C.
Fall, 2015
http://www.cs.nctu.tw/~ldvan/
Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions
Introduction (1/2)
Folding systematically determines the control circuits of a DSP architecture in which multiple algorithm operations are time-multiplexed onto a single functional unit.
It is used to synthesize DSP architectures that operate with a single clock or with multiple clocks.
It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of registers, which motivates the register minimization techniques presented later.
Introduction (2/2)
Folding Transformation (1/3)
A systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.
Notation
U, V: nodes (operations) of the original data-flow graph (DFG)
HU, HV: nodes (functional units) of the folded DFG
W(x): the x-th iteration of node W
U → V: an edge e from node U to node V
w(e): number of delays on edge e
Folding factor N: the number of operations that share one FU
Folding set: an ordered set of operations that are executed by the same FU
The position of an operation U in its folding set is the folding order of U.
Folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).
The folding sets specify the underlying folding transformation.
Folding Transformation (2/3)
PU: number of pipeline stages of HU. PU = 0 indicates that HU is not pipelined.
DF(U → V) (folding equation): number of cycles for which the result of HU must be stored before HV uses it, for an edge e from U to V.

$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N\,w(e) - P_U + v - u$$

Here u and v are the folding orders of U and V.
A negative value of DF is possible before the DFG is retimed for folding.
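As a minimal sketch of the folding equation (the function name `folding_delay` and its argument names are illustrative, not from the slides), the delay can be computed as the gap between the cycle in which the result of HU becomes available and the cycle in which HV consumes it. The sample values are the ones used for edge 1 → 2 of the biquad example in these slides.

```python
def folding_delay(N, w, P_U, u, v, l=0):
    """Number of cycles the result of H_U must be stored on a folded edge U -> V."""
    available = N * l + u + P_U   # U(l) starts at cycle N*l + u and takes P_U cycles
    consumed = N * (l + w) + v    # V(l + w) starts at cycle N*(l + w) + v
    return consumed - available   # equals N*w - P_U + v - u, independent of l


# Edge 1 -> 2 of the biquad before retiming: N = 4, w(e) = 0, P_U = 1, u = 3, v = 1.
d = folding_delay(N=4, w=0, P_U=1, u=3, v=1)
print(d)  # -3: a negative value, so this edge needs retiming before folding
```

The iteration index `l` cancels out, which is why the folding equation depends only on N, w(e), PU, u, and v.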
Folding Transformation (3/3)
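Original DFG: edge U → V with w(e) delays, so the result of U(l) is used by V(l + w(e)).
N-folded DFG: HU executes U(l) at time Nl + u, and HV executes V(l + w(e)) at time N(l + w(e)) + v.
The path from HU to HV therefore contains PU pipeline stages followed by DF(U → V) delays.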
Folding Retimed Biquad Filter (1/2)
Folding factor N = 4
Folding sets S1 = {4, 2, 3, 1} and S2 = {5, 8, 6, 7}, where S1 contains all addition operations and S2 contains all multiplication operations.
Assume that addition and multiplication require 1 and 2 units of time (u.t.), respectively.
1-stage pipelined adders and 2-stage pipelined multipliers are available.
Folding Retimed Biquad Filter (2/2)
folding equations
Retiming (1/3)
What happens if a folding equation DF is negative? A negative DF would require a negative number of delays, which cannot be implemented.
Remedy: retime the original DFG (move its delay elements) prior to folding.
Constraint for every edge of the retimed DFG:
D’F(U→V) = Nwr(e) – PU + v – u ≥ 0 -----(1)
Substituting wr(e) = w(e) + r(V) – r(U) into (1) gives
r(U) – r(V) ≤ DF(U→V)/N
Since the retiming values of the nodes are restricted to be integers, this can be rewritten as
r(U) – r(V) ≤ ⌊DF(U→V)/N⌋
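The step from constraint (1) to the bound on the retiming values can be written out as follows. This is a worked derivation using only the symbols defined on this slide, with D_F denoting the folding equation of the original (unretimed) DFG.

```latex
\begin{aligned}
D'_F(U \to V) &= N\,w_r(e) - P_U + v - u \\
              &= N\,[\,w(e) + r(V) - r(U)\,] - P_U + v - u \\
              &= D_F(U \to V) + N\,[\,r(V) - r(U)\,] \;\ge\; 0 \\
\Longrightarrow\quad r(U) - r(V) &\le \frac{D_F(U \to V)}{N}
\end{aligned}
```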
Retiming (2/3)
Example: DF(1→2) = Nw(e) – PU + v – u = 4(0) – 1 + 1 – 3 = –3
r(1) – r(2) ≤ ⌊DF(1→2)/N⌋ = ⌊–3/4⌋ = –1
Retiming (3/3)
r(1)=-1, r(2)=0, r(3)=-1, r(4)=0
r(5)=-1, r(6)=-1, r(7)=-2, r(8)=-1
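A quick check of these retiming values, sketched in Python. It assumes that N = 4 and that the only edges with negative folding equations are the four listed as invalid in the biquad example of these slides (1→2, 6→4, 8→4, 7→3). The variable names are illustrative.

```python
N = 4  # folding factor of the biquad example

# Retiming values r(1)..r(8) from the slide.
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}

# Folding equations that are negative before retiming: (U, V) -> D_F(U -> V).
negative_edges = {(1, 2): -3, (6, 4): -4, (8, 4): -3, (7, 3): -3}

retimed = {}
for (U, V), d_f in negative_edges.items():
    bound = d_f // N                      # floor(D_F / N); // rounds toward -infinity
    assert r[U] - r[V] <= bound, (U, V)   # constraint r(U) - r(V) <= floor(D_F / N)
    retimed[(U, V)] = d_f + N * (r[V] - r[U])  # D'_F = D_F + N (r(V) - r(U))

print(retimed)  # {(1, 2): 1, (6, 4): 0, (8, 4): 1, (7, 3): 1}
```

All four retimed values are nonnegative, so these edges no longer prevent folding.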
Lifetime Analysis
Lifetime analysis is a procedure for computing the minimum number of registers required to implement a DSP algorithm in hardware.
Two forms are used: linear lifetime analysis and circular lifetime analysis.
In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables over all time units is determined. This maximum is the minimum number of registers.
The forward-backward register allocation technique then assigns the variables to registers.
Linear Lifetime Analysis
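Example with variables {a, b, c}: the number of live variables per time unit is {0, 1, 2, 2, 2, 2, 2, 2}, so the maximum is max{0, 1, 2, 2, 2, 2, 2, 2} = 2.
The lifetime chart can be drawn explicitly for three iterations with period N = 6, or for a single iteration with the periodicity left implicit.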
Matrix Transpose Example (1/3)
Input matrix:
a b c
d e f
g h i
Transposed matrix:
a d g
b e h
c f i
Input sequence (a arrives first): i h g f e d c b a → Matrix Transpose → output sequence (a leaves first): i f c h e b g d a
Matrix Transpose Example (2/3)
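Tinput = time at which the sample arrives at the input
Tzlout = zero-latency output time
Tdiff = Tzlout – Tinput
Toutput = Tzlout + max{–Tdiff}, so that no sample is output before it has arrived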
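These quantities can be tabulated for the 3×3 transpose with a short Python sketch. It assumes one sample per cycle, inputs a to i arriving at cycles 0 to 8, and the zero-latency output order a, d, g, b, e, h, c, f, i implied by the input and output sequences of this example. The variable names are illustrative.

```python
inputs = "abcdefghi"    # arrival order: a at cycle 0, ..., i at cycle 8
outputs = "adgbehcfi"   # transposed order, one sample per cycle

t_input = {s: t for t, s in enumerate(inputs)}
t_zlout = {s: t for t, s in enumerate(outputs)}      # zero-latency output time
t_diff = {s: t_zlout[s] - t_input[s] for s in inputs}
latency = max(-d for d in t_diff.values())           # max{-Tdiff}
t_output = {s: t_zlout[s] + latency for s in inputs}

# One row per sample: name, Tinput, Tzlout, Tdiff, Toutput
for s in inputs:
    print(s, t_input[s], t_zlout[s], t_diff[s], t_output[s])
```

Under these assumptions the added latency is 4 cycles, and every sample leaves no earlier than it arrives.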
Matrix Transpose Example (3/3)
The minimum register number is 4.
Linear Lifetime Chart Circular Lifetime Chart
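The register count can be checked with a small sketch of linear and circular lifetime analysis. The lifetimes below (Tinput, Toutput) are an assumption: they follow from applying Tdiff = Tzlout – Tinput and Toutput = Tzlout + max{–Tdiff} to the 3×3 transpose with one sample per cycle. A variable is counted as live in cycles Tinput + 1 through Toutput, and the period is N = 9. Function names are illustrative.

```python
from collections import Counter

# (Tinput, Toutput) per variable; assumed values, see the lead-in.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}


def linear_live_counts(lifetimes):
    """Live variables per cycle for one iteration (live in Tin+1 .. Tout)."""
    counts = Counter()
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            counts[t] += 1
    return counts


def min_registers(lifetimes, N):
    """Circular analysis: overlap iterations by folding cycle t onto t mod N."""
    folded = Counter()
    for t, n in linear_live_counts(lifetimes).items():
        folded[t % N] += n
    return max(folded.values(), default=0)


print(max(linear_live_counts(lifetimes).values()))  # peak within one iteration
print(min_registers(lifetimes, N=9))                # peak with periodicity included
```

Both values come out to 4 for these lifetimes, in agreement with the minimum register number stated on the slide.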
Procedures of Forward-Backward Register Allocation
Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Mark (hash) the register positions that are therefore occupied N cycles later.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
Register Allocation for Matrix Transpose Example
Procedures of Register Minimization in Folded Architectures
Steps:
Step 1: Perform retiming for folding
Step 2: Write the folding equations
Step 3: Use the folding equations to construct a lifetime table
Step 4: Draw the lifetime chart and determine the required number of registers
Step 5: Perform forward-backward register allocation
Step 6: Draw the folded architecture that uses the minimum number of registers
Folding Architecture Example
Folded Architecture for Matrix Transpose Example
Biquad Filter Example (1/4)
Step 1: Retiming
Without retiming, folding is invalid because these folding equations are negative:
DF(1→2) = -3
DF(6→4) = -4
DF(8→4) = -3
DF(7→3) = -3
Biquad Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nw(e) – Pu + v - u
DF(1→2) = 4(1) – 1 + 1 – 3 = 1
DF(1→5) = 4(1) – 1 + 0 – 3 = 0
DF(1→6) = 4(1) – 1 + 2 – 3 = 2
DF(1→7) = 4(1) – 1 + 3 – 3 = 3
DF(1→8) = 4(2) – 1 + 1 – 3 = 5
DF(3→1) = 4(0) – 1 + 3 – 2 = 0
DF(4→2) = 4(0) – 1 + 1 – 0 = 0
DF(5→3) = 4(0) – 2 + 2 – 0 = 0
DF(6→4) = 4(1) – 2 + 0 – 2 = 0
DF(7→3) = 4(1) – 2 + 2 – 3 = 1
DF(8→4) = 4(1) – 2 + 0 – 1 = 1
Step 3: Construct the lifetime table
Tinput = u + Pu
Toutput = u + Pu + maxv{DF(U→V) }
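Steps 2 to 4 for the biquad can be reproduced with a short Python sketch. It assumes the folding sets S1 = {4, 2, 3, 1} (adders, PU = 1) and S2 = {5, 8, 6, 7} (multipliers, PU = 2) given earlier, takes the delay counts w(e) of the retimed DFG from the N·w(e) terms of the folding equations, and counts a variable as live in cycles Tinput + 1 through Toutput. The names `order`, `edges`, and `table` are illustrative.

```python
from collections import Counter

N = 4
S1, S2 = [4, 2, 3, 1], [5, 8, 6, 7]          # folding sets (adders, multipliers)
order = {n: i for s in (S1, S2) for i, n in enumerate(s)}   # folding order of each node
P = {**{n: 1 for n in S1}, **{n: 2 for n in S2}}            # pipeline stages P_U

# Retimed DFG: (U, V) -> w(e), read off the N*w(e) terms of the folding equations.
edges = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2, (3, 1): 0,
         (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

# Step 2: D_F(U -> V) = N w(e) - P_U + v - u
D_F = {(U, V): N * w - P[U] + order[V] - order[U] for (U, V), w in edges.items()}

# Step 3: lifetime table, Tinput = u + P_U, Toutput = Tinput + max_V D_F(U -> V)
table = {}
for U in sorted({U for U, _ in edges}):
    t_in = order[U] + P[U]
    table[U] = (t_in, t_in + max(d for (X, _), d in D_F.items() if X == U))

# Step 4: live variables per cycle, folded modulo N (circular lifetime chart)
live = Counter()
for t_in, t_out in table.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1

print(D_F)
print(table)
print(max(live.values()))
```

With these inputs the sketch gives lifetimes such as 4 → 9 for node 1 and a peak of 2 live variables, which agrees with the minimum number of registers reported for this example.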
Biquad Filter Example (3/4)
Step 4: Draw the lifetime chart
The minimum number of registers is 2.
Step 5: Register allocation
Folding factor = 4
Biquad Filter Example (4/4)
Step 6: Folded Architecture
IIR Filter Example (1/4)
Step 1: Retiming
Without retiming, folding is invalid because these folding equations are negative:
DF(3→1) = -3
DF(4→1) = -2
IIR Filter Example (2/4)
Step 2: Folding Equations
DF(U→V) = Nw(e) – Pu + v - u, with N = 2
DF(1→2) = 0
DF(2→3) = 5
DF(2→4) = 2
DF(3→1) = 1
DF(4→1) = 0
Step 3: Construct the lifetime table
Tinput = u + Pu
Toutput = u + Pu + maxv{DF(U→V) }
IIR Filter Example (3/4)
Step 4: Draw the lifetime chart
The minimum number of registers is 3.
Step 5: Register allocation
Folding factor = 2
IIR Filter Example (4/4)
Step 6: Folded Architecture
Conclusions
A systematic transformation for deriving time-multiplexed architectures was presented.
Folding techniques reduce the number of functional units.
Register minimization techniques reduce the number of registers.
References
K. K. Parhi, VLSI Digital Signal Processing Systems:
Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of text book, 2004.