Pipelining and Parallel Processing - National Chiao Tung...
Transcript of Pipelining and Parallel Processing - National Chiao Tung...
VLSI Digital Signal Processing Systems
Pipelining and Parallel Processing
Lan-Da Van (范倫達), Ph. D.
Department of Computer Science
National Chiao Tung University Taiwan, R.O.C.
Fall, 2010
http://www.cs.nctu.edu.tw/~ldvan/
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-2
Outlines
Introduction
Pipelining of FIR Digital Filter
Parallel Processing
Pipelining and Parallel Processing for Low Power
Conclusions
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-3
Introduction
Pipelining
Reduce the critical path
Increase the clock speed or sample speed
Reduce power consumption
Parallel processing
Not reduce the critical path
Not increase clock speed, but increase sample
speed
Reduce power consumption
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-4
A 3-tap FIR Filter
Direct-form structure
)2()1()()( ncxnbxnaxny
AMsample TTT 2AM
sampleTT
f2
1
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-5
Outlines
Introduction
Pipelining of FIR Digital Filter
Parallel Processing
Pipelining and Parallel Processing for Low Power
Conclusions
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-6
Pipelining and Parallel Concept
Pipelining
Introduce pipelining latches
along the datapath
Parallel processing
Duplicate the hardware
TA
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-7
Pipelining FIR Filter
Critical path
2TA+TM-->TA+TM
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-8
Pipelining (1/2)
Drawbacks
Increase number of delay elements (registers/latches) in the
critical path
Increase latency
Clock period limitation: critical path may be between
An input and a latch
A latch and an output
Two Latches
An input and an output
Pipelining latches can only be placed across any
feed-forward cutset of the graph
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-9
Pipelining (2/2)
Cutset: A cutset is a set of edges of a graph such that
if these edges are removed from the graph, the graph
becomes disjoint.
Feed-forward cutset: A cutset is called a feed-forward
cutset if the data move in the forward direction on all
the edges of the cutset.
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-10
Example 3.2.1
Error!
2 u.t.
4 u.t.
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-11
Transposition Theorem
Reversing the direction of all edges in a given SFG
and interchanging the input and output ports preserve
the functionality of the system.
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-12
Data-Broadcast Structure
Critical path is reduced to (TM+TA).
Direct Form II
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-13
Fine-Gain Pipelining
Let TM=10 u.t., TA=2 u.t., and the desired clock
period=6 u.t.
Break the MULTIPLIER into 2 smaller units with
processing time of 6 and 4 units.
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-14
Outlines
Introduction
Pipelining of FIR Digital Filter
Parallel Processing
Pipelining and Parallel Processing for Low Power
Conclusions
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-15
Parallel Processing
Parallel processing and pipelining are dual
If a computation can be pipelined, it can also be
processed in parallel.
Convert a single-input single-output (SISO) system to
multiple-input multiple-output (MIMO) system via
parallelism
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-16
Parallel Processing of 3-Tap FIR Filter (1/2)
)3()13()23()23(
)13()3()13()13(
)23()13()3()3(
kcxkbxkaxky
kcxkbxkaxky
kcxkbxkaxky
)2()1()()( ncxnbxnaxny
)2(3
11AMclk
sampleiter
TTTL
TT
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-17
Parallel Processing of 3-Tap FIR Filter (2/2)
How about direct form II?
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-18
Complete Parallel Processing System
Critical path has remained unchanged.
But the iteration period is reduced.
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-19
S/P and P/S Converter
Edge Trigger!
Edge Trigger!
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-20
Why Parallel Processing ?
Parallel leads to duplicating many copies of hardware, and the cost increases! Why use?
Answer lies in the fact that the fundamental limit to pipelining is at I/O bottlenecks, referred to as Communication Bound, composed of I/O pad delay and the wire delay.
Parallel Transmission
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-21
Combined Fine-Grain Pipelining and Parallel Processing
)2(6
11AMclk
sampleiter
TTTLM
TT
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-22
Outlines
Introduction
Pipelining of FIR Digital Filter
Parallel Processing
Pipelining and Parallel Processing for Low Power
Conclusions
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-23
Underlying Low Power Concept
Propagation delay
Power consumption
Sequential filter
fVCP total2
0
seqTf
1,
20
0
)Vk(V
VCT
t
chargeseq
20
0
)Vk(V
VCT
t
chargepd
,20 fVCP totalseq
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-24
Pipelining for Low-Power (1/2)
M-level pipelined system
Critical path-->1/M, capacitance to be charged in a
single clock cycle-->1/M
If the clock frequency is maintained, the power supply
can be reduced to V0 (0<<1)
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-25
Pipelining for Low-Power (2/2)
Power consumption
Propagation delay
Let Tseq=Tpip
get )()( 20
20
tt VVVVM
seqtotalpip PβfVβCP 220
2
,2
0
0
)Vk(V
VCT
t
chargeseq
20
0
)VVk(
VM
C
Tt
charge
pip
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-26
Example 3.4.1 (1/2)
Consider an original 3-tap FIR
filter and its fine-grain pipeline
version shown in the following
figures. Assume TM=10 ut, TA=2
ut, Vt=0.6V, Vo=5V, and CM=5CA.
In fine-grain pipeline filter, the
multiplier is broken into 2 parts,
m1 and m2 with computation time
of 6 u.t. and 4 u.t. respectively,
with capacitance 3 times and 2
times that of an adder,
respectively. (a) What is the
supply voltage of the pipelined
filter if the clock period remains
unchanged? (b) What is the
power consumption of the
pipelined filter as a percentage of
the original filter?
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-27
Example 3.4.1 (2/2)
Solution:
%4.36
V0165.3
e)(infeasibl 0239.0or 6033.0
)6.05()6.05(2
3 :GrainFine
6 :Original
2
22
21
Ratio
V
CCCCC
CCCC
pip
AAmmcharge
AAMcharge
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-28
Comparison
System Sequential
FIR (Original)
Pipelined FIR
(Without reducing Vo)
Pipelined FIR
(With reducing Vo)
Power (Ref) PRef 2PRef 0.364PRef
Clock Period
(u.t.)12 ut 6 ut 12 ut
Sample Period
(u.t.)12 ut 6 ut 12 ut
Thinking Again!
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-29
Parallel Processing for Low-Power
L-parallel system
Since maintaining the same sample rate, clock period
is increased to LTseq
This means that Ccharge is charged in LTseq, and the
power supply can be reduced to V0
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-30
Parallel Processing for Low-Power
Power consumption
Propagation delay
LTseq=Tpar
get )()( 20
20 tt VVVVL
seqtotalpar PL
fVLCP 22
0 ))((
,2
0
0
)Vk(V
VCT
t
chargeseq
20
0
)VVk(
VCT
t
charge
par
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-31
Example 3.4.2 (1/2)
Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-
parallel version in 3.18(b). The two architectures are operated at
the sample period 9 u.t. Assume TM=8, TA=1, Vt=0.45V, Vo=3.3V,
CM=8CA (a) What is the supply voltage of the 2-parallel filter? (b)
What is the power consumption of the 2-parallel filter as a
percentage of the original filter?
Solution:
%41.43Ratio
2.1743VVpar
0282.0or 6589.0
)45.03.3(5)45.03.3(9
102 :Parallel2
:Original
2
22
AAMcharge
AMcharge
CCCC
CCC
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-32
Example 3.4.2 (2/2)
x(2k)
x(2k+1)
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-33
Example 3.4.3 (1/2)
A more efficient structure than the previous one is depicted in Fig. 3.18(c).
(a) What is the supply voltage of the efficient 2-parallel filter? (b) What is
the power consumption of the efficient 2-parallel filter as a percentage of
the original filter?
Solution:
%6.4335
2
155
Ratio
45857.2
e)(infeasibl 0.025or 0.745
)45.03.3(12)45.03.3(92
124 : Parallel-2 New
9 : Original
20
20
2
22
sA
sA
seq
par
pip
AAMcharge
AAMcharge
fVC
fVC
P
P
VV
CCCC
CCCC
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-34
Example 3.4.3 (2/2)
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-35
Combining Pipelining and Parallel Processing
Parallel-pipelined structure
M=L=2, V0=5V, Vt=0.6V-->=0.4, 2=0.16
20
20 )()( tt VVVVML seq
LTTpp
,2
0
0
)Vk(V
VCT
t
chargeseq
20
0
)VVk(
VM
C
Tt
charge
pp
VLSI Digital Signal Processing Systems
Lan-Da Van VLSI-DSP-3-36
Conclusions
Methodologies of pipelining 3-tap FIR filter
Methodologies of parallel processing for 3-tap FIR
filter
Methodologies of using pipelining and parallel
processing for low power demonstration.
Pipelining and parallel processing of recursive digital
filters using look-ahead techniques are addressed in
Chapter 10.