Pipelining and Parallel Processing - National Chiao Tung...

Post on 02-Feb-2018

227 views 4 download

Transcript of Pipelining and Parallel Processing - National Chiao Tung...

VLSI Digital Signal Processing Systems

Pipelining and Parallel Processing

ldvan@cs.nctu.edu.tw

Lan-Da Van (范倫達), Ph. D.

Department of Computer Science

National Chiao Tung University Taiwan, R.O.C.

Fall, 2010

http://www.cs.nctu.edu.tw/~ldvan/

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-2

Outlines

Introduction

Pipelining of FIR Digital Filter

Parallel Processing

Pipelining and Parallel Processing for Low Power

Conclusions

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-3

Introduction

Pipelining

Reduce the critical path

Increase the clock speed or sample speed

Reduce power consumption

Parallel processing

Not reduce the critical path

Not increase clock speed, but increase sample

speed

Reduce power consumption

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-4

A 3-tap FIR Filter

Direct-form structure

)2()1()()( ncxnbxnaxny

AMsample TTT 2AM

sampleTT

f2

1

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-5

Outlines

Introduction

Pipelining of FIR Digital Filter

Parallel Processing

Pipelining and Parallel Processing for Low Power

Conclusions

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-6

Pipelining and Parallel Concept

Pipelining

Introduce pipelining latches

along the datapath

Parallel processing

Duplicate the hardware

TA

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-7

Pipelining FIR Filter

Critical path

2TA+TM-->TA+TM

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-8

Pipelining (1/2)

Drawbacks

Increase number of delay elements (registers/latches) in the

critical path

Increase latency

Clock period limitation: critical path may be between

An input and a latch

A latch and an output

Two Latches

An input and an output

Pipelining latches can only be placed across any

feed-forward cutset of the graph

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-9

Pipelining (2/2)

Cutset: A cutset is a set of edges of a graph such that

if these edges are removed from the graph, the graph

becomes disjoint.

Feed-forward cutset: A cutset is called a feed-forward

cutset if the data move in the forward direction on all

the edges of the cutset.

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-10

Example 3.2.1

Error!

2 u.t.

4 u.t.

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-11

Transposition Theorem

Reversing the direction of all edges in a given SFG

and interchanging the input and output ports preserve

the functionality of the system.

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-12

Data-Broadcast Structure

Critical path is reduced to (TM+TA).

Direct Form II

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-13

Fine-Gain Pipelining

Let TM=10 u.t., TA=2 u.t., and the desired clock

period=6 u.t.

Break the MULTIPLIER into 2 smaller units with

processing time of 6 and 4 units.

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-14

Outlines

Introduction

Pipelining of FIR Digital Filter

Parallel Processing

Pipelining and Parallel Processing for Low Power

Conclusions

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-15

Parallel Processing

Parallel processing and pipelining are dual

If a computation can be pipelined, it can also be

processed in parallel.

Convert a single-input single-output (SISO) system to

multiple-input multiple-output (MIMO) system via

parallelism

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-16

Parallel Processing of 3-Tap FIR Filter (1/2)

)3()13()23()23(

)13()3()13()13(

)23()13()3()3(

kcxkbxkaxky

kcxkbxkaxky

kcxkbxkaxky

)2()1()()( ncxnbxnaxny

)2(3

11AMclk

sampleiter

TTTL

TT

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-17

Parallel Processing of 3-Tap FIR Filter (2/2)

How about direct form II?

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-18

Complete Parallel Processing System

Critical path has remained unchanged.

But the iteration period is reduced.

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-19

S/P and P/S Converter

Edge Trigger!

Edge Trigger!

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-20

Why Parallel Processing ?

Parallel leads to duplicating many copies of hardware, and the cost increases! Why use?

Answer lies in the fact that the fundamental limit to pipelining is at I/O bottlenecks, referred to as Communication Bound, composed of I/O pad delay and the wire delay.

Parallel Transmission

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-21

Combined Fine-Grain Pipelining and Parallel Processing

)2(6

11AMclk

sampleiter

TTTLM

TT

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-22

Outlines

Introduction

Pipelining of FIR Digital Filter

Parallel Processing

Pipelining and Parallel Processing for Low Power

Conclusions

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-23

Underlying Low Power Concept

Propagation delay

Power consumption

Sequential filter

fVCP total2

0

seqTf

1,

20

0

)Vk(V

VCT

t

chargeseq

20

0

)Vk(V

VCT

t

chargepd

,20 fVCP totalseq

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-24

Pipelining for Low-Power (1/2)

M-level pipelined system

Critical path-->1/M, capacitance to be charged in a

single clock cycle-->1/M

If the clock frequency is maintained, the power supply

can be reduced to V0 (0<<1)

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-25

Pipelining for Low-Power (2/2)

Power consumption

Propagation delay

Let Tseq=Tpip

get )()( 20

20

tt VVVVM

seqtotalpip PβfVβCP 220

2

,2

0

0

)Vk(V

VCT

t

chargeseq

20

0

)VVk(

VM

C

Tt

charge

pip

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-26

Example 3.4.1 (1/2)

Consider an original 3-tap FIR

filter and its fine-grain pipeline

version shown in the following

figures. Assume TM=10 ut, TA=2

ut, Vt=0.6V, Vo=5V, and CM=5CA.

In fine-grain pipeline filter, the

multiplier is broken into 2 parts,

m1 and m2 with computation time

of 6 u.t. and 4 u.t. respectively,

with capacitance 3 times and 2

times that of an adder,

respectively. (a) What is the

supply voltage of the pipelined

filter if the clock period remains

unchanged? (b) What is the

power consumption of the

pipelined filter as a percentage of

the original filter?

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-27

Example 3.4.1 (2/2)

Solution:

%4.36

V0165.3

e)(infeasibl 0239.0or 6033.0

)6.05()6.05(2

3 :GrainFine

6 :Original

2

22

21

Ratio

V

CCCCC

CCCC

pip

AAmmcharge

AAMcharge

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-28

Comparison

System Sequential

FIR (Original)

Pipelined FIR

(Without reducing Vo)

Pipelined FIR

(With reducing Vo)

Power (Ref) PRef 2PRef 0.364PRef

Clock Period

(u.t.)12 ut 6 ut 12 ut

Sample Period

(u.t.)12 ut 6 ut 12 ut

Thinking Again!

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-29

Parallel Processing for Low-Power

L-parallel system

Since maintaining the same sample rate, clock period

is increased to LTseq

This means that Ccharge is charged in LTseq, and the

power supply can be reduced to V0

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-30

Parallel Processing for Low-Power

Power consumption

Propagation delay

LTseq=Tpar

get )()( 20

20 tt VVVVL

seqtotalpar PL

fVLCP 22

0 ))((

,2

0

0

)Vk(V

VCT

t

chargeseq

20

0

)VVk(

VCT

t

charge

par

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-31

Example 3.4.2 (1/2)

Consider a 4-tap FIR filter shown in Fig. 3.18(a) and its 2-

parallel version in 3.18(b). The two architectures are operated at

the sample period 9 u.t. Assume TM=8, TA=1, Vt=0.45V, Vo=3.3V,

CM=8CA (a) What is the supply voltage of the 2-parallel filter? (b)

What is the power consumption of the 2-parallel filter as a

percentage of the original filter?

Solution:

%41.43Ratio

2.1743VVpar

0282.0or 6589.0

)45.03.3(5)45.03.3(9

102 :Parallel2

:Original

2

22

AAMcharge

AMcharge

CCCC

CCC

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-32

Example 3.4.2 (2/2)

x(2k)

x(2k+1)

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-33

Example 3.4.3 (1/2)

A more efficient structure than the previous one is depicted in Fig. 3.18(c).

(a) What is the supply voltage of the efficient 2-parallel filter? (b) What is

the power consumption of the efficient 2-parallel filter as a percentage of

the original filter?

Solution:

%6.4335

2

155

Ratio

45857.2

e)(infeasibl 0.025or 0.745

)45.03.3(12)45.03.3(92

124 : Parallel-2 New

9 : Original

20

20

2

22

sA

sA

seq

par

pip

AAMcharge

AAMcharge

fVC

fVC

P

P

VV

CCCC

CCCC

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-34

Example 3.4.3 (2/2)

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-35

Combining Pipelining and Parallel Processing

Parallel-pipelined structure

M=L=2, V0=5V, Vt=0.6V-->=0.4, 2=0.16

20

20 )()( tt VVVVML seq

LTTpp

,2

0

0

)Vk(V

VCT

t

chargeseq

20

0

)VVk(

VM

C

Tt

charge

pp

VLSI Digital Signal Processing Systems

Lan-Da Van VLSI-DSP-3-36

Conclusions

Methodologies of pipelining 3-tap FIR filter

Methodologies of parallel processing for 3-tap FIR

filter

Methodologies of using pipelining and parallel

processing for low power demonstration.

Pipelining and parallel processing of recursive digital

filters using look-ahead techniques are addressed in

Chapter 10.