Ch6 DT Interconnect

Jan M. Rabaey Low Power Design Essentials ©2008 Chapter 6 Optimizing Power @ Design Time Interconnect and Clocks



Chapter Outline

Trends and bounds

An OSI approach to interconnect optimization

 – Physical layer

 – Data link and MAC

 – Network

 – Application

Clock distribution


ITRS Projections

Calendar Year                           2012         2018         2020
Interconnect One Half Pitch             35 nm        18 nm        14 nm
MOSFET Physical Gate Length             14 nm        7 nm         6 nm
Number of Interconnect Levels           12-16        14-18        14-18
On-Chip Local Clock                     20 GHz       53 GHz       73 GHz
Chip-to-Board Clock                     15 GHz       56 GHz       89 GHz
# of Hi Perf. ASIC Signal I/O Pads      2500         3100         3100
# of Hi Perf. ASIC Power/Ground Pads    2500         3100         3100
Supply Voltage                          0.7-0.9 V    0.5-0.7 V    0.5-0.7 V
Supply Current                          283-220 A    396-283 A    396-283 A

[Source: ITRS Roadmap, 2004, 2005]


Increasing Impact of Interconnect

Interconnect is now exceeding transistors in

 – Latency

 – Power dissipation

 – Manufacturing complexity

Direct consequence of scaling


Communication Dominant Part of Power Budget

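[Figure: pie charts of the power budget for three chip types. FPGA: Interconnect 65%, Clock 21%, I/O 9%, CLB 5%. µProcessor: Clocks, Caches, Execution Units, Control, I/O Drivers (slices of 40%, 20%, 15%, 15% and 10%). Signal processor: Clock, Logic, Memory, I/O.]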


Idealized Wire Scaling Model

Parameter    Relation    Local Wire    Constant Length    Global Wire
W, H, t                  1/S           1/S                1/S
L                        1/S           1                  1/S_C
C            LW/t        1/S           1                  1/S_C
R            L/WH        S             S^2                S^2/S_C
t_p ~ CR     L^2/Ht      1             S^2                S^2/S_C^2
E            CV^2        1/(S U^2)     1/U^2              1/(S_C U^2)
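The scaling relations in this table can be evaluated numerically. The sketch below is an illustration only and not part of the original slides: the function name `wire_scaling` is hypothetical, and it assumes the usual reading of the symbols (S for technology scaling, S_C for chip-dimension scaling, U for voltage scaling), which the slide itself does not spell out.

```python
def wire_scaling(S: float, S_C: float, U: float) -> dict:
    """Scaling factors per wire class, following the idealized model.

    S   : technology scaling factor (assumed meaning)
    S_C : chip-dimension scaling factor (assumed meaning)
    U   : voltage scaling factor (assumed meaning)
    """
    return {
        "local": {
            "L": 1 / S, "C": 1 / S, "R": S,
            "t_p": 1.0, "E": 1 / (S * U**2),
        },
        "constant_length": {
            "L": 1.0, "C": 1.0, "R": S**2,
            "t_p": S**2, "E": 1 / U**2,
        },
        "global": {
            "L": 1 / S_C, "C": 1 / S_C, "R": S**2 / S_C,
            "t_p": S**2 / S_C**2, "E": 1 / (S_C * U**2),
        },
    }


if __name__ == "__main__":
    # Hypothetical example values, chosen only to exercise the formulas.
    for name, f in wire_scaling(S=2.0, S_C=0.5, U=2.0).items():
        print(name, f)
```

In each column the delay entry equals the product of the C and R entries, which is a quick consistency check on the table: local wires keep their delay, while constant-length and global wires get slower as S grows.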


Distribution of Wire Lengths on Chip

[Ref: J. Davis, C&S’98] 

 © IEEE 1998


Technology Innovations

Reduce dielectric permittivity (e.g. Aerogels or air)

Reduce resistivity (e.g. Copper)

Reduce wire lengths through 3D-integration

Novel interconnect media (carbon nanotubes, optical)

(Pictures courtesy of IBM and IFC FCRP)

 © IEEE 1998


Logic Scaling

[Figure: log-log plot of power P [W] (10^-12 to 10^0) versus delay t_p [s] (10^-15 to 10^0), with constant-energy lines from 10^-6 J down to 10^-18 J. The power-delay product scales as P t_p ~ 1/S^3.]

[Ref: J. Davis, Proc’01] 


Lower Bounds on Interconnect Energy

Claude Shannon

Shannon’s theorem on maximum capacity of communication channel

$C \le B \log_2\left(1 + \frac{P_S}{kTB}\right)$

C: capacity in bits/sec; B: bandwidth; P_S: average signal power

$E_{bit} = P_S / C$

$E_{bit}(\min) = E_{bit}(C/B \to 0) = kT \ln(2)$

Valid for an “infinitely long” bit transition (C/B → 0)

Equals about 2.9 × 10^-21 J/bit at room temperature (300 K), with kT ≈ 4.1 × 10^-21 J

[Ref: J. Davis, Proc’01] 
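The limit can be checked numerically. Solving the capacity formula for the signal power gives P_S = kTB(2^(C/B) − 1), so E_bit = P_S/C = kT(2^x − 1)/x with x = C/B. The sketch below is a minimal illustration of that arithmetic; the function names are hypothetical and 300 K is assumed for "room temperature".

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def energy_per_bit(c_over_b: float, temperature_k: float = 300.0) -> float:
    """E_bit = P_S / C when P_S is just enough to reach capacity C at bandwidth B."""
    kT = K_BOLTZMANN * temperature_k
    # 2**x - 1 written as expm1(x * ln 2) to stay accurate for very small x.
    return kT * math.expm1(c_over_b * math.log(2.0)) / c_over_b


def min_energy_per_bit(temperature_k: float = 300.0) -> float:
    """Limit of E_bit for C/B -> 0, i.e. kT ln(2)."""
    return K_BOLTZMANN * temperature_k * math.log(2.0)


if __name__ == "__main__":
    for x in (4.0, 1.0, 0.1, 1e-3):
        print(f"C/B = {x:g}: E_bit = {energy_per_bit(x):.3e} J")
    print(f"bound kT ln 2 = {min_energy_per_bit():.3e} J")
```

The energy per bit falls monotonically as C/B shrinks, which is why the bound only holds for an arbitrarily slow transition.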


Reducing Interconnect Power/Energy

Same philosophy as with logic: reduce capacitance, voltage (or voltage swing) and/or activity

A major difference: sending a bit(s) from one point to another is fundamentally a communications/networking problem, and it helps to consider it as such.

Abstraction layers are different:

 – For computation: device, gate, logic, micro-architecture

 – For communication: wire, link, network, transport

Helps to organize along abstraction layers, well understood in the networking world: the OSI protocol stack


OSI Protocol Stack

Reference model for wired and wireless protocol design; also a useful guide for conception and optimization of on-chip communication

Layered approach allows for orthogonalization of concerns and decomposition of constraints

[Figure: OSI stack, top to bottom: Presentation/Application, Session, Transport, Network, Data Link, Physical]

No requirement to implement all layers of the stack

Layered structure need not be maintained in the final implementation

[Ref: M. Sgroi, DAC’01] 


The Physical Layer

Transmit bits over physical interconnect medium (wire)

Physical medium

 – Material choice, repeater insertion

Signal waveform

 – Discrete levels, pulses, modulated sinusoids

Voltages

 – Reduced swing

Timing, synchronization

[Figure: OSI stack; this slide covers the Physical layer]

So far, on-chip communication almost uniquely “level-based”


Repeater Insertion

Optimal repeater insertion results in wire delay linear with L

$t_p \propto L \sqrt{(R_d C_d)(r_w c_w)}$

with $R_d C_d$ and $r_w c_w$ the intrinsic delays of inverter and wire, respectively

But: At major energy cost!


Repeater Insertion ─ Example 

1 cm Cu wire in 90 nm technology (on intermediate layers)

 – r_w = 250 Ω/mm; c_w = 200 fF/mm

 – t_p = 0.69 r_w c_w L^2 = 3.45 ns

Optimal driver insertion:

 – t_p,opt = 0.5 ns

 – Requires insertion of 13 repeaters

 – Energy per transition 8 times larger than just charging the wire (6 pJ versus 0.75 pJ)!

It pays to back off!
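The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below only re-derives the unrepeated delay from the slide's formula and compares the quoted figures; the repeated delay (0.5 ns) and the two energies are taken from the slide as given rather than computed, and the variable names are illustrative.

```python
def unrepeated_wire_delay(r_w_ohm_per_mm: float, c_w_farad_per_mm: float,
                          length_mm: float) -> float:
    """Delay of an unrepeated wire, using the slide's t_p = 0.69 r_w c_w L^2."""
    return 0.69 * r_w_ohm_per_mm * c_w_farad_per_mm * length_mm**2


R_W = 250.0      # ohm/mm, from the slide
C_W = 200e-15    # F/mm, from the slide
LENGTH = 10.0    # mm (1 cm)

t_wire = unrepeated_wire_delay(R_W, C_W, LENGTH)   # about 3.45 ns
t_repeated = 0.5e-9                                # quoted on the slide
e_repeated, e_wire_only = 6e-12, 0.75e-12          # quoted on the slide

speedup = t_wire / t_repeated                      # about 6.9x faster
energy_penalty = e_repeated / e_wire_only          # about 8x more energy

if __name__ == "__main__":
    print(f"unrepeated delay: {t_wire * 1e9:.2f} ns")
    print(f"speedup from repeaters: {speedup:.1f}x")
    print(f"energy penalty: {energy_penalty:.1f}x")
```

A roughly 7x delay improvement bought with an 8x energy increase is the reason the slide recommends backing off from the delay-optimal design.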


Wire Energy-Delay Trade-off

[Figure: normalized energy (eNorm, 0.1 to 0.9) versus normalized delay (dNorm, 1 to 8) for an L = 1 cm Cu wire in 90 nm CMOS. Marked on the plot: the point (dMin, eMax), the "wire energy only" level, and the repeater overhead above it.]


Multi-dimensional Optimization

Design parameters: voltage, number of stages, buffer sizes

Voltage scaling has largest impact, followed by selection of number of repeaters

Transistor sizing secondary.

[Figure: number of stages and V_DD (V) versus normalized delay dNorm (1 to 8)]


Reduced Swing

$E_{bit} = C\,V_{DD}\,V_{swing}$

Concerns:

 – Overhead (area, delay)

 – Robustness (supply noise, crosstalk, process variations)

 – Repeaters?

[Figure: link with a Transmitter (TX) and a Receiver (RX)]


Traditional Level Converter

Requires two discrete voltage levels

Asynchronous level conversion adds extra delay

[Figure: driver and level-converting receiver circuit using VDDH and VDDL supplies, with input in, load CL and outputs OUT]

[Ref: H. Zhang, TVLSI’00] 


Avoiding Extra References

[Ref: H. Zhang, VLSI’00] 

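[Figure: reduced-swing driver and receiver circuit operating from a single VDD (transistors N1-N3 and P1-P3, signals in, in2 and out, load CL), shown with its voltage-transfer characteristic (VTC) and transient response]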


Differential (Clocked) Signaling

Allows for very low swings (200 mV)

Robust

Quadratic energy savings

But: doubling the wiring, extra clock signal, complexity

[Ref: T. Burd, UCB’01]

[Figure: clocked differential driver and receiver circuit (signals in, clk, REF, d, d_b, out, out_b; supply VDD; two wire loads CL)]


Lower Bound on Signal Swing?

Reduction of signal swing translates into higher power dissipation in receiver – trade-off between wire and receiver energy dissipation

Reduced SNR impacts reliability – current on-chip interconnect strategies require Bit Error Rate (BER) of zero (in contrast to communication and network links)

 – Noise source: power supply noise, crosstalk

Swings as low as 200 mV have been reported [Ref: Burd’00], 100 mV definitely possible

Further reduction requires crosstalk suppression

[Figure: crosstalk suppression by shielding (GND wires next to signal wires) and by folding]


Quasi-Adiabatic Charging

[Figure: load voltage V stepping up to V_DD in increments of V_DD/N, with tank capacitors CT1, CT2, …, CTN-1]

[Ref: L. Svensson, ISLPED’96]

• Uses stepwise approximation of adiabatic (dis)charging

• Capacitors acting as “charge reservoir”

• Energy drawn from supply reduced by factor N
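The factor-N saving follows from a simple energy count: each time a capacitor C is connected through a resistive switch to a source that sits ΔV above its current voltage, ½·C·ΔV² is dissipated regardless of the resistance. The sketch below is an idealized model rather than the circuit from the reference: it treats the tank capacitors as ideal voltage sources at multiples of V_DD/N and sums the loss over the N steps. The function name and example values are hypothetical.

```python
def stepwise_charge_dissipation(c_load: float, v_dd: float, n_steps: int) -> float:
    """Energy dissipated while charging c_load from 0 to v_dd in n equal steps.

    Assumes each intermediate level is an ideal source (an idealization of
    the tank capacitors) and that every step settles completely.
    """
    step = v_dd / n_steps
    voltage = 0.0
    dissipated = 0.0
    for i in range(1, n_steps + 1):
        target = i * step
        # Loss when a capacitor is charged through a switch by (target - voltage).
        dissipated += 0.5 * c_load * (target - voltage) ** 2
        voltage = target
    return dissipated


if __name__ == "__main__":
    C, VDD = 1e-12, 1.0  # hypothetical 1 pF load, 1 V supply
    conventional = stepwise_charge_dissipation(C, VDD, 1)
    for n in (1, 2, 4, 8):
        e = stepwise_charge_dissipation(C, VDD, n)
        print(f"N = {n}: {e:.3e} J (reduction {conventional / e:.1f}x)")
```

The total comes out as C·V_DD²/(2N) for the charging half of the cycle, so the loss shrinks in proportion to the number of steps.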


Charge Redistribution Schemes

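[Figure: charge-recycling bus with two stacked differential pairs (B0 = 0, B1 = 1) between V_DD and GND, precharge (P) and evaluate (E) switches, and receivers RX0 and RX1. Waveforms over the Precharge, Eval, Precharge phases show levels at V_DD/4, V_DD/2 and 3V_DD/4.]

Charge recycled from top to bottom

Precharge phase equalizes differential lines

Energy/bit = 2C (V_DD/N)^2

Challenges: Receiver design, noise margins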

[Ref: H. Yamauchi, JSSC’95] 


Alternative Communication Schemes

Example: Capacitively-driven wires

Offers some compelling advantages

 – Reduced swing: swing is V_DD/(n+1) without extra supply

 – Reduced load: allows for smaller driver

 – Reduced delay: capacitor pre-emphasizes edges

Pitchfork capacitors exploit sidewall capacitance

[Ref: D. Hopkins, ISSCC’07]


Signaling Protocols

[Figure: processor module (µProc, ALU, MPY, SRAM…) connected to the network through the signals din, reqin, ackin, dout, reqout, ackout; timing diagram of Din, REQin and done]

Globally asynchronous self-timed handshaking protocol

Allows individual modules to dynamically trade off performance for energy-efficiency


Signaling Protocols

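[Figure: processor module (µProc, ALU, MPY, SRAM…) connected to the network through a physical layer interface module. Module side: din, dout, clk, done. Network side: din, reqin, ackin, dout, reqout, ackout. Timing diagram of Din, REQin, Clk and done.]

Globally asynchronous, locally synchronous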


The Data Link /Media Access Layer

Reliable transmission over physical link and sharing interconnect medium between multiple sources and destinations (MAC)

Bundling, serialization, packetizing

Error detection and correction

Coding

Multiple-access schemes

[Figure: OSI stack; this slide covers the Data Link layer]


Coding

[Figure: TX → Encoder → Link → Decoder → RX; N bits enter the encoder, N + k bits travel over the link, N bits leave the decoder]

Adding redundancy to communication link (extra bits) to:

 – Reduce transitions (activity encoding)

 – Reduce energy/bit (error-correcting coding)


Activity Reduction Through Coding

[Ref: M. Stan, TVLSI’95] 

[Figure: Encoder → bus → Decoder; N data bits in, N + 1 wires on the bus (encoded data Denc plus invert bit p)]

Example: Bus-Invert Coding

Data word D inverted if Hamming distance from previous is larger than N/2.

D          #T    Denc       p    #T
00101010   -     00101010   0    -
00111011   2     00111011   0    2
11010100   7     00101011   1    1+1
00001101   5     00001101   0    3+1
01110110   6     10001001   1    2+1
…
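The encoding rule and the example sequence can be reproduced in a few lines. The sketch below follows the rule as stated on the slide (compare each data word with the previous value on the data wires and invert when more than N/2 bits differ); the function names are hypothetical, and the first word is sent uninverted because no previous bus value is given.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, n_bits):
    """Return a list of (encoded_word, invert_bit) pairs."""
    mask = (1 << n_bits) - 1
    encoded = []
    prev = None  # previous value on the data wires
    for d in words:
        if prev is not None and hamming(d, prev) > n_bits / 2:
            enc, p = ~d & mask, 1   # send the complement and raise p
        else:
            enc, p = d, 0
        encoded.append((enc, p))
        prev = enc
    return encoded


def bus_invert_decode(encoded, n_bits):
    """Undo the inversion using the invert bit."""
    mask = (1 << n_bits) - 1
    return [enc ^ mask if p else enc for enc, p in encoded]


def count_transitions(encoded):
    """Total wire transitions, including those on the invert line."""
    return sum(hamming(e0, e1) + (p0 ^ p1)
               for (e0, p0), (e1, p1) in zip(encoded, encoded[1:]))


if __name__ == "__main__":
    data = [0b00101010, 0b00111011, 0b11010100, 0b00001101, 0b01110110]
    enc = bus_invert_encode(data, 8)
    for d, (e, p) in zip(data, enc):
        print(f"{d:08b} -> {e:08b} p={p}")
    print("uncoded transitions:", count_transitions([(d, 0) for d in data]))
    print("coded transitions:  ", count_transitions(enc))
```

For the five words in the example this gives 20 transitions without coding and 11 with coding, counting the invert wire.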


Bus-Invert Coding

Gain: 25 % (at best – for random data)

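Overhead: Extra wire (and activity); encoder, decoder

Not effective for correlated data

[Figure: data D passes through an Encode block (with register Reg), travels over the Bus as Denc plus invert bit p, and is recovered as D by a Decode block]

[Ref: M. Stan, TVLSI’95]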


Other Transition Coding Schemes

Advanced bus-invert coding (e.g. partition bus into sub-components) (e.g. [M. Stan, TVLSI’97])

Coding for address busses (which often display sequentiality) (e.g. [L. Benini, DATE’98])

Full-fledged channel coding, borrowed from communication links (e.g. [S. Ramprasad, TVLSI’99])

Coding to reduce impact of Miller capacitance between neighboring wires [Ref: Sotiriadis, ASPDAC’01]

bit k-1    bit k    bit k+1    Delay factor g
↑          ↑        ↑          1
↑          ↑        −          1 + r
↑          ↑        ↓          1 + 2r
−          ↑        −          1 + 2r
−          ↑        ↓          1 + 3r
↓          ↑        ↓          1 + 4r

(↑ rising, ↓ falling, − no transition)

Maximum capacitance transition (1 + 4r) – can be avoided by coding
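The delay-factor table follows one pattern: each neighbor adds 0, r or 2r depending on whether it switches with the victim wire, stays put, or switches against it. The sketch below encodes that pattern. The function name is hypothetical, transitions are written as +1 (rising), 0 (no transition) and −1 (falling), and r is treated as a given coupling parameter, since this slide does not define it.

```python
def delay_factor(left: int, victim: int, right: int, r: float) -> float:
    """Delay factor g for a switching wire (victim) given its two neighbors.

    Each argument is +1 (rising), 0 (no transition) or -1 (falling);
    the victim must be switching.
    """
    if victim not in (1, -1):
        raise ValueError("victim wire must be rising (+1) or falling (-1)")
    g = 1.0
    for neighbor in (left, right):
        # Same direction -> 0, quiet neighbor -> r, opposite direction -> 2r.
        g += r * (1 - neighbor * victim)
    return g


if __name__ == "__main__":
    r = 0.5  # hypothetical value, for illustration only
    for pattern in [(1, 1, 1), (1, 1, 0), (1, 1, -1),
                    (0, 1, 0), (0, 1, -1), (-1, 1, -1)]:
        print(pattern, delay_factor(*pattern, r))
```

A code that forbids the pattern in which both neighbors switch against the victim removes the 1 + 4r case and so caps the worst-case delay.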


Error-Correcting Codes

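[Figure: Encoder → link → Decoder; N-bit data D is encoded into N + k bits (Denc) and decoded back to D]

Adding redundancy allows for more aggressive scaling of signal swings and/or timing

Simpler codes such as Hamming prove most effective

Example: (4,3,1) Hamming Code

Code word: P1 P2 B3 P4 B5 B6 B7, with

$P_1 \oplus B_3 \oplus B_5 \oplus B_7 = 0$

$P_2 \oplus B_3 \oplus B_6 \oplus B_7 = 0$

$P_4 \oplus B_5 \oplus B_6 \oplus B_7 = 0$

e.g. if the three checks evaluate to 1, 1, 0, the syndrome is 3: B3 is wrong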
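The three parity checks translate directly into an encoder and a single-error corrector. The sketch below is a minimal illustration using the slide's bit ordering P1 P2 B3 P4 B5 B6 B7, assuming the third check covers P4, B5, B6 and B7 as in the standard Hamming construction. The function names are hypothetical, and the failing checks are read as a binary number with P1 as the least significant bit.

```python
def hamming_encode(b3: int, b5: int, b6: int, b7: int) -> list:
    """Build the code word [P1, P2, B3, P4, B5, B6, B7] from four data bits."""
    p1 = b3 ^ b5 ^ b7
    p2 = b3 ^ b6 ^ b7
    p4 = b5 ^ b6 ^ b7
    return [p1, p2, b3, p4, b5, b6, b7]


def hamming_correct(word: list):
    """Return (corrected_word, syndrome); syndrome 0 means no error detected."""
    w = list(word)
    p1, p2, b3, p4, b5, b6, b7 = w
    s1 = p1 ^ b3 ^ b5 ^ b7
    s2 = p2 ^ b3 ^ b6 ^ b7
    s4 = p4 ^ b5 ^ b6 ^ b7
    syndrome = s1 + 2 * s2 + 4 * s4   # position (1..7) of the flipped bit
    if syndrome:
        w[syndrome - 1] ^= 1
    return w, syndrome


if __name__ == "__main__":
    code = hamming_encode(1, 0, 1, 1)
    received = list(code)
    received[2] ^= 1                  # corrupt B3 (position 3)
    fixed, syndrome = hamming_correct(received)
    print("sent     ", code)
    print("received ", received)
    print("syndrome ", syndrome, "-> corrected", fixed)
```

Flipping B3 makes the first two checks fail and leaves the third intact, which reproduces the syndrome of 3 in the slide's example.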


Media Access

Sharing of physical media over multiple data streams increases capacitance and activity (see Chapter 5), but reduces area

Many multi-access schemes known from communications

 – Time domain: Time-Division Multiple Access (TDMA)

 – Frequency domain: narrow band, code division multiplexing

Buses based on arbitration-based TDMA most common in today’s ICs


Bus Protocols and Energy

Some lessons from the communications world:

 – When utilization is low, simple schemes are more effective

 – When traffic is intense, reservation of resources minimizes overhead and latency (collisions, resends)

Combining the two leads to energy efficiency

Example: Silicon Backplane MicroNetwork [Courtesy: Sonics, Inc]

Independent arbitration for every cycle includes two phases:

 – Distributed TDMA for guaranteed latency/bandwidth

 – Round robin for random access

[Figure: bus timeline showing the current slot with its Arbitration and Command phases]
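The two-phase idea can be sketched in a few lines: the owner of the current TDMA slot wins if it is requesting, otherwise the cycle is handed out round robin among the remaining requesters. The code below is only a toy model of that policy under these assumptions; it is not the Sonics implementation, and the function name, the slot-owner argument and the round-robin pointer are all hypothetical.

```python
def arbitrate(slot_owner, requests, rr_pointer):
    """Pick the bus master for one cycle.

    slot_owner : index of the initiator that owns this TDMA slot, or None
    requests   : list of booleans, one per initiator
    rr_pointer : index where the round-robin search starts
    Returns (winner or None, updated rr_pointer).
    """
    # Phase 1: reserved slot gives its owner guaranteed access.
    if slot_owner is not None and requests[slot_owner]:
        return slot_owner, rr_pointer
    # Phase 2: an unused slot is granted round robin.
    n = len(requests)
    for offset in range(n):
        candidate = (rr_pointer + offset) % n
        if requests[candidate]:
            return candidate, (candidate + 1) % n
    return None, rr_pointer


if __name__ == "__main__":
    pointer = 0
    schedule = [0, 1, None, 2]            # hypothetical slot table
    traffic = [
        [True, True, False],
        [True, False, True],
        [True, False, True],
        [False, True, False],
    ]
    for owner, req in zip(schedule, traffic):
        winner, pointer = arbitrate(owner, req, pointer)
        print(f"slot owner {owner}, requests {req} -> grant {winner}")
```

Reserved slots give the guaranteed latency and bandwidth, while the round-robin fallback keeps idle slots from being wasted when traffic is light.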


The Network Layer

Topology-independent end-to-end communication over multiple data links (routing, bridging, repeaters)

Topology

Static versus dynamic configuration / routing

[Figure: OSI stack; this slide covers the Network layer]

Becoming more important in today’s complex multi-processor designs: “The Network-on-a-Chip (NoC)”

[Ref: G. De Micheli, Morgan-Kaufman’06]


Network-on-a-Chip (NoC)

Dedicated networks with reserved links preferable for high traffic channels – but: limited connectivity, area overhead

Flexibility an increasing requirement in multi (many)-core chip implementations

[Figure: two alternative network structures shown side by side]


The Network Trade-offs

Interconnect-oriented architecture trades off flexibility, latency, energy and area-efficiency through the following concepts

 – Locality - eliminate global structures

 – Hierarchy - expose locality in communication requirements

 – Concurrency/Multiplexing

Very similar to architectural space trade-offs

[Figure: dedicated wiring versus Network-on-a-Chip; in the latter each tile contains a processor (Proc), local logic, a router and network wires]

[Courtesy: B. Dally, Stanford]


Networking Topology

Homogeneous

 – Crossbar, Butterfly, Torus, Mesh, Tree, …

Heterogeneous

 – Hierarchy

[Figure: example topologies: Crossbar, Mesh (FPGA), Tree]


Network Topology Exploration

[Figure: two plots of Energy x Delay versus Manhattan Distance. First plot: Mesh and Binary Tree. Second plot: Mesh, Binary Tree, and Mesh + Inverse.]

Short connections in tree are redundant

Inverse clustering complements mesh

[Ref: V. George, Springer’01] 


Circuit-Switched versus Packet Based

On-chip reality: wires (bandwidth) are relatively cheap, buffering and routing expensive

Packet-switched approach versatile

 – Preferred approach in large networks

 – But … routers come with large overhead

 – Case study Intel: 18% of power in link, 82% in router

Circuit-switched approach attractive for high-data rate quasi-static links

Hierarchical combination often preferred choice

[Figure: clusters of four cores (C) sharing a local bus, with routers (R) linking the clusters. Bus to connect over short distances; hierarchical circuit and packet switched networks for longer connections.]


Example: The Pleiades Network-on-a-Chip

• Configurable platform for low-energy communication and signal-processing applications (see Chapter 5)

• Allows for dynamic task-level reconfiguration of process network

Energy-efficient flexible network essential to the concept

[Figure: Pleiades architecture: a µP, dedicated arithmetic modules and configurable logic blocks, attached through network interfaces to the configurable interconnect, plus a configuration bus]

[Ref: H. Zhang, JSSC’00] 


Pleiades Network Layer

Hierarchical reconfigurable mesh network

[Figure: clusters connected by a Level-1 Mesh and a Level-2 Mesh through universal switchboxes and hierarchical switchboxes]

• Network statically configured at start of session and ripped up at end

• Structured approach reduces interconnect energy by a factor of 7 over a straightforward crossbar


Top Layers of the OSI Stack

Abstracts communication architecture to system and performs data formatting and conversion

Establishes and maintains end-to-end communications

 – flow control, message reordering, packet segmentation and reassembly

[Figure: OSI stack; this slide covers the layers above Network (Transport, Session, Presentation/Application)]

Example: Establish, maintain and rip-up connections in dynamically reconfigurable Systems-on-a-Chip – Important in power-management


What About Clock Distribution?

Clock easily the most energy-consuming signal of a chip

 – Largest length

 – Largest fanout

 – Most activity (α = 1)

Skew control adding major overhead

 – Intermediate clock repeaters

 – De-skewing elements

Opportunities

 – Reduced swing

 – Alternative clock distribution schemes

 – Avoiding a global clock altogether


Reduced-Swing Clock Distribution

Similar to reduced-swing interconnect

Relatively easy to implement

But: Extra delay in flip-flops adds directly to clock period

Example: half-swing clock distribution scheme

[Figure: clock waveforms between GND and VDD for a regular 2-phase clock and for a half-swing clock, each showing an NMOS clock and a PMOS clock]

[Ref: H. Kojima, JSSC’95] 

 © IEEE 1995


Alternative Clock Distribution Schemes

Canceling skew in perfect transmission line scenario

Example: Transmission-Line Based Clock Distribution

[Ref: V. Prodanov, CICC’06] 

 © IEEE 2006


Summary

Interconnect important component of overall power dissipation

Structured approach with exploration at different abstraction layers most effective

Lot to be learned from communications and networking community – yet, techniques must be applied judiciously

 – Cost relationship between active and passive components different

Some exciting possibilities for the future: 3D-integration, novel interconnect materials, optical or wireless I/O

References


Books and Book Chapters

T. Burd, “Energy-Efficient Processor System Design,” http://bwrc.eecs.berkeley.edu/Publications/2001/THESES/energ_eff_process-sys_des/index.htm, UCB, 2001.

G. De Micheli and L. Benini, “Networks on Chips: Technology and Tools,” Morgan-Kaufman, 2006.

V. George and J. Rabaey, “Low-energy FPGAs: Architecture and Design,” Springer, 2001.

J. Rabaey, A. Chandrakasan, B. Nikolic, “Digital Integrated Circuits: A Design Perspective,” 2nd ed., Prentice Hall, 2003.

C. Svensson, “Low-Power and Low-Voltage Communication for SoC’s,” in C. Piguet, Low-Power Electronics Design, Ch. 14, CRC Press, 2005.

L. Svensson, “Adiabatic and Clock-Powered Circuits,” in C. Piguet, Low-Power Electronics Design, Ch. 15, CRC Press, 2005.

G. Yeap, “Special Techniques,” in Practical Low Power Digital VLSI Design, Ch. 6, Kluwer Academic Publishers, 1998.

Articles

L. Benini et al., “Address bus encoding techniques for system-level power optimization,” Proceedings DATE’98, pp. 861-867, Paris, February 1998.

T. Burd et al., “A Dynamic Voltage Scaled Microprocessor System,” IEEE ISSCC Digest of Technical Papers, pp. 294-295, Feb. 2000.

M. Chang et al., “CMP Network-on-Chip Overlaid with Multi-Band RF Interconnect,” International Symposium on High-Performance Computer Architecture, Feb. 2008.

D.M. Chapiro, “Globally Asynchronous Locally Synchronous Systems,” PhD thesis, Stanford University, 1984.


W. Dally, “Route Packets, Not Wires: On-Chip Interconnect Networks,” Proceedings DAC 2001, pp. 684-689, Las Vegas, June 2001.

J. Davis and J. Meindl, “Is Interconnect the Weak Link?,” IEEE Circuits and Systems Magazine, pp. 30-36, March 1998.

J. Davis et al., “Interconnect Limits on Gigascale Integration (GSI) in the 21st Century,” Proceedings of the IEEE, Vol. 89, No. 3, pp. 305-324, March 2001.

D. Hopkins et al., “Circuit techniques to enable 430Gb/s/mm2 proximity communication,” IEEE International Solid-State Circuits Conference, vol. XL, pp. 368-369, February 2007.

H. Kojima et al., “Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry,” IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 432-435, April 1995.

E. Kusse and J. Rabaey, “Low-energy embedded FPGA structures,” Proceedings ISLPED’98, pp. 155-160, Monterey, Aug. 1998.

V. Prodanov and M. Banu, “GHz Serial Passive Clock Distribution in VLSI using Bidirectional Signaling,” Proceedings CICC 06.

S. Ramprasad et al., “A coding framework for low-power address and data busses,” IEEE Transactions on VLSI Signal Processing, Vol. 7, No. 2, pp. 212-221, June 1999.

M. Sgroi et al., “Addressing the System-on-a-Chip Woes Through Communication-Based Design,” Proceedings DAC 2001, pp. 678-683, Las Vegas, June 2001.

P. Sotiriadis and A. Chandrakasan, “Reducing Bus Delay in Submicron Technology Using Coding,” Proceedings ASPDAC Conference, Yokohama, January 2001.


M. Stan and W. Burleson, “Bus-Invert Coding for Low-Power I/O,” IEEE Transactions on VLSI, pp. 48-58, March 1995.

M. Stan and W. Burleson, “Low-Power Encodings for Global Communication in CMOS VLSI,” IEEE Transactions on VLSI Systems, pp. 444-455, Dec. 1997.

V. Sathe, J.-Y. Chueh, and M. C. Papaefthymiou, “Energy-Efficient GHz-Class Charge-Recovery Logic,” IEEE JSSC, vol. 42, no. 1, pp. 38-47, January 2007.

L. Svensson et al., “A sub-CV2 pad Driver with 10 ns Transition Time,” Proc. ISLPED 96, Monterey, Aug. 12-14, 1996.

D. Wingard, “Micronetwork-Based Integration for SOCs,” Proceedings DAC 01, pp. 673-677, Las Vegas, June 2001.

H. Yamauchi et al., “An Asymptotically Zero Power Charge Recycling Bus,” IEEE Journal of Solid-State Circuits, vol. 30, no. 4, pp. 423-431, April 1995.

H. Zhang, V. George and J. Rabaey, “Low-Swing on-chip Signaling Techniques: Effectiveness and Robustness,” IEEE Transactions on VLSI Systems, Vol. 8, No. 3, pp. 264-272, June 2000.

H. Zhang et al., “A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1697-1704, Nov. 2000.