ε -Optimal Minimum-Delay/Area Zero-Skew Clock Tree Wire-Sizing in Pseudo-Polynomial Time


Jeng-Liang Tsai
Tsung-Hao Chen
Charlie Chung-Ping Chen (National Taiwan University)

University of Wisconsin-Madison
http://vlsi.ece.wisc.edu

Outline

Background
• Motivation and contribution
• Literature overview

ClockTune algorithm
• Problem formulation
• ClockTune algorithm overview
• Optimality and complexity analysis

Experimental results
• Runtime, memory usage, and optimality
• Power/Delay trade-off
• Incremental refinement

Motivation

Clock skew is a cycle-time penalty
• Start with a zero-skew clock tree
• Minimizing clock delay reduces system-level skew (Kuh, et al. [DAC '90])

The clock tree is power-hungry (30% in the Intel McKinley, 0.18 µm/1 GHz/130 W)
• P = f C V^2
• Minimize switching capacitance (wiring area)

Stability affects design convergence
• Allow incremental refinement to accommodate local changes

Interconnect delay dominates total delay
• Wire-sizing is effective in reducing interconnect delay

Motivation

Non-convex zero-skew constraints
• No known algorithm solves the zero-skew wire-sizing problem optimally with polynomial runtime

Hence, a good clock tree wire-sizing algorithm should
• Minimize delay and power
• Guarantee optimality and runtime
• Have good stability

Contribution

• First ε-optimal algorithm for the clock min-delay/power zero-skew wire-sizing optimization problem
• Provides a complete (sampled) solution set of delay/power/area trade-off information for design planning
• Efficient pseudo-polynomial runtime (a 6170-branch clock tree in 6 minutes, within 1% of optimal)
• Runtime vs. optimality trade-off
• Incremental clock re-balancing to speed up design convergence

Literature Overview

"Reliable non-zero skew clock tree using wire width optimization", Pillage, et al. [DAC '93]
• Iteratively optimizes skew and delay using adjoint sensitivity analysis
• Aimed at reliable clock trees under process variation

Deferred Merging Embedding (DME) algorithm, Kahng, et al. [TCAD '92]
• Bottom-up merging segment construction, top-down embedding

Integrated Deferred Merging Embedding (IDME) algorithm, Wong, et al. [ISPD '00]
• Handles simultaneous routing, buffer insertion, and wire-sizing
• Merging segment set (MSS): a set of line samples of a merging region
• No optimality guarantee
• The size of the MSS grows exponentially

"Process variation aware clock tree routing", Lu, et al. [ISPD '03]
• Based on DME/BST


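Problem formulation

min-ZSWS (Zero Skew Wire Sizing) problem
• Given a clock routing,

$$
\begin{aligned}
\text{minimize}\quad & \alpha\,\mathrm{area}(w) + \beta\,\mathrm{delay}(w) \\
\text{subject to}\quad & \mathrm{delay}(P_i, w) = \mathrm{delay}(P_j, w) \quad \text{(zero-skew constraints)} \\
& \mathrm{delay}(w) \le \mathrm{Delay} \\
& \mathrm{area}(w) \le \mathrm{Area}
\end{aligned}
$$

where P_i, P_j are paths from v to leaf nodes i and j

Zero-skew constraints are non-convex constraints
• No known algorithm solves the problem optimally in polynomial runtime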
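A short worked derivation may help show where the non-convexity comes from. It assumes the Elmore delay model and a wire of length l and width w with resistance r0·l/w and capacitance c0·l·w. The slides name r0, c0, l and the width bounds but do not spell out the delay model, so this is an illustrative sketch, not the authors' exact formulation. C_1 and C_2 (the loads below two sibling wires) are symbols introduced here.

```latex
% Assumed Elmore delay of one wire (length l, width w) driving a load C_L:
d(w) = \frac{r_0 l}{w}\left(\frac{c_0 l w}{2} + C_L\right)
     = \frac{r_0 c_0 l^2}{2} + \frac{r_0 l C_L}{w}

% Zero skew between two sibling branches with widths w_1 and w_2:
\frac{r_0 c_0 l_1^2}{2} + \frac{r_0 l_1 C_1}{w_1}
  = \frac{r_0 c_0 l_2^2}{2} + \frac{r_0 l_2 C_2}{w_2}
```

Unless l_1 = l_2, this equality is not affine in (w_1, w_2), so the widths that satisfy it lie on a curve rather than in a convex set: averaging two zero-skew solutions generally gives a tree with non-zero skew. Higher in the tree, C_1 and C_2 themselves depend on the downstream widths, which couples the constraints further.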

DC region approach

Clock Delay and wiring Capacitance are the top concerns

Define f : R^N → R^2 such that
• f_Y(w) = Delay(T_v(w)), f_X(w) = Capacitance(T_v(w))
• DC region of v: the projection of the feasible region
• Choose a d-c pair from the DC region in R^2

[Figure: f : R^6 → R^2 maps the feasible region onto the DC region of T_v]
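The projection can be illustrated on the smallest possible case: a node with two sink branches. The sketch below is not the authors' implementation. It assumes the Elmore delay model, reuses r0, c0, wm and wM from the experimental setup (units assumed consistent), and uses made-up branch lengths and sink capacitances. The function names are hypothetical. For each sampled width of branch 1 it solves for the width of branch 2 that gives zero skew, then records the resulting (capacitance, delay) pair.

```python
R0, C0 = 0.03, 2e-16         # r0, c0 from the experimental setup (units assumed)
W_MIN, W_MAX = 1.0, 4.0      # wm, wM from the experimental setup


def branch_delay(length, width, sink_cap):
    """Assumed Elmore delay of one wire driving a sink capacitance."""
    return R0 * length / width * (C0 * length * width / 2 + sink_cap)


def matching_width(target_delay, length, sink_cap):
    """Width that makes branch_delay equal target_delay, or None if impossible."""
    intrinsic = R0 * C0 * length ** 2 / 2   # width-independent part of the delay
    if target_delay <= intrinsic:
        return None
    return R0 * length * sink_cap / (target_delay - intrinsic)


def sample_dc_region(l1, c1, l2, c2, samples=16):
    """Sample zero-skew width pairs and project them to (capacitance, delay)."""
    points = []
    for k in range(samples):
        w1 = W_MIN + (W_MAX - W_MIN) * k / (samples - 1)
        delay = branch_delay(l1, w1, c1)
        w2 = matching_width(delay, l2, c2)
        if w2 is None or not (W_MIN <= w2 <= W_MAX):
            continue                         # no zero-skew partner within bounds
        cap = C0 * (l1 * w1 + l2 * w2) + c1 + c2
        points.append((cap, delay, w1, w2))
    return points


# Hypothetical two-sink subtree: lengths 1000 and 900, sink loads 5e-14 and 8e-14.
points = sample_dc_region(1000, 5e-14, 900, 8e-14)
for cap, delay, w1, w2 in points[:3]:
    print(f"C={cap:.3e}  D={delay:.3e}  w1={w1:.2f}  w2={w2:.2f}")
```

With only two free widths and one equality constraint, the projected set is a curve; with more wires below the node, the same projection fills out a region in the delay/capacitance plane.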

ClockTune algorithm overview

Phase 1: bottom-up construction of DC regions for every node
Phase 2: top-down embedding after the delay/power trade-off

[Figure: panels (a) and (b)]
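The two phases can be sketched in a much simplified form. The code below is not ClockTune itself: instead of a full DC region it keeps, for each sampled delay at a node, only the minimum-capacitance zero-skew solution. It also assumes the Elmore delay model, and the tree, lengths and sink loads are invented for illustration. All class and function names are hypothetical.

```python
R0, C0 = 0.03, 2e-16         # r0, c0 from the experimental setup (units assumed)
W_MIN, W_MAX = 1.0, 4.0      # wm, wM from the experimental setup


class Node:
    def __init__(self, length=0.0, sink_cap=0.0, children=()):
        self.length = length            # wire from the parent to this node
        self.sink_cap = sink_cap        # load capacitance (leaves only)
        self.children = list(children)
        self.width = None               # filled in by the top-down phase
        self.table = []                 # (delay to sinks, capacitance, child picks)


def wire_delay(length, width, load):
    """Assumed Elmore delay of one wire driving `load`."""
    return R0 * length / width * (C0 * length * width / 2 + load)


def solve_width(length, load, target):
    """Width giving exactly `target` wire delay, or None if out of bounds."""
    slack = target - R0 * C0 * length ** 2 / 2
    if slack <= 0:
        return None
    width = R0 * length * load / slack
    return width if W_MIN <= width <= W_MAX else None


def bottom_up(node, p):
    """Phase 1: build a table of p sampled delays for every node."""
    if not node.children:
        node.table = [(0.0, node.sink_cap, ())]
        return
    for child in node.children:
        bottom_up(child, p)
    node.table = []
    if any(not child.table for child in node.children):
        return
    lo = max(min(d + wire_delay(c.length, W_MAX, cap) for d, cap, _ in c.table)
             for c in node.children)
    hi = min(max(d + wire_delay(c.length, W_MIN, cap) for d, cap, _ in c.table)
             for c in node.children)
    for k in range(p):
        target = lo + (hi - lo) * (k + 0.5) / p      # interior delay sample
        total, picks = node.sink_cap, []
        for c in node.children:
            best = None
            for idx, (d, cap, _) in enumerate(c.table):
                width = solve_width(c.length, cap, target - d)
                if width is not None:
                    cand = cap + C0 * c.length * width
                    if best is None or cand < best[0]:
                        best = (cand, idx, width)
            if best is None:
                break                                 # delay not reachable
            total += best[0]
            picks.append((best[1], best[2]))
        else:
            node.table.append((target, total, tuple(picks)))


def top_down(node, idx):
    """Phase 2: embed the wire widths behind table entry `idx`."""
    for child, (child_idx, width) in zip(node.children, node.table[idx][2]):
        child.width = width
        top_down(child, child_idx)


def subtree_cap(node):
    return node.sink_cap + sum(C0 * c.length * c.width + subtree_cap(c)
                               for c in node.children)


def sink_delays(node, arrival=0.0):
    """Independent check: Elmore delay from the root to every sink."""
    if not node.children:
        return [arrival]
    return [d for c in node.children
            for d in sink_delays(c, arrival + wire_delay(c.length, c.width, subtree_cap(c)))]


# Hypothetical four-sink tree.
u = Node(1500, children=[Node(1000, 5e-14), Node(900, 8e-14)])
v = Node(1400, children=[Node(800, 6e-14), Node(900, 4e-14)])
root = Node(children=[u, v])
bottom_up(root, p=8)
best = min(range(len(root.table)), key=lambda i: root.table[i][0])  # min-delay pick
top_down(root, best)
delays = sink_delays(root)
print(root.table[best][:2], max(delays) - min(delays))
```

Each node does p target delays times the child table size of work, which is in the spirit of the O(n p max(p, q)) bound, and solving each width exactly keeps the embedded tree zero-skew up to rounding. Solutions whose delays fall between samples are never considered, which is the source of the sampling error discussed in the optimality analysis.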

Optimality analysis

Embeddings that do not fall on the delay samples are omitted
• Propagated error
• Delay sampling error
• Wire width sampling error (detailed in the paper)

[Figure: DC region, DC region using children information, and sampled DC region]

Optimality analysis

Error is bounded
• d : delay sampling resolution
• w : wire width sampling resolution
• k, : constants related to l, r0, c0, wm, wM …

Generally speaking, the error is reduced by about half when the resolution is doubled

[Figure: Error vs. Resolution]

Optimality/runtime trade-off

Controlling the sampling resolution trades off optimality against runtime and memory

[Figure: Minimum delay vs. optimal delay, by number of samples (128, 256, 512, 1024)]
[Figure: Runtime vs. number of nodes (0 to 4000) for p, q = 128, 256, 1024]

Complexity analysis

Runtime
• Bottom-up phase takes O(n p max(p, q))
• Top-down phase takes O(np)
• Overall: O(n p max(p, q))

Memory
• O(np)

where
n : number of nodes of the clock tree
p : number of delay samples taken at each node
q : number of wire width samples taken at each level-2 node
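As a worked consequence of these bounds, consider the case p = q, which is the setting used in the experiments. The symbols T and M (for the runtime and memory bounds) are introduced here only for illustration.

```latex
% With p = q, the bounds above reduce to
T(n, p) = O(n p^2), \qquad M(n, p) = O(n p)

% Doubling the resolution, p \to 2p with n fixed, scales the bounds by
\frac{n (2p)^2}{n p^2} = 4, \qquad \frac{n (2p)}{n p} = 2
```

Combined with the observation that the error is roughly halved when the resolution is doubled, each halving of the error would cost about four times the runtime and twice the memory under these bounds.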

Outline

Background
• Motivation and contribution
• Related works
• Problem formulation

ClockTune Algorithm
• Design space projection
• Algorithm overview
• Optimality and complexity analysis

Experimental Results
• Runtime, memory usage, and optimality
• Power/Delay trade-off
• Incremental refinement

Experimental setup

• ClockTune is implemented in C++ and executed on a 533 MHz Pentium III PC with 128 MB of memory
• Benchmarks r1 – r5 from Tsay et al. [ICCAD '91]
• Initial routing generated by the BB+DME algorithm with minimum wire width w = 1 µm
• ClockTune uses wm = 1 µm, wM = 4 µm
• p: number of delay samples taken at every node
• q: number of wire width samples taken at every level-2 node
• r0 = 0.03, c0 = 2×10^-16 /µm^2

Runtime and memory usage

• Runtime and memory usage are linear in the problem size when p and q are fixed
• Within 1% of optimal when p, q = 256 (runtime < 6 minutes, memory ~ 64 MB)

p, q = 256   # sink nodes   # branches   Runtime (s)   Memory (MB)   Optimality
r1           267            527          24.1          6.0           0.38%
r2           598            1185         61.0          12.5          0.71%
r3           862            1710         100.0         14.4          0.46%
r4           1903           3787         202.4         38.0          0.57%
r5           3101           6170         339.2         64.0          0.93%
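One quick way to sanity-check the linear-scaling claim is to divide the reported runtime and memory by the number of branches and see how much the ratios vary. The numbers below are copied from the table. The helper names are hypothetical, and the check is only a rough reading of five data points, not a statistical fit.

```python
# (benchmark, sink nodes, branches, runtime in s, memory in MB) for p, q = 256
rows = [
    ("r1", 267, 527, 24.1, 6.0),
    ("r2", 598, 1185, 61.0, 12.5),
    ("r3", 862, 1710, 100.0, 14.4),
    ("r4", 1903, 3787, 202.4, 38.0),
    ("r5", 3101, 6170, 339.2, 64.0),
]


def per_branch(table, column):
    """Divide one measured column by the branch count of each benchmark."""
    return [row[column] / row[2] for row in table]


def spread(values):
    """Ratio of the largest to the smallest value; 1.0 means perfectly linear."""
    return max(values) / min(values)


runtime_per_branch = per_branch(rows, 3)
memory_per_branch = per_branch(rows, 4)

for row, rt, mem in zip(rows, runtime_per_branch, memory_per_branch):
    print(f"{row[0]}: {rt * 1000:.1f} ms/branch, {mem * 1000:.1f} KB/branch")
print(f"runtime spread {spread(runtime_per_branch):.2f}, "
      f"memory spread {spread(memory_per_branch):.2f}")
```

The branch count grows by more than a factor of 11 from r1 to r5, while the per-branch runtime varies by a factor of about 1.3 and the per-branch memory by less than 1.4, which is consistent with roughly linear scaling.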

[Figure: Runtime vs. number of nodes (0 to 4000) for p, q = 128, 256, 1024]
[Figure: Memory usage vs. number of nodes (0 to 4000) for p, q = 1024; vertical axis 0 to 90]

Optimality results

• Optimality error is below 1% with p = q = 256
• The error is reduced by about half when the resolution is doubled

[Figure: Minimum delay vs. optimal delay, by number of samples (128, 256, 512, 1024)]
[Figure: Minimum area vs. optimal area, by number of samples (128, 256, 512, 1024)]

Power/Delay trade-off

[Figure: delay vs. capacitance trade-off curve, from the minimum-power solution to the minimum-delay solution]
• Capacitance: 0.2~1.1 nF
• Delay: 5~150 ns
• 15:1 delay:power trade-off

Incremental refinement

DC region captures the design space
• Enables incremental refinement

Conclusion & Future Work

Conclusion: a zero-skew clock tree wire-sizing algorithm which
• Minimizes delay and area ε-optimally
• Guarantees pseudo-polynomial runtime and memory usage
• Provides delay/power trade-off information to designers
• Speeds up design convergence by allowing clock tree re-balancing with minimum changes

Future work
• Better delay model
• Buffer insertion/sizing capability

Thank you!
