CS294-6 Reconfigurable Computing Day 12 October 1, 1998 Interconnect Population.


CS294-6 Reconfigurable Computing

Day 12

October 1, 1998

Interconnect Population

Today

• Costs of full population

• Tradeoffs

• Building Blocks

• Empirical Evidence

Symmetric

Symmetric Wire Channels

• IO ∝ N^p

• Bisection BW ∝ N^p

• W ∝ N^(p-0.5)

– N if p<0.5

• W ∝ log(N) – p=0.5

XC4K (Commercial Symmetric)

Symmetric Full Population

• For simplicity: XC4K

– 8 short

– 4 double

– Two sides with 6 connections

– => 10x10 switchbox + 2x (12x6) C-boxes

– 2x12x6 + 4x10x30/2 = 744 switches

– 2.5Kλ^2/switch => 1.9Mλ^2

– Compare real XC4K at 1.25Mλ^2
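The switch count on this slide can be re-derived with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and reading the `4x10x30/2` term as "4 sides, 10 wires per side, 30 wires on the other three sides, halved because each switch joins two wires" is an assumption about how the slide arrived at it.

```python
# Full-population switch count for the simplified XC4K tile on the slide.
# Variable names are illustrative, not from the slide.

# Two C-boxes, each connecting 12 channel wires to 6 logic-block connections.
c_box_switches = 2 * 12 * 6

# Switchbox term as written on the slide: 4 x 10 x 30 / 2.
# Assumed reading: 4 sides x 10 wires x 30 wires on the other three sides,
# divided by 2 because each switch is shared by the two wires it joins.
s_box_switches = 4 * 10 * 30 // 2

total_switches = c_box_switches + s_box_switches

# Per-switch area figure from the slide (2.5K in the slide's area units).
area_per_switch = 2.5e3
total_area = total_switches * area_per_switch

print(total_switches)  # 744
print(total_area)      # 1860000.0, which the slide rounds to 1.9M
```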

Wires vs. Switches

• Full Population

– Switch area ∝ W^2

• (4x3/2)W^2 switches

• ~15Kλ^2 W^2

– Wire area ∝ W^2

• 8λW x 8λW = 64λ^2 W^2
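Both areas grow as the square of the channel width, so their ratio is a constant. The sketch below makes that explicit. It assumes the 2.5K per-switch area from the XC4K slide and a wire pitch of 8 (the factor in the wire-area term); the function names are illustrative.

```python
def switch_area(w, area_per_switch=2500):
    """Area of a fully populated switchbox: (4*3/2) * W^2 switches."""
    return (4 * 3 // 2) * w * w * area_per_switch


def wire_area(w, pitch=8):
    """Area under W wires crossing W wires at the given pitch."""
    return (pitch * w) ** 2


for w in (4, 8, 16):
    ratio = switch_area(w) / wire_area(w)
    print(w, switch_area(w), wire_area(w), ratio)
```

Under these assumptions the switches take a few hundred times the area of the wires they connect, at every channel width.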

Avoidable?

• How can we avoid N^2 switches?

Switching Problem

• Connect any permutation of N sources to N sinks:

– Crossbar: N^2 switches

Benes Network

• Routes any permutation

• O(Nlog(N)) switches

Tradeoff

• Crossbar

– N^2 switches

– N=16 => 256

– single serial switch

• Benes

– 2log2(N)-1 stages

– N/2 2x2’s per stage

– 4 switches/2x2

– 4Nlog2(N)-2N switches

– N=16 => 224

– (w/ 4x4s, 192)

– 2log_b(N)-1 serial switches

• (2x2=>7, 4x4=>3)

General trend:

• flatter => more total switches

• factored => longer switch series
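The numbers in this comparison follow directly from the stage and element counts. A minimal sketch that reproduces them, with illustrative function names and the assumption that N is an exact power of the element radix b:

```python
def ilog(n, b):
    """Integer log base b of n; n must be an exact power of b."""
    k, v = 0, 1
    while v < n:
        v *= b
        k += 1
    if v != n:
        raise ValueError(f"{n} is not a power of {b}")
    return k


def crossbar_switches(n):
    """Full crossbar: one switch per (source, sink) pair."""
    return n * n


def benes_stages(n, b=2):
    """Benes network built from b x b elements: 2*log_b(N) - 1 stages."""
    return 2 * ilog(n, b) - 1


def benes_switches(n, b=2):
    """Stages x (N/b elements per stage) x (b*b switches per element)."""
    return benes_stages(n, b) * (n // b) * (b * b)


n = 16
print(crossbar_switches(n))                      # 256, 1 switch in series
print(benes_switches(n, 2), benes_stages(n, 2))  # 224 switches, 7 in series
print(benes_switches(n, 4), benes_stages(n, 4))  # 192 switches, 3 in series
```

At N=16 the savings are modest; the O(Nlog(N)) versus N^2 gap only opens up at larger N, while the series switch count (and so delay) grows with every added stage.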

Concentrators

• Select M signals from N

– (N>M)

• Crossbar

– NxM

Concentrators

• Order of outputs often not important

– (e.g. LUT inputs)

• Limit “crossbar” population to:

– M x (N-M+1)
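One way to reach the M x (N-M+1) count is a banded layout in which output j has switches only to inputs j through j+(N-M). The slide gives only the count, so this particular layout is an assumption; the sketch below builds it and checks by brute force that every choice of M inputs can still be routed when output order does not matter.

```python
from itertools import combinations


def sparse_concentrator(n, m):
    """Assumed banded layout: output j connects to inputs j .. j+(n-m)."""
    return [set(range(j, j + n - m + 1)) for j in range(m)]


def route(selected, conc):
    """Send the k-th smallest selected input to output k.

    Returns {input: output}, or None if a needed switch is missing.
    """
    assignment = {}
    for out, inp in enumerate(sorted(selected)):
        if inp not in conc[out]:
            return None
        assignment[inp] = out
    return assignment


n, m = 8, 3
conc = sparse_concentrator(n, m)
switch_count = sum(len(inputs) for inputs in conc)
all_routable = all(
    route(chosen, conc) is not None
    for chosen in combinations(range(n), m)
)

print(switch_count, n * m)  # sparse population vs. full N x M crossbar
print(all_routable)
```

The routing rule works because the k-th smallest of M chosen inputs always has an index between k and k+(N-M), which is exactly the band wired to output k.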

Switchboxes

• Goal: route an input to an output on a different side

– doesn’t matter which output

• …as long as output can route on to destination

Switchboxes

• Intro: 4 x (3W=>W) – 6W^2

• (push: does order matter?)

Switchboxes (Linear Population)

• Connect

– each wire

– on each side

– to a single connection on each destination side

• Linear total switch population

– Switches = (3x4xW)/2 = 6W

Switchboxes (Xilinx/Diamond)

• Linear -- connect to same corresponding channel each side
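The "same corresponding channel" rule is easy to enumerate. The sketch below lists the switches of such a switchbox for channel width W and confirms the linear 6W total; the side names and the (side, track) representation are illustrative choices, not from the slides.

```python
from itertools import combinations

SIDES = ("north", "east", "south", "west")


def diamond_switches(w):
    """Track i on each side gets one switch to track i on each other side."""
    return [
        ((a, i), (b, i))
        for i in range(w)
        for a, b in combinations(SIDES, 2)
    ]


w = 5
switches = diamond_switches(w)
print(len(switches))  # 6 * W

# Count how many switches touch each wire end; every one should see 3.
fanout = {}
for end_a, end_b in switches:
    fanout[end_a] = fanout.get(end_a, 0) + 1
    fanout[end_b] = fanout.get(end_b, 0) + 1
print(set(fanout.values()))
```

Because track i only ever connects to track i, a signal that enters on one track stays on that track number through every switchbox, which is what makes these "domain" schemes.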

Switchboxes (Universal)

• Linear

• Principle: supports all sets of simultaneous connection requests up to channel width limits for switchbox

– connection request:

• N-S, S-W, etc.

Switchbox (domain schemes)

Asymptotically: Universal routes 25% more connection sets than Diamond (strict superset)

Enough?

• Universal switch guarantees each switch is locally routable

– if all four sides are unconstrained

• But routing one universal switch places constraints on connections

– does not guarantee that a whole set of connections can be routed

Mapping Ratio

• Partial schemes

– => switching limitations may prevent use of some channels

• As a result, need more channels to detail route than implied by global route

– (counting wires in each channel segment)

• Mapping Ratio

– Detail Routing Channels / Global Channels

Lack of CMR for domain schemes

• Two negative results from UCSB

– Any domain scheme (diamond, universal):

• is NP-complete in detailed routing (figuring out which channels/switches to use)

– reduce graph coloring to domain routing

• has no Constant Mapping Ratio (CMR)

Toronto Experiments

• Review Fig 5 and 6 and Table 2 from

– Rose and Brown, JSSC v26n3p277, Mar 91

• Recommendations:

– 3-4 connections per wire in switchbox

• (linear schemes = 3)

• this is pre-discovery of universal switchbox

– input switch population 79-90%

• commensurate with M choose N structure

CMR->“Perfect” Switchboxes

• Universal was complete if unconstrained

– Hybrid idea:

• build assuming one (or more) sides are unconstrained

– get to use linear switches on these sides

• fully populate constrained sides

• tighten guaranteed routing bound

Greedy Routing Architecture

Constant Mapping Ratio

• Linear => CMR=1.5

• Problem is using 3rd side; can build flat concentrator with

– W^2/2 switches

– gives CMR=1 for tree routes

Finishing Greedy Perfect Route

• Tree connections routed with:

– 2W+W^2/2 switches

• Must route one remaining side as before

– 3W^2

• Total

– 3.5W^2+2W

– (compare 6W^2 intro)

• Use Benes for constrained sides

– O(Wlog(W))
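The totals on this slide can be tabulated against the fully populated intro switchbox. This is a minimal sketch with illustrative function names; it evaluates only the closed-form counts given on the slides and does not model the Benes variant, for which the slide gives only an asymptotic bound.

```python
def intro_switches(w):
    """Fully populated switchbox from the intro slide: 6 * W^2."""
    return 6 * w * w


def greedy_switches(w):
    """Greedy perfect-route switchbox, summing the slide's three terms."""
    linear_sides = 2 * w           # linear switches on the unconstrained sides
    concentrator = w * w / 2       # flat concentrator for the 3rd side
    remaining_side = 3 * w * w     # last side routed as in the intro scheme
    return linear_sides + concentrator + remaining_side


for w in (4, 8, 16, 32):
    print(w, greedy_switches(w), intro_switches(w),
          greedy_switches(w) / intro_switches(w))
```

The ratio approaches 3.5/6 as W grows, so the guarantee costs a bit more than half of full population, still quadratic in W.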

Summary

• Even with limited switching schemes we’ve explored, full population appears untenable.

• Full population is often more than we really need.

• Can we define switching structures with “nice” routing properties, reasonable number of switches, and reasonable delay?

Summary (2)

• Empirical evidence that “linear” populations are adequate

– routing challenge arises from lack of guarantees

– better theory as to why adequate?

– slight changes improve guarantees, ease of route?