Literature Review
A 280mV-to-1.1V 256b Reconfigurable SIMD Vector Permutation Engine with 2-D Shuffle in 22nm CMOS [ISSCC ’12]
Literature Review
Fang-Li Yuan
Advisor: Prof. Dejan Markovic
03/23/2012
IC Design Challenges: 1980s – Present
Session 1.4: Sustainability in Silicon & System Development
– 1980s: Design productivity
– 1990s: Power dissipation
– 2000s: Leakage power
– 2010s: Energy efficiency
Moore’s Law continues to provide more transistors
Power budgets limit our ability to use them
Intel’s Solutions – From Transistors to Circuits
2007 ISSCC
2012 ISSCC
Near-Vth Computing: Great for Energy Efficiency
IA-32: 1st NTV Processor in 32nm CMOS
NTV Circuits Gain 7x Efficiency in VPFP Mult-Add
1st NTV SIMD Engine in 22nm Tri-Gate Technology
System-Level Overview
32 32×8b 3R1W RF: 4~32-way, 8/16/32/64b Vertical Perm.
256b, byte-wise, any-to-any Crossbar: Horizontal Perm.
Goals:
(1) Provide flexibility
(2) Improve Vmin
(3) Reduce power
(4) Lower PVT variation
Results: 585 GOPS/W @280mV (9x higher than at 1.1V)
Example: 64b 4x4 Matrix Transpose
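The transpose example can be illustrated with a small software model of the two permutation primitives named on the overview slide. This is only a sketch under assumptions: the function names (`vertical_permute`, `horizontal_permute`, `transpose_4x4`) are hypothetical, and the three-step skew/read/unskew sequence is one possible decomposition, not necessarily the sequence the engine or the slide uses. A 256b vector is modeled as four 64b lanes.

```python
# Hypothetical software model; names and step sequence are assumptions,
# not taken from the slides.

LANES = 4  # one 256b vector viewed as four 64b elements


def vertical_permute(entries, row_select):
    """Register-file style read: result lane l comes from
    entries[row_select[l]][l], so data never changes lanes."""
    return [entries[row_select[lane]][lane] for lane in range(LANES)]


def horizontal_permute(vector, lane_select):
    """Crossbar style shuffle: result lane l takes vector[lane_select[l]]."""
    return [vector[lane_select[lane]] for lane in range(LANES)]


def transpose_4x4(matrix):
    # Step 1 (horizontal): rotate row i right by i lanes to skew the matrix.
    skewed = [
        horizontal_permute(row, [(lane - i) % LANES for lane in range(LANES)])
        for i, row in enumerate(matrix)
    ]
    # Step 2 (vertical): for output row r, lane l reads entry (l - r) mod 4,
    # which gathers column r of the original matrix into one vector.
    gathered = [
        vertical_permute(skewed, [(lane - r) % LANES for lane in range(LANES)])
        for r in range(LANES)
    ]
    # Step 3 (horizontal): rotate row r left by r lanes to restore lane order.
    return [
        horizontal_permute(row, [(lane + r) % LANES for lane in range(LANES)])
        for r, row in enumerate(gathered)
    ]


matrix = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
print(transpose_4x4(matrix))
# prints [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
```

The vertical step alone cannot transpose, because it keeps every element in its original lane; the horizontal steps are what move data across lanes.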
RF with PVT-tolerant Techniques & Vector FFs
Clockless static reads eliminate keeper contention in dynamic BLs
Vector flip-flops w/ shared local min-sized clock INVs average the variation
Shared P/N on virtual supplies limits strength of cross-coupled INVs
Byte-wise enable-signal gating reduces switching power by 49%
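The "average the variation" point can be illustrated with an idealized statistical model. The sketch below assumes each min-sized device contributes an independent, identically distributed random variation, so the mean of n shared devices has a standard deviation smaller by a factor of sqrt(n). The function name and the sharing counts are hypothetical, and the slides do not state that the circuit follows this model exactly.

```python
import math


def sigma_of_shared_average(sigma_single, n_shared):
    """Standard deviation of the mean of n_shared independent variations,
    each with standard deviation sigma_single (idealized assumption)."""
    if n_shared < 1:
        raise ValueError("n_shared must be at least 1")
    return sigma_single / math.sqrt(n_shared)


# Normalized example: sigma_single = 1.0, hypothetical sharing counts.
for n in (1, 4, 16):
    print(n, sigma_of_shared_average(1.0, n))
# prints 1 1.0, then 4 0.5, then 16 0.25
```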
250mV Vmin Reduction Across PVT Variations
Vector FFs Reduce Hold-Time Violations @ Low V
ULVS LS & Interleaved Folded Crossbar Layout
Vector mux averages variation effect of min-sized devices by sharing transistors across gates
Folded layout: 50% reduction in critical wiring length
Interleaved layout: 50% lower line-to-line coupling
ULVS LS decouples CVSL stage from output driver & contention devices: 20~32% lower power, 125mV improved Vmin
ULVS Improves Vmin by 125mV
RF and Logic Co-optimization: Iso-Vmin
Measured Performance
585 GOPS/W @0.26V (9x higher than at 0.9V)
RF: 227mW, 2.5GHz @1.1V
Xbar: 69mW, 2.9GHz @1.1V
RF: 109μW, 16.8MHz @0.28V
Xbar: 19μW, 10MHz @0.24V
RF: 106mW, 1.8GHz @0.9V
Xbar: 36mW, 2.3GHz @0.9V
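As a rough cross-check of the measured numbers, energy per clock cycle can be computed as power divided by frequency. This is a sketch, not the slide's metric: GOPS/W also depends on operations per cycle, which the slide does not give. The helper name `energy_per_cycle_pj` is hypothetical; the power and frequency values are copied from the figures above.

```python
# (block, supply in V, power in W, frequency in Hz), from the measured slide
MEASURED = [
    ("RF", 1.1, 227e-3, 2.5e9),
    ("Xbar", 1.1, 69e-3, 2.9e9),
    ("RF", 0.9, 106e-3, 1.8e9),
    ("Xbar", 0.9, 36e-3, 2.3e9),
    ("RF", 0.28, 109e-6, 16.8e6),
    ("Xbar", 0.24, 19e-6, 10e6),
]


def energy_per_cycle_pj(power_w, freq_hz):
    """Energy per clock cycle in picojoules: E = P / f."""
    return power_w / freq_hz * 1e12


for block, vdd, power_w, freq_hz in MEASURED:
    pj = energy_per_cycle_pj(power_w, freq_hz)
    print(f"{block:4s} @ {vdd:.2f} V: {pj:6.2f} pJ/cycle")
```

With these inputs the RF comes to roughly 90.8 pJ per cycle at 1.1V and roughly 6.5 pJ per cycle at 0.28V, which is consistent with the direction of the energy-efficiency gain reported at near-threshold voltage.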
Conclusions
NTV computing is energy efficient but sensitive to PVT variation
Static ckts (e.g. RF read): better than dynamic ckts @ NTV
Shared P/N DETG writes improve Vmin across PVT variations
Vector FF/Mux share transistors across gates, averaging variation
ULVS LS interrupts contention devices, improving Vmin & power
Byte-wise enable-signal gating reduces power
Folded layout has 50% reduction in critical wiring length
Interleaved, opposite-direction data wires achieve 50% lower line-to-line coupling, improving SI & delay