Literature Review

A 280mV-to-1.1V 256b Reconfigurable SIMD Vector Permutation Engine with 2-D Shuffle in 22nm CMOS [ISSCC ’12]


Fang-Li Yuan
Advisor: Prof. Dejan Markovic

03/23/2012

IC Design Challenges: 1980s – Present

Session 1.4: Sustainability in Silicon & System Development
– 1980s: Design productivity
– 1990s: Power dissipation
– 2000s: Leakage power
– 2010s:

2Fang-Li Yuan

Moore’s Law continues to provide more transistors

Energy Efficiency

Power budgets limit our ability to use them

Intel’s Solutions – From Transistors to Circuits


2007 ISSCC

2012 ISSCC

Near-Vth Computing: Great for Energy Efficiency
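The slide's plot is not in this transcript. A toy model can still show why scaling toward Vth pays off: switching energy falls as Vdd², but below Vth the cycle time grows exponentially, so leakage energy per cycle rises again. The Python sketch below is an illustrative assumption, not Intel's data: it uses a textbook EKV-style current interpolation, and every parameter (`VTH`, `N_SLOPE`, `LOGIC_DEPTH`, `ACTIVITY`) is hypothetical and normalized.

```python
import math

# Toy, normalized parameters; all values are illustrative assumptions.
VTH = 0.35        # threshold voltage [V]
N_SLOPE = 1.5     # subthreshold slope factor n
V_T = 0.026       # thermal voltage kT/q near room temperature [V]
LOGIC_DEPTH = 20  # gate delays per clock cycle (hypothetical)
ACTIVITY = 0.1    # switching activity factor (hypothetical)


def drain_current(vgs):
    """EKV-style interpolation: exponential below VTH, roughly quadratic above."""
    x = (vgs - VTH) / (2 * N_SLOPE * V_T)
    return math.log1p(math.exp(x)) ** 2


def cycle_time(vdd, c=1.0):
    """Clock period = logic depth x gate delay, with gate delay ~ C*Vdd/I_on."""
    return LOGIC_DEPTH * c * vdd / drain_current(vdd)


def energy_per_cycle(vdd, c=1.0):
    """Normalized dynamic (alpha*C*Vdd^2) plus leakage (Vdd*I_off*T) energy."""
    e_dyn = ACTIVITY * c * vdd ** 2
    e_leak = vdd * drain_current(0.0) * cycle_time(vdd, c)  # off-current at Vgs = 0
    return e_dyn + e_leak


def minimum_energy_point(v_lo=0.10, v_hi=1.10, steps=1000):
    """Brute-force scan for the supply with the lowest energy per cycle."""
    grid = [v_lo + (v_hi - v_lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=energy_per_cycle)


if __name__ == "__main__":
    v_opt = minimum_energy_point()
    for vdd in (1.1, 0.9, 0.5, VTH, v_opt):
        print(f"Vdd={vdd:.3f} V  energy={energy_per_cycle(vdd):.4f}  "
              f"cycle time={cycle_time(vdd):.1f}")
```

In this toy model the energy minimum lands below VTH, where the cycle time is already very long; operating a little above it, near Vth, keeps most of the energy reduction with a much smaller delay penalty, which is the usual argument for NTV.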


IA-32: 1st NTV Processor in 32nm CMOS


NTV Circuits Gain 7x Efficiency in VPFP Mult-Add


1st NTV SIMD Engine in 22nm Tri-Gate Technology


System-Level Overview


32-entry × 256b (32×8b) 3R1W RF: 4~32-way vertical permutation at 8/16/32/64b granularity

256b, byte-wise, any-to-any Crossbar: Horizontal Perm.

Goals: (1) provide flexibility; (2) improve Vmin; (3) reduce power; (4) lower sensitivity to PVT variation

Results: 585 GOPS/W @280mV (9x higher than at 0.9V)
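One reading of "4~32-way" is that a 256b vector holds 256/w elements of width w, so 64b elements give 4 lanes and 8b elements give 32, and a vertical permutation may fetch each lane from a different RF entry while keeping it in its own lane. The minimal Python sketch below models only that interpretation; `ways` and `vertical_permute` are hypothetical names, the RF is modeled as 32 entries of 32 bytes, and whether the 3R1W hardware does this in a single access is not stated here.

```python
RF_ENTRIES = 32  # modeled RF depth
VEC_BYTES = 32   # 256b per entry, stored as 32 bytes


def ways(elem_bits):
    """Number of independently selectable elements in a 256b vector."""
    assert elem_bits in (8, 16, 32, 64)
    return 256 // elem_bits  # 64b -> 4-way, ..., 8b -> 32-way


def vertical_permute(rf, elem_bits, entry_sel):
    """Lane-preserving gather: element k of the result is element k of
    entry entry_sel[k] (hypothetical model of a vertical permutation)."""
    n = ways(elem_bits)
    ebytes = elem_bits // 8
    assert len(entry_sel) == n
    out = []
    for k, entry in enumerate(entry_sel):
        assert 0 <= entry < RF_ENTRIES
        out.extend(rf[entry][k * ebytes:(k + 1) * ebytes])
    return out


if __name__ == "__main__":
    # Tag each byte with (entry, byte position) so the gather is easy to trace.
    rf = [[(e, b) for b in range(VEC_BYTES)] for e in range(RF_ENTRIES)]
    v = vertical_permute(rf, 64, [3, 2, 1, 0])  # 4-way, 64b lanes
    print([v[k * 8] for k in range(4)])  # prints [(3, 0), (2, 8), (1, 16), (0, 24)]
```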

Example: 64b 4x4 Matrix Transpose
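The slide's figure for this example is not in the transcript. As a hedged illustration of how horizontal (crossbar) and vertical (RF) permutations can combine for a 4x4 transpose of 64b elements, the Python sketch below uses a rotate, diagonal-read, rotate sequence. This is one textbook way to build a transpose from lane rotations and lane-preserving gathers, assumed here for illustration; the chip's actual sequence may differ. The helpers `crossbar`, `rotate_lanes_sel`, `vertical_read`, and `transpose_4x4_64b` are hypothetical, and each 256b register is modeled as a list of 32 byte values.

```python
LANE_BYTES = 8                       # one 64b element
NUM_LANES = 4                        # 4 x 64b = 256b
VEC_BYTES = LANE_BYTES * NUM_LANES   # 32 bytes


def crossbar(vec, sel):
    """Byte-wise any-to-any crossbar: output byte i takes input byte sel[i]."""
    assert len(vec) == VEC_BYTES and len(sel) == VEC_BYTES
    return [vec[s] for s in sel]


def rotate_lanes_sel(shift):
    """Crossbar select pattern moving 64b lane c to lane (c + shift) mod 4."""
    sel = [0] * VEC_BYTES
    for c in range(NUM_LANES):
        dst = (c + shift) % NUM_LANES
        for b in range(LANE_BYTES):
            sel[dst * LANE_BYTES + b] = c * LANE_BYTES + b
    return sel


def vertical_read(regs, row_for_lane):
    """Lane-preserving read: lane k comes from lane k of regs[row_for_lane[k]]."""
    out = []
    for k in range(NUM_LANES):
        out.extend(regs[row_for_lane[k]][k * LANE_BYTES:(k + 1) * LANE_BYTES])
    return out


def transpose_4x4_64b(regs):
    # Step 1: rotate row i by i lanes, so A[i][c] lands in lane (c + i) mod 4.
    rotated = [crossbar(regs[i], rotate_lanes_sel(i)) for i in range(NUM_LANES)]
    result = []
    for j in range(NUM_LANES):
        # Step 2: diagonal read; lane k takes row (k - j) mod 4, which holds A[(k - j) % 4][j].
        diag = vertical_read(rotated, [(k - j) % NUM_LANES for k in range(NUM_LANES)])
        # Step 3: rotate by -j so that A[i][j] ends up in lane i.
        result.append(crossbar(diag, rotate_lanes_sel(-j)))
    return result


if __name__ == "__main__":
    # Fill element A[i][j] (8 bytes) with the value 4*i + j so it is easy to trace.
    A = [[4 * i + j for j in range(NUM_LANES) for _ in range(LANE_BYTES)]
         for i in range(NUM_LANES)]
    for j, row in enumerate(transpose_4x4_64b(A)):
        print(j, [row[k * LANE_BYTES] for k in range(NUM_LANES)])
    # prints: 0 [0, 4, 8, 12], then 1 [1, 5, 9, 13], 2 [2, 6, 10, 14], 3 [3, 7, 11, 15]
```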


RF with PVT-tolerant Techniques & Vector FFs


Clockless static reads eliminate keeper contention in dynamic BLs

Vector flip-flops with shared, local, min-sized clock inverters average out variation

Shared P/N on virtual supplies limits strength of cross-coupled INVs

Byte-wise enable-signal gating reduces switching power by 49%
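Switching power follows the standard CMOS relation P = α·C·Vdd²·f, so gating the enable signals of unused byte slices removes their share of that power. The Python sketch below is a hypothetical illustration: the 32-bit byte masks and the invented workload are assumptions, it models only the enable-related switching, and it does not attempt to reproduce the measured 49% figure.

```python
def dynamic_power(activity, cap_farads, vdd, freq_hz):
    """Classic CMOS switching power: P = alpha * C * Vdd^2 * f."""
    return activity * cap_farads * vdd ** 2 * freq_hz


def gated_power_fraction(byte_masks, total_bytes=32):
    """Fraction of ungated enable-switching power left when each operation
    drives only the byte slices whose bit is set in its (hypothetical) mask."""
    full = (1 << total_bytes) - 1
    active = sum(bin(mask & full).count("1") for mask in byte_masks)
    return active / (total_bytes * len(byte_masks))


if __name__ == "__main__":
    # Invented workload: 32-, 16-, 8- and 16-byte operations.
    masks = [0xFFFFFFFF, 0x0000FFFF, 0x000000FF, 0xFFFF0000]
    frac = gated_power_fraction(masks)
    p_ungated = dynamic_power(0.5, 32 * 2e-15, 0.9, 1.8e9)  # hypothetical C per byte
    print(f"remaining={frac:.4f}  saving={1 - frac:.1%}  "
          f"gated power={p_ungated * frac * 1e6:.2f} uW")
```

The saving in such a model depends entirely on how many bytes the workload actually enables, which is why a measured number like 49% is tied to the test conditions.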

250mV Vmin Reduction Across PVT Variations



Vector FFs Reduce Hold-Time Violations @ Low V
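The slide gives no timing numbers, so the Python sketch below only shows the generic static-timing hold check and how variation turns it into a failure probability. A hold violation occurs when the earliest data arrival (clock-to-Q plus minimum logic delay) beats the hold requirement plus clock skew. Treating slack as Gaussian, and reading the vector-FF benefit as a smaller slack spread at low V, are assumptions; all values are hypothetical, in picoseconds.

```python
import math


def hold_slack(t_cq_min, t_logic_min, t_hold, t_skew):
    """Hold slack (same time unit for all arguments): earliest data arrival at
    the capturing flop minus (hold requirement + clock skew).
    Negative slack means new data can overwrite the value being captured."""
    return (t_cq_min + t_logic_min) - (t_hold + t_skew)


def hold_violation_probability(mean_slack, sigma_slack):
    """P(slack < 0) for Gaussian slack, i.e. Phi(-mean/sigma)."""
    z = -mean_slack / sigma_slack
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    print(hold_slack(t_cq_min=20, t_logic_min=5, t_hold=10, t_skew=5))  # prints 10
    for sigma in (5.0, 2.5):  # hypothetical slack spread before/after averaging
        print(sigma, hold_violation_probability(10.0, sigma))
```

With a 10 ps mean slack, halving the spread from 5 ps to 2.5 ps cuts the violation probability from about 2.3% to well under 0.01%, which is why reducing local variation matters most at low voltage where spreads are largest.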


ULVS LS & Interleaved, Folded Crossbar Layout


Vector mux averages the variation effect of min-sized devices by sharing transistors across gates

Folded layout: 50% reduction in critical wiring length

Interleaved layout: 50% lower line-to-line coupling

ULVS LS decouples the CVSL stage from the output driver & contention devices: 20~32% lower power, 125mV lower Vmin
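Both the vector flip-flops and the vector mux rely on sharing transistors so that random variation averages out. Under an idealized assumption, that the effective delay is the mean of n independent Gaussian device variations, the spread shrinks as 1/sqrt(n). The Monte Carlo sketch below checks that scaling; real shared devices do not average this cleanly, and `effective_delay_sigma` is a hypothetical helper.

```python
import random
import statistics


def effective_delay_sigma(n_shared, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo std. dev. of an effective delay modeled as the mean of
    n_shared independent Gaussian device variations (idealized averaging)."""
    rng = random.Random(seed)
    samples = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_shared))
        for _ in range(trials)
    ]
    return statistics.pstdev(samples)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        mc = effective_delay_sigma(n)
        print(f"n_shared={n}: MC sigma={mc:.3f}, 1/sqrt(n)={1 / n ** 0.5:.3f}")
```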

ULVS Improves Vmin by 125mV


RF and Logic Co-optimization: Iso-Vmin


Measured Performance


585 GOPS/W @0.26V (9x higher than at 0.9V)

RF: 227mW, 2.5GHz @1.1V
Xbar: 69mW, 2.9GHz @1.1V

RF: 109μW, 16.8MHz @0.28V
Xbar: 19μW, 10MHz @0.24V

RF: 106mW, 1.8GHz @0.9V
Xbar: 36mW, 2.3GHz @0.9V
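The measured points can be cross-checked with simple arithmetic: energy per cycle is E = P / f, and an efficiency in GOPS/W converts to energy per operation as 1000 / (GOPS/W) pJ. The Python sketch below copies the RF and Xbar numbers from this slide; the dictionary layout and function names are illustrative only.

```python
# Measured points from the slide: voltage [V] -> (power [W], frequency [Hz]).
MEASURED = {
    "RF":   {1.1: (227e-3, 2.5e9), 0.9: (106e-3, 1.8e9), 0.28: (109e-6, 16.8e6)},
    "Xbar": {1.1: (69e-3, 2.9e9), 0.9: (36e-3, 2.3e9), 0.24: (19e-6, 10e6)},
}


def energy_per_cycle_pj(power_w, freq_hz):
    """Energy per clock cycle in pJ: E = P / f."""
    return power_w / freq_hz * 1e12


def gops_per_w_to_pj_per_op(gops_per_w):
    """Convert an efficiency in GOPS/W to energy per operation in pJ."""
    return 1e12 / (gops_per_w * 1e9)


if __name__ == "__main__":
    for block, points in MEASURED.items():
        energies = {v: energy_per_cycle_pj(*pf) for v, pf in points.items()}
        v_low = min(energies)  # lowest measured voltage for this block
        for v, e in sorted(energies.items(), reverse=True):
            print(f"{block} @{v}V: {e:.1f} pJ/cycle "
                  f"({e / energies[v_low]:.1f}x the {v_low}V value)")
    print(f"585 GOPS/W = {gops_per_w_to_pj_per_op(585):.2f} pJ/op")
```

On these numbers the RF uses about 58.9 pJ/cycle at 0.9V and about 6.5 pJ/cycle at 0.28V, roughly a 9x ratio, versus about 90.8 pJ/cycle (roughly 14x) at 1.1V. If the number of operations per cycle is the same at every voltage, energy-per-cycle ratios equal efficiency ratios, consistent with the 9x-versus-0.9V figure on this slide.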

Conclusions

NTV computing is energy efficient but sensitive to PVT variation

Static circuits (e.g., RF reads) are better than dynamic circuits at NTV

Shared P/N DETG writes improve Vmin across PVT variations

Vector FFs/muxes share transistors across gates, averaging variation

ULVS LS interrupts contention devices, improving Vmin & power

Byte-wise enable-signal gating reduces power

Folded layout reduces critical wiring length by 50%

Interleaved, opposite-direction data wires achieve 50% lower line-to-line coupling, improving SI & delay

