Post on 19-Aug-2020
1
Dependable SRAM Techniques for Highly
Reliable VLSI System
JST/CREST/DVLSI Symposium,
Dec. 7, 2013
Masahiko Yoshimoto, Kobe Univ.
Hiroshi Kawaguchi, Kobe Univ.
Makoto Nagata, Kobe Univ.
Koji Nii, Renesas Electronics
Shigeru Oho, Nippon Institute of Technology
Yasuo Sugure, Hitachi, Ltd.
2
Vth variation
NBTI,
temperature fluctuation
Power supply noise
Soft error
• Autonomous memory with BIST
• QoB and FGVC memory
• Lockstep with QoB multicore
• Power supply noise filter
• Autonomous memory with ODM
• Lockstep with QoB multicore
• Soft error tolerant memory cell
• Lockstep with QoB multicore
Improvement / tolerant
techniques
Dependability
degradation factor
Summary of proposed techniques
QoB: Quality-of-Bit Memory, ODM: On-Die Monitor, FGVC: Fine-Grain Voltage Control
3
• QoB-SRAM and FGVC-SRAM has been implemented.
• Autonomous control logic applying BIST and power supply noise
monitor has been combined with QoB and FGVC SRAM.
Chip layout Block diagram
(40nm process, 5mm x 5mm)
QoB-SRAM
ODM
FGVC-SRAM
Controller
QoB-SRAM
Autonomous dependable memory chip
On-die monitor
Flexible power supply network
Logic circuit
BIST
Power supply monitor
Temperature monitor Aging
monitor
Autonomic dependable
memory system
Programmable
power switch
controller
• FGVC
• QoB
• Failure block
replace
Voltage
change
Reconfig.
Replace
Memory bank
4
Evaluation of SRAM products regarding dependability
Specification of ODM for power and ground noise measurements
Voltage range: -200 to 200 mV
Measurement bandwidth: 1.0 GHz
Evaluation of SRAM cores with different bit capacities
SRAM + BIST
On-die
monitor
11
0 m
m
50 mm Thermal sensor
Power noise measurements
with on-die monitor (ODM)
5
Susceptibility of SRAM evaluated with ODM
SRAM is susceptible to voltage variations at low frequencies
Failure prevention by predicting Vdd droops with low frequencies
The minimum RF power to brings about SRAM bit errors increases for the higher RF
frequencies Failures happen when the time duration of Vdd lower than Vthr
becomes larger than Thr.
★ Combination of “EMC test” and “ODM observation” for in-depth understanding of
SRAM immunity, as demonstrated for the first time.
20
22
24
26
28
30
0.8
1
1.2
1.4
20 40 60 80 100 120 140
Pnet (dBm) Vddm_min (V)
Frf (MHz)
Fclk = 100 MHz
0.8
1
1.2
1.4
1.6
1.8
2
2.2
0 5 10 15 20
Vddm (V)
Time (ns)
Tthr
Vthr
6
7T/14T Quality-of- Bit (QoB) SRAM
7T/14T SRAM Conventional 6T SRAM
P0 P1
/CL
CL: control line
By Connecting internal nodes directly
w/ additional transistors, enhanced operating
margin and improved reliability is achievable
7
Associativity-reconfigurable cache
with QoB SRAM Vmin & IPC Penalty Memory Architecture
0.5
0.55
0.6
0.65
0.7
8 way 7 way 6 way 5 way 4 way
Associativity
Vm
in [
V]
256KB L2 Cache 32KB IL1 Cache 32KB DL1 Cache
115 mV
0.94
0.95
0.96
0.97
0.98
0.99
1
No
rma
lize
d IP
C
8-way
/256 KB
7-way
/224 KB
6-way
/192 KB
5-way
/160 KB
4-way
/128 KB
- 4.93 %
Normal Mode Dependable Mode
Associativity can be reconfigured dynamically
using QoB sheme and enhances reliability.
The proposed cache can improve
Vmin by 115 mV, with only less than
4.93% IPC loss and 6% area overhead.
8
ASR0 ASR1
WL0
WL1
WL31
...
...
WL driver Pull-down
NMOS
ASW1
VDD
CVDD0
...
ASW0
WE
(write-enable)Y0
VDD
CVDD31
Y31
...
...
...
Read assist circuit TEG chip and 128-Kb SRAM layout
5Mb SRAM (128kb x 40)
and memory BIST w/ assist logic
プログラマブル
電源制御部
電圧モニタ
Cell array Cell array
Colum I/O Colum I/O
Add. buffer and control
Ro
w d
ec
od
er
Write-assist
circuits
Write-assist
circuits
10.5 µm
495 µm
580 µ
m
Read-assist
circuits
2.2 µm
Write assist circuit
Fine Grain Voltage Control (FGVC) SRAM
Ap
ply
fo
r re
ad
-assis
t b
ias
Apply for write-assist bias
Read-failure bits @0.7V
Write-failure bits @0.7V
32 rows
32 columns
Read-bump
tests
Write-bump
tests
Measured typical fail-bitmaps
9
Measurement results of Vmin Improvement by
FGVC SRAM
-3
-2
-1
0
1
2
3
0.4 0.5 0.6 0.7 0.8 0.9 1.0
σ
Vmin of 1Mb (V) @ Process and Temp. worst
- Dependable MemoryをLCDのフレームバッファとして使用したデモボードを設計- ノイズや環境変化を模して、メモリの一部にVDD=0.7Vを印加し(通常は1.1V)、Dependable Memory Systemの有無を実験
System OFF
・Dependabilityが悪くなって、SRAMが動作不良
System ON
・Dependability改善⇒正常にLCDが表示
FGVC
assist
w/o assist
50mV Improvement 0.60
0.65
0.70
0.75
0.80
0.85
0.90
#1 #2 #3 #4 #5 #6 #7 #8 #9
Samples
Vm
in (
V)
w/o Assist
Conv. Assist
Prop. Assist
Measurement results for 90nm 1Mb SRAM
Minimum Vmin is achieved as 0.64 V, which
is improved by 21% at most compared to the
conventional.
Measurement results
for 40nm 1Mb SRAM
Demo-system as frame buffer
10
Autonomous dependable memory and
voltage droop immunity evaluation
Address
Test moduleData7T/14T bitenhancing
SRAM tag array(19kbit)
256KB 8way cache
(256 kbit×8)
Vdd_rt Vdd_test
7T/14T bitenhancing
SRAM data array(256kbit)
Volt./temp.monitoring
circuit
Thermaldiode
ExternalDAC
Vref
Realtime user bus
Test bus
EXT I/F
Vdd_rt Vdd_test
Vdd_testVdd_rt
Test voltage control
DAC control
Data Transfer Unit
Data
Data transfer unit
Autonomous controller
Test module
Autonomous resilient cache controller
Reconfigurecache
Monitorresult
Monitoring instruction
ExternalDCDC regulator
Online testing controller
Under 25% droop
Under 30% droop Under 35% droop
1E+0
1E+1
1E+2
1E+3
1E+4
1E+5
1E+6
# o
f fa
ilu
re
Temp.: 25degC
100us 1ms 10ms 100msDroop duration
w/o prop. [13]
w/ prop.
× 91 improvedgamessmcfsjenggromacsbzip2
gamessmcfsjenggromacsbzip2
Mean
Mean
w/ prop.
w/o prop.
1E+0
1E+1
1E+2
1E+3
1E+4
1E+5
1E+6
# o
f fa
ilu
re
Temp.: 25degC
100us 1ms 10ms 100ms
Droop duration
w/o prop. [13]
w/ prop.
does not fail
Me
an
1E+0
1E+1
1E+2
1E+3
1E+4
1E+5
1E+6
# o
f fa
ilu
re
Temp.: 25degC
100us 1ms 10ms 100ms
Droop duration
w/o prop. [13]
w/ prop.
does not fail
Nom. Vdd
Vref_high
Vref_low
Droop duration
Vmin_normal
Vmin_ENH
35% droop
30% droop
25% droop
Op
era
te in
en
han
cin
g m
od
eb
elo
w V
min
_n
orm
al
Block Diagram Evaluation Results
Th
erm
al
dio
de
Vo
lt./tem
p.
mo
nito
r
7T
/14
T S
RA
M
256kb
data
19
kb
tag
Autonomous resilient cache controller
7T
/14T
SR
AM
256k
b d
ata
19
kb
tag
7T
/14
T S
RA
M
256kb
data
19
kb
tag
7T
/14T
SR
AM
256
kb
da
ta
19
kb
tag
7T
/14
T S
RA
M
256k
b d
ata
19
kb
tag
7T
/14
T S
RA
M
256k
b d
ata
19
kb
tag
7T
/14
T S
RA
M
256
kb
da
ta
19
kb
tag
7T
/14
T S
RA
M
256
kb
da
ta
19
kb
tag
19k
b t
ag
arr
ay
256kb
data
arr
ay
75
0u
m
500um
40nm CMOS 256KB QoB Cache
・Block-wise Vmin measurement by BIST
・Voltage-Droop measurement by ODM
・Vmin margin enhancement by
associativity reconfiguration
・×91 Failure Rate Improvement
・Less than 3% IPC degradiation
11
Controller
model
Vehicle
engine model
Collaborative sim.
Control software
ECU
Micro-controller
CPU Peripheral
SH-2A CPU
Bridge
INTC CMT
DMAC
A/D ATU
Fault- Injectable bus bridge
Micro-
controller
model
Internal SRAM model
SRAM failure
data pattern
Matlab®/Simulink® simulator
CoMETTM simulator
Verification
conditions
for system
Fault-
injection
scheme
Fault Case
Generator
System-level verification environment (PILS)
System-level verification environment
with SRAM fault-injection
12
Fault Case Generator (FCG)
FCG can generates time-series SRAM failure data patterns
correspond to arbitrary waveforms for the power supply
noise and operating temperature, and device parameters
Supply voltage (V)
Output
Temperature (degC)Process variation (σVth)
Input
NBTI aging (ΔVth)Soft error rates (FIT)
Fault Case Generator
SRAM failure data
pattern generator
BER tableSRAM BER library
SRAM failure
data pattern
BERInput for BER table
(Supply volt., temp., ...)
SRAM capacity (KB)
Information of virtual chip
13
Cloud
Computing
Results ディペンダブルメモリ評価用自動車エンジン制御検証環境
ECU model Vehicle engine
model
Control Software
ECU
Micro controller
CPU Peri- pheral
SH-2A CPU
Fault Case
Generator
Bridge
Microcontroller model
Memory Model
#2
Memory
Model
SRAM failure data pattern
Co-
sim
Matlab®/ Simulink® Simulator
Vertual ECU
Simulator
(Synopsys) Device
conditions
Fault
Injection
scheme
Test
Scenarios
Co-sim Co-sim Co-sim Co-sim Co-sim Co-sim Co-sim
Parallel
Execution
Conventional
SRAM
With
Dependable
SRAM
-50 0 80
150
0.4
0.5
0.6
0.7
0.8
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
Sys
tem
Err
or
Rat
e
-50 0 80
150
0.4
0.5
0.6
0.7
0.8
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
Sys
tem
Err
or
Rat
e
INTC CMT A/D ATU
DMAC Bridge
Large-scale verification for system error rate during software execution
when SRAM fault occurs
Evaluation of 672,000 test scenarios were done within 12 hours on 600
computational nodes.
Large-scale system-level verification
using cloud-computing
14
0.4
0.5
0.6
0.7
0.8
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
1.E-04
1.E-03
1.E-02
1.E-01
1.E+00
-50 0
80
150
-25
40
125
Ab
no
rmal
term
inati
on
rate
0.4
0.5
0.6
0.7
0.8
-5
0 0
80
150
-25
40
125
System-level evaluation results
ECU System w/ 6T SRAM ECU System w/ 7T/14T SRAM
ECU w/ Dependable SRAM improves
the minimum op. voltage by 0.05–0.15 V