present micro09 3 - iacoma.cs.uiuc.edu
Light64: Lightweight hardware support for data race detection during systematic testing
Adrian Nistor, Darko Marinov, Josep Torrellas
University of Illinois, Urbana–Champaign
http://iacoma.cs.uiuc.edu
Outline
● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion
Light64
Nistor, Marinov, Torrellas
Data Races
● Common concurrency bug
● Difficult to detect
● Cause unexpected crashes, even in code that is well tested
● Example: X == 0
  Thread A: X += 1
  Thread B: X += 1
  Depending on the run: X = 2 or X = 1
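To see why the final value depends on the run, split each X += 1 into its load, add, and store steps. The minimal Python sketch below replays two hypothetical orderings of those steps deterministically; it models the race rather than running real threads, and the names `run`, `serial`, and `overlapped` are illustrative only.

```python
def run(schedule):
    """Replay a schedule of (thread, step) pairs against a shared X.

    Each thread performs X += 1 as three steps: load X into a private
    register, add 1 to that register, store the register back to X.
    """
    shared = {"X": 0}
    regs = {"A": 0, "B": 0}
    for thread, step in schedule:
        if step == "load":
            regs[thread] = shared["X"]
        elif step == "add":
            regs[thread] += 1
        elif step == "store":
            shared["X"] = regs[thread]
    return shared["X"]


steps = ["load", "add", "store"]

# Thread A completes its increment before Thread B starts.
serial = [("A", s) for s in steps] + [("B", s) for s in steps]

# Both threads load X == 0 before either one stores: one update is lost.
overlapped = [("A", "load"), ("B", "load"),
              ("A", "add"), ("B", "add"),
              ("A", "store"), ("B", "store")]

print(run(serial), run(overlapped))  # 2 1
```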
Contribution
Light64: new data race detection technique

                       Software              Light64                  Hardware
Hardware requirement   none                  64 bits                  72–400 Kbits
Execution overhead     8X                    1–37%                    0.5%
Accuracy               NO false positives    NO false positives,
                                             detects 96% of races
Systematic Testing
● To detect bugs, we need high test coverage
  ● Very important in parallel programs
  ● One input, many thread interleavings
● Systematic testing
  ● Systematically execute many thread interleavings
  ● Example: CHESS (used by Microsoft testers)
● Systematic testers include data race detection
  ● Turned off by default
  ● Due to high runtime overhead
● Light64: overhead low enough to be always ON
How Systematic Testing Works
● SEGMENT == sequence of dynamic instructions
● Execute many different interleavings
● Multiplex segments in a uniprocessor
Example: Thread A runs segments A1, A2, A3; Thread B runs segments B1, B2.
The threads synchronize with Signal X, Wait X, and Wait Y.
One interleaving on a uniprocessor: A1, B1, A2, A3, B2
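The core idea of exploring many interleavings of the same segments can be sketched in a few lines of Python; a real tester such as CHESS is far more sophisticated. The sketch assumes thread A runs A1, A2, A3 and thread B runs B1, B2, and, as a hypothetical reading of the slide, that the Signal X / Wait X pair forces A1 to run before B2. The helper names `interleavings` and `respects` are illustrative, not part of any tester's API.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of segment lists a and b that keeps each
    thread's segments in program order."""
    n = len(a) + len(b)
    for a_slots in combinations(range(n), len(a)):
        chosen = set(a_slots)
        a_iter, b_iter = iter(a), iter(b)
        yield [next(a_iter) if i in chosen else next(b_iter) for i in range(n)]


def respects(order, must_precede):
    """True if every (before, after) pair appears in that order."""
    pos = {seg: i for i, seg in enumerate(order)}
    return all(pos[x] < pos[y] for x, y in must_precede)


thread_a = ["A1", "A2", "A3"]
thread_b = ["B1", "B2"]
# Hypothetical synchronization constraint: A1 (then Signal X) before B2.
sync = [("A1", "B2")]

schedules = [s for s in interleavings(thread_a, thread_b) if respects(s, sync)]
print(len(schedules))                               # 9
print(["A1", "B1", "A2", "A3", "B2"] in schedules)  # True
```

Of the 10 program-order interleavings, only B1, B2, A1, A2, A3 violates the assumed constraint.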
Example
Initially x = 0
Thread A              Thread B
A1: x = 3             B1: ... = x
    Wait                  Signal
A2
RACE on x: the accesses to x in A1 and B1 are not ordered by synchronization
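The standard way to decide whether two accesses are "ordered by synchronization" is a happens-before check with vector clocks, the kind of precise but slow software detection the talk compares against. The sketch below is a generic textbook vector-clock check, not necessarily the detector used in the paper. It models Signal/Wait as publishing and merging a clock, and it assumes A2 reads x, as in the value-annotated "The Idea" slide that shows A2 reading x.

```python
def tick(clock, tid):
    """Advance thread tid's own entry and return a snapshot of the clock."""
    clock[tid] += 1
    return tuple(clock)


def join(clock, received):
    """On Wait, merge the clock published by the matching Signal."""
    for i, v in enumerate(received):
        clock[i] = max(clock[i], v)


def happens_before(c1, c2):
    """c1 -> c2 iff c1 <= c2 entry-wise and the clocks differ."""
    return all(x <= y for x, y in zip(c1, c2)) and c1 != c2


def races(acc1, acc2):
    """Two accesses to the same variable race if at least one is a write
    and neither happens before the other."""
    (kind1, c1), (kind2, c2) = acc1, acc2
    if kind1 == "read" and kind2 == "read":
        return False
    return not happens_before(c1, c2) and not happens_before(c2, c1)


A, B = 0, 1
clock_a, clock_b = [0, 0], [0, 0]

a1 = ("write", tick(clock_a, A))   # A1: x = 3
b1 = ("read", tick(clock_b, B))    # B1: ... = x
signal_x = tick(clock_b, B)        # B: Signal publishes B's clock
join(clock_a, signal_x)            # A: Wait receives it
a2 = ("read", tick(clock_a, A))    # A2: ... = x (assumed read)

print(races(a1, b1))                 # True: A1 and B1 are unordered
print(happens_before(b1[1], a2[1]))  # True: Signal/Wait orders B1 before A2
```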
The Idea
Thread A: A1, A2        Thread B: B1
● Perform two executions, flipping the unordered segments
The Idea
UN-FLIPPED: A1, B1, A2        FLIPPED: B1, A1, A2
● A1 and B1: no sync ➔ may have a race ➔ FLIPPED
● B1 and A2: sync ➔ no race possible ➔ order PRESERVED
If segments A1 and B1
● have NO race ➔ they are independent ➔ NOTHING changes
● have a RACE ➔ we also flipped the race ➔ access history changes
The Idea
Initially x = 0
UN-FLIPPED                        FLIPPED
A1: x = 3                         B1: ... = x (reads 0)
B1: ... = x (reads 3)             A1: x = 3
A2: ... = x (reads 3)             A2: ... = x (reads 3)
CHANGED!
If segments A1 and B1
● have NO race ➔ they are independent ➔ NOTHING changes
● have a RACE ➔ we also flipped the race ➔ SOMETHING changes
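The effect of flipping can be reproduced with a small simulation. The Python sketch below is a hypothetical model, not Light64 itself: it runs the example's segments in the un-flipped order A1, B1, A2 and the flipped order B1, A1, A2, recording the values each thread reads. When B1 conflicts with A1 the read history changes; when B1 instead reads an unrelated variable y (an assumption added for contrast) the two histories are identical.

```python
def execute(order, segments):
    """Run segments in the given order on a fresh shared memory and return,
    per thread, the list of values that thread read."""
    memory = {"x": 0, "y": 0}
    reads = {}
    for name in order:
        thread = name[0]  # segment "A1" belongs to thread "A"
        for op, var, value in segments[name]:
            if op == "write":
                memory[var] = value
            else:
                reads.setdefault(thread, []).append(memory[var])
    return reads


# The example: A1 writes x, B1 reads x, A2 reads x.
racy = {
    "A1": [("write", "x", 3)],
    "B1": [("read", "x", None)],
    "A2": [("read", "x", None)],
}
# Hypothetical contrast: B1 reads an unrelated variable y instead.
independent = {
    "A1": [("write", "x", 3)],
    "B1": [("read", "y", None)],
    "A2": [("read", "x", None)],
}

UNFLIPPED = ["A1", "B1", "A2"]
FLIPPED = ["B1", "A1", "A2"]

print(execute(UNFLIPPED, racy)["B"], execute(FLIPPED, racy)["B"])       # [3] [0]
print(execute(UNFLIPPED, independent) == execute(FLIPPED, independent))  # True
```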
Overview
● Use two different executions
● Same synchronization order
● If change ➔ know for sure there is a race
● No change ➔ highly probable no race
Phases in Light64
● Detect if races exist
  ● Fast, over all thread interleavings executed by the tester
  ● Issues
    ● How to detect deviations (e.g. from 0 to 3) ➔ HW hash
    ● How to flip the segments with low overhead ➔ SW
● Pin-point races
  ● Slow, classic data race detection algorithm
  ● Only if there are races
  ● Only for the racy interleavings
  ● Optimization: only for selected racy interleavings
Detecting Deviations
● Per thread: hash all the values read from memory on-the-fly
● Compare hashes of two executions with the same sync order
  ● Different hashes ➔ know for sure there is a race
  ● Identical hashes ➔ high probability no race
Example: Detecting Deviations
UN-FLIPPED: A1, B1, A2               FLIPPED: B1, A1, A2
Each thread keeps HASH(READs)        Each thread keeps HASH(READs)
END of execution: compare the per-thread hashes of the two executions (==?)
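In software, the per-thread hashing and the end-of-execution comparison can be sketched as follows. The sketch assumes the values each thread read are already collected in order (here, the reads from the example), and it uses BLAKE2b from Python's `hashlib` truncated to 8 bytes purely as a convenient 64-bit hash; the hardware slide uses CRC-64 instead. The names `thread_hash` and `compare` are hypothetical.

```python
import hashlib


def thread_hash(values_read):
    """Fold the values one thread read, in order, into a 64-bit digest."""
    h = hashlib.blake2b(digest_size=8)
    for v in values_read:
        h.update(v.to_bytes(8, "little", signed=True))
    return h.digest()


def compare(run1, run2):
    """run1/run2 map thread id -> values read during one execution;
    both executions are assumed to share the same synchronization order."""
    for tid in run1.keys() | run2.keys():
        if thread_hash(run1.get(tid, [])) != thread_hash(run2.get(tid, [])):
            return "race detected"
    return "probably race-free"


# Values read in the example: B1 reads x, A2 reads x.
unflipped = {"A": [3], "B": [3]}
flipped = {"A": [3], "B": [0]}
print(compare(unflipped, flipped))    # race detected
print(compare(unflipped, unflipped))  # probably race-free
```

Because only 64 bits survive per thread, two different read histories can in principle collide, which is why identical hashes indicate only a high probability of no race.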
HW System
● CRC 64 hash logic at the head of the ROB
● Accumulator REGISTER: a 64-bit register that accumulates the values read from memory
● BONUS: virtualize on migration and context switch; no cache spills
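The accumulator can be modeled in software as a bit-serial CRC-64 update applied to each value a load returns. The slide does not give the polynomial, the bit order, or the initial value, so the ECMA-182 polynomial, most-significant-byte-first feeding, and a zero initial value below are all assumptions; `ReadAccumulator` and `retire_load` are hypothetical names.

```python
POLY = 0x42F0E1EBA9EA3693  # ECMA-182 CRC-64 polynomial (assumed)
MASK = (1 << 64) - 1


def crc64_update(reg, value):
    """Fold one 64-bit value into the CRC register, most significant byte
    first, one bit at a time (non-reflected CRC, no final XOR)."""
    for byte in (value & MASK).to_bytes(8, "big"):
        reg ^= byte << 56
        for _ in range(8):
            if reg & (1 << 63):
                reg = ((reg << 1) ^ POLY) & MASK
            else:
                reg = (reg << 1) & MASK
    return reg


class ReadAccumulator:
    """Software stand-in for the 64-bit accumulator register: each value a
    load returns is folded in when the load reaches the head of the ROB."""

    def __init__(self):
        self.reg = 0  # initial value assumed to be zero

    def retire_load(self, value):
        self.reg = crc64_update(self.reg, value)


unflipped_b, flipped_b = ReadAccumulator(), ReadAccumulator()
unflipped_b.retire_load(3)  # B1 reads x == 3 in the un-flipped run
flipped_b.retire_load(0)    # B1 reads x == 0 in the flipped run
print(unflipped_b.reg != flipped_b.reg)  # True
```

With the assumed zero initial value, reads of 0 at the very start leave the register at 0; a nonzero initial value would avoid that, but the slide does not say which the hardware uses.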
Flip with Low Overhead
Thread A: A1, A2, A3        Thread B: B1, B2
STATE TREE explored by the tester; example THREAD INTERLEAVING: A1, B1, A2, A3, B2
UN-FLIPPED: A1, B1, A2      FLIPPED: B1, A1, A2
● Piggy-back on systematic testing primitives to reduce overhead
● Some synchronization orders are executed multiple times
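Because the tester already executes some synchronization orders more than once, executions that share a synchronization order can be paired for comparison without extra runs. The sketch below is a hypothetical bookkeeping layer, not the tester's actual interface: each finished run reports its synchronization order and its per-thread hashes (placeholder strings here), and any order seen with two different hash sets is flagged as racy. The second synchronization order is invented for contrast.

```python
from collections import defaultdict


def find_racy_sync_orders(runs):
    """runs: iterable of (sync_order, thread_hashes) pairs produced by the
    systematic tester. Executions that share a synchronization order but
    disagree on any thread's hash reveal a race."""
    seen = defaultdict(set)
    for sync_order, thread_hashes in runs:
        seen[tuple(sync_order)].add(tuple(sorted(thread_hashes.items())))
    return [order for order, hashes in seen.items() if len(hashes) > 1]


runs = [
    # Un-flipped A1, B1, A2 and flipped B1, A1, A2 share one sync order.
    (["B:signal", "A:wait"], {"A": "h1", "B": "h3"}),
    (["B:signal", "A:wait"], {"A": "h1", "B": "h0"}),
    # Hypothetical schedule whose sync order was executed only once.
    (["A:signal", "B:wait"], {"A": "h2", "B": "h4"}),
]
print(find_racy_sync_orders(runs))  # [('B:signal', 'A:wait')]
```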
Experimental Setup
● Developed a systematic tester along the lines of CHESS
● Tested all SPLASH-2 applications
● Run with 2 and 4 threads
● Execution overhead
  ● Compare to a systematic tester with no race detection: Plain
● Accuracy
  ● Compare to a systematic tester running a SW precise race detector
● Propose two Light64 versions:
  ● Active: aggressive flipping for high coverage
  ● Passive: modest flipping for minimum overhead
Execution Overhead (4 threads)
[Figure: execution overhead normalized to Plain, for Plain, Passive, and Active, on Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-NS, Water-SP, and MEAN]
● Tradeoff: execution overhead vs detection accuracy
  ● Active: 37% overhead, 96% races detected
  ● Passive: 2% overhead, 89% races detected
  ● SW only: 8X overhead, 100% races detected
Detection Accuracy (4 threads)
                 Original races           Inserted races
Application      Light64   Precise SW     Light64   Precise SW
Barnes           311       311            192       192
Cholesky         4         4              14        14
FFT              0         0              42688     42688
FMM              0         0              40        40
LU               0         0              15286     15286
Ocean            2         2              45        45
Radiosity        12        12             16        16
Radix            0         0              15        16
Raytrace         7         7              87        88
Volrend          44        44             30        43
Water-NS         0         0              69        69
Water-SP         0         0              162       162
Light64 average race detection accuracy: 96%
Also in the Paper
● Additional Light64 versions
● Optimization for the phase that pin-points races
● Characterization of systematic testing
● Overhead and accuracy for two-threaded runs
● Additional results
● Software-only implementation
Conclusion
● Introduced Light64, a new data race detection technique for use during systematic testing
Light64 vs. Other HW
● HW: Light64 64 bits; other HW 72–400 Kbits
● Virtualize (cache spill, context switch, migration): other HW NO or replicate the HW, except one; NO or additional HW
● Light64: No False Positives, Few False Negatives
● Runtime overhead: Light64 2–37%; other HW 0.5%