present micro09 3 - iacoma.cs.uiuc.edu

Light64: Lightweight hardware support for data race detection during systematic testing

Adrian Nistor, Darko Marinov, Josep Torrellas

University of Illinois, Urbana-Champaign
http://iacoma.cs.uiuc.edu

Outline

● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion

Light64

Nistor, Marinov, Torrellas

Data Races

● Common concurrency bug
● Difficult to detect
● Cause unexpected crashes even in code that is well tested
● Example:

      X == 0

      Thread A        Thread B
      X += 1          X += 1

  Depending on the run: X = 2 or X = 1
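The lost update in the example can be replayed deterministically by modelling each `X += 1` as a read followed by a write. This is a minimal sketch: the `run` helper and the step names are illustrative and do not come from the slides.

```python
# Each "X += 1" is modelled as two steps: read X into a private temp,
# then write temp + 1 back to X.
def run(schedule):
    """Replay a schedule of (thread, step) pairs and return the final X."""
    x = 0                      # shared variable, X == 0 initially
    tmp = {}                   # per-thread private register
    for thread, step in schedule:
        if step == "read":
            tmp[thread] = x
        else:                  # "write"
            x = tmp[thread] + 1
    return x

# Thread A finishes its increment before thread B starts.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
# Both threads read X before either writes it back: one update is lost.
racy = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(serial))  # 2
print(run(racy))    # 1
```

Both schedules respect each thread's program order, which is why the outcome depends only on the run.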

Contribution: Light64

Light64: new data race detection technique

                        Software    Light64     Hardware
Hardware requirement    none        64 bits     72 – 400 Kbits
Execution overhead      8X          1 – 37%     0.5%

● NO false positives
● Detects 96% of races

Outline

● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion

Systematic Testing

● To detect bugs, we need high test coverage
  ● Very important in parallel programs
  ● One input, many thread interleavings
● Systematic testing
  ● Systematically execute many thread interleavings
  ● Example: CHESS (used by Microsoft testers)
● Systematic testers include data race detection
  ● Turned off by default
  ● Due to high runtime overhead
● Light64: Overhead low enough to be always ON

How Systematic Testing Works

● SEGMENT == sequence of dynamic instructions
● Execute many different interleavings
● Multiplex segments in a uniprocessor

[Diagram: Thread A runs segments A1, A2, A3 and Thread B runs segments B1, B2, separated by synchronization operations (Signal X, Wait X, Wait Y). One interleaving on the uniprocessor: A1, B1, A2, A3, B2.]
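To make "execute many different interleavings" concrete, the sketch below enumerates every way to multiplex the segments of two threads on one processor while keeping each thread's own order. It is an illustration only: it ignores the Signal/Wait constraints that a real tester such as CHESS would also enforce, and the function name is hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each thread's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    # Either thread A runs its next segment first...
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    # ...or thread B does.
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

thread_a = ["A1", "A2", "A3"]
thread_b = ["B1", "B2"]
schedules = list(interleavings(thread_a, thread_b))

print(len(schedules))   # 10
print(schedules[0])     # ['A1', 'A2', 'A3', 'B1', 'B2']
```

The interleaving A1, B1, A2, A3, B2 from the slide is one of the ten schedules produced.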

Outline

● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion

Example

x = 0

Thread A          Thread B
A1: x = 3         B1: = x
    Wait              Signal
A2

RACE on X: because accesses to X are not ordered by synchronization

The Idea

Thread A    Thread B
  A1          B1
  A2

● Perform two executions flipping the unordered segments

The Idea

Thread A    Thread B
  A1          B1
  A2

UN-FLIPPED      FLIPPED
  A1              B1
  B1              A1
  A2              A2

● A1 and B1: No sync ➔ May have a race
● B1 and A2: Sync ➔ No race possible ➔ PRESERVE

If segments A1 and B1
● have NO race ➔ they are independent ➔ NOTHING changes
● have a RACE ➔ we also flipped the race ➔ Access history changes

The Idea

Thread A    Thread B
  A1          B1
  A2

UN-FLIPPED            FLIPPED
  x = 0                 x = 0
  A1: x = 3             B1: = x ( 0 )
  B1: = x ( 3 )         A1: x = 3
  A2                    A2

CHANGED !

If segments A1 and B1
● have NO race ➔ they are independent ➔ NOTHING changes
● have a RACE ➔ we also flipped the race ➔ SOMETHING changes
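The two executions can be replayed in a few lines to show what "SOMETHING changes" means for this example. This is only a sketch of the idea: the `execute` helper and its per-thread read log are illustrative names, not part of Light64.

```python
def execute(order):
    """Run segments in the given order on one processor; return the values each thread read."""
    memory = {"x": 0}                        # x = 0 initially
    reads = {"A": [], "B": []}
    for segment in order:
        if segment == "A1":
            memory["x"] = 3                  # A1: x = 3
        elif segment == "B1":
            reads["B"].append(memory["x"])   # B1: = x
        # A2 does not touch x in this example
    return reads

unflipped = execute(["A1", "B1", "A2"])
flipped = execute(["B1", "A1", "A2"])

print(unflipped["B"], flipped["B"])  # [3] [0]
print(unflipped != flipped)          # True
```

Flipping the unordered segments A1 and B1 changes the value that thread B reads from 3 to 0, which is the deviation the technique looks for.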

Overview

● Use two different executions
  ● Same synchronization order
● If change ➔ know for sure there is a race
● No change ➔ highly probable no race

Phases in Light64

● Detect if races exist
  ● Fast, over all thread interleavings executed by the tester
  ● Issues
    ● How to detect deviations (e.g. from 0 to 3) ➔ HW hash
    ● How to flip the segments with low overhead ➔ SW
● Pin-point races
  ● Slow, classic data race detection algorithm
  ● Only if there are races
  ● Only for the racy interleavings
  ● Optimization: only for selected racy interleavings

Detecting Deviations

● Per thread: hash all the values read from memory on-the-fly
● Compare hashes of two executions with same sync order
  ● Different hashes ➔ know for sure there is a race
  ● Identical hashes ➔ high probability no race
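The comparison step can be sketched in software as a running checksum per thread. This is a rough illustration under stated assumptions: `zlib.crc32` is a 32-bit stand-in for the 64-bit hardware hash, and `hash_reads` and `deviates` are hypothetical helper names.

```python
import struct
import zlib

def hash_reads(values):
    """Fold a thread's read values, in order, into one running checksum."""
    h = 0
    for v in values:
        # Pack each value as 8 little-endian bytes and extend the running CRC.
        h = zlib.crc32(struct.pack("<q", v), h)
    return h

def deviates(reads_run1, reads_run2):
    """True if any thread's hash differs between the two executions."""
    return any(hash_reads(reads_run1[t]) != hash_reads(reads_run2[t])
               for t in reads_run1)

# Values read per thread in two executions with the same sync order.
run1 = {"A": [], "B": [3]}
run2 = {"A": [], "B": [0]}

print(deviates(run1, run2))  # True
print(deviates(run1, run1))  # False
```

Only the hash survives at the end of a run, so different read sequences could in principle collide. That is why identical hashes give only a high probability that there is no race.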

Example: Detecting Deviations

Thread A    Thread B
  A1          B1
  A2

UN-FLIPPED      FLIPPED
  A1              B1
  B1              A1
  A2              A2

At END of execution, per thread: HASH(READs) of UN-FLIPPED ==? HASH(READs) of FLIPPED

HW System

● CRC 64 hash logic at the Head of ROB
● 64 bit register: Accumulates values read from memory
● BONUS of a REGISTER
  ● virtualize
  ● migrate
  ● context switch
  ● no cache spills
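A software model may help show why keeping all detection state in one 64-bit register makes context switches and migration cheap. The sketch below is an assumption-laden illustration: the slides say only "CRC 64", so the ECMA-182 polynomial, the zero initial value, and the `HashRegister` class are choices made here, not details of the actual hardware.

```python
POLY = 0x42F0E1EBA9EA3693      # CRC-64/ECMA-182 polynomial (an assumed choice)
MASK = (1 << 64) - 1

class HashRegister:
    """Software model of a 64-bit register that accumulates a CRC over values read."""

    def __init__(self, state=0):
        self.state = state

    def accumulate(self, value):
        """Fold one 64-bit value read from memory into the register, MSB first."""
        self.state ^= value & MASK
        for _ in range(64):
            if self.state >> 63:
                self.state = ((self.state << 1) ^ POLY) & MASK
            else:
                self.state = (self.state << 1) & MASK

# Uninterrupted run.
r = HashRegister()
for v in [3, 7, 42]:
    r.accumulate(v)

# Same reads with a context switch in the middle: save the 64 bits, restore later.
r1 = HashRegister()
r1.accumulate(3)
saved = r1.state
r2 = HashRegister(saved)
for v in [7, 42]:
    r2.accumulate(v)

print(r.state == r2.state)  # True
```

Saving and restoring a single 64-bit word is enough to resume hashing, so the state travels with the thread like any other register.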

Flip with Low Overhead

Thread A    Thread B
  A1          B1
  A2

[Diagram: STATE TREE; THREAD INTERLEAVING: A1, B1, A2, A3, B2.]

Flip with Low Overhead

Thread A    Thread B
  A1          B1
  A2

[Diagram: A1, B1, A2, A3, B2.]

Flip with Low Overhead

Thread A    Thread B
  A1          B1
  A2

UN-FLIPPED      FLIPPED
  A1              B1
  B1              A1
  A2              A2

[Diagram: A1, B1, A2, A3, B2.]

● Piggy-back on Systematic Testing primitives to reduce overhead
● Some synchronization orders are executed multiple times
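The observation that some synchronization orders are executed multiple times can be illustrated by listing every schedule of the three-segment example that respects both program order and the Signal/Wait edge. This is a sketch under assumptions: the `must_precede` encoding and the brute-force enumeration are illustrative and are not how the tester itself is implemented.

```python
from itertools import permutations

segments = ["A1", "A2", "B1"]
# Ordering constraints: program order within thread A, and the Signal/Wait
# edge from B1 to A2 (assumed from the example).
must_precede = [("A1", "A2"), ("B1", "A2")]

def valid(order):
    """An order is valid if every constraint's first segment comes before its second."""
    pos = {s: i for i, s in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in must_precede)

same_sync_order = [list(p) for p in permutations(segments) if valid(p)]
print(same_sync_order)  # [['A1', 'B1', 'A2'], ['B1', 'A1', 'A2']]
```

The two surviving schedules are exactly the UN-FLIPPED and FLIPPED executions. A tester that already explores both therefore supplies the pair of runs whose hashes can be compared.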

Outline

● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion

Experimental Setup

● Developed systematic tester in the lines of CHESS
● Tested all SPLASH-2 applications
  ● Run with 2 and 4 threads
● Execution overhead
  ● Compare to a systematic tester with no race detection: Plain
● Accuracy
  ● Compare to a systematic tester running a SW precise race detector
● Propose two Light64 versions:
  ● Active: Aggressive flipping for high coverage
  ● Passive: Modest flipping for minimum overhead

Execution Overhead (4 threads)

[Bar chart: Overhead Normalized to Plain (y-axis, 0 to 1.4) for Plain, Passive and Active on Barnes, Cholesky, FFT, FMM, LU, Ocean, Radiosity, Radix, Raytrace, Volrend, Water-NS, Water-SP and MEAN.]

● Tradeoff execution overhead vs detection accuracy
  ● Active: 37% overhead, 96% races detected
  ● Passive: 2% overhead, 89% races detected
  ● SW only: 8X overhead, 100% races detected

Detection Accuracy (4 threads)

              Original                  Inserted Races
              Light64   Precise SW      Light64   Precise SW
Barnes        311       311             192       192
Cholesky      4         4               14        14
FFT           0         0               42688     42688
FMM           0         0               40        40
LU            0         0               15286     15286
Ocean         2         2               45        45
Radiosity     12        12              16        16
Radix         0         0               15        16
Raytrace      7         7               87        88
Volrend       44        44              30        43
Water-NS      0         0               69        69
Water-SP      0         0               162       162

Average race detection accuracy: 96 %

Also in the Paper

● Additional Light64 versions
● Optimization for the phase that pin-points races
● Characterization of the systematic testing
● Overhead and accuracy for two-threaded runs
● Additional results
● Software-only implementation

Outline

● Motivation
● Systematic Testing
● Light64
● Evaluation
● Conclusion

Conclusion

● Introduced Light64, new data race detection technique for use during systematic testing

                        Light64     Other HW
HW                      64 bits     72 – 400 Kbits
Virtualize
Cache spill
Context switch
Migration
No False Positives
Few False Negatives
Runtime Overhead        2 – 37%     0.5%

Other HW entries for the middle rows, in slide order: "NO or replicate the HW", "except one", "NO or additional HW".

Light64: Lightweight hardware support for data race detection during systematic testing

Adrian Nistor, Darko Marinov, Josep Torrellas

University of Illinois, Urbana-Champaign
http://iacoma.cs.uiuc.edu