09 Parallel II 11 02 Ann


  • 8/12/2019 09 Parallel II 11 02 Ann

    1/24

    Chapter 9: Parallel Algorithms

    Algorithm Theory, WS 2013/14

    Fabian Kuhn


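    Parallel Computations

    $T_p$: time to perform the computation with $p$ processors

    $T_1$: work (total number of operations)
      Time when doing the computation sequentially

    $T_\infty$: critical path / span
      Time when parallelizing as much as possible

    Lower bounds:

    $T_p \ge \frac{T_1}{p}, \qquad T_p \ge T_\infty$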


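    Brent's Theorem

    Brent's Theorem: On $p$ processors, a parallel computation with work $T_1$ and span $T_\infty$ can be performed in time

    $T_p \le \frac{T_1 - T_\infty}{p} + T_\infty.$

    Corollary: Greedy is a 2-approximation algorithm for scheduling.

    Corollary: As long as the number of processors $p = O(T_1 / T_\infty)$, it is possible to achieve a linear speed-up.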
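
To make the bound concrete, here is a minimal sketch that simulates greedy scheduling of unit-time tasks and checks the result against the lower bounds and Brent's bound. The function names `greedy_schedule` and `span`, the dictionary-based task graph, and the example graph (the seven additions of a binary-tree sum over eight values) are assumptions made for illustration and are not part of the lecture.

```python
from collections import deque


def greedy_schedule(deps, p):
    """Simulate greedy scheduling of unit-time tasks on p processors.

    deps maps every task to the list of tasks it depends on.
    Returns the number of parallel steps T_p the greedy schedule needs.
    """
    indegree = {v: len(ds) for v, ds in deps.items()}
    children = {v: [] for v in deps}
    for v, ds in deps.items():
        for d in ds:
            children[d].append(v)
    ready = deque(v for v, k in indegree.items() if k == 0)
    steps = 0
    while ready:
        # Greedy rule: run as many ready tasks as there are processors.
        batch = [ready.popleft() for _ in range(min(p, len(ready)))]
        for v in batch:
            for c in children[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)  # becomes runnable in the NEXT step
        steps += 1
    return steps


def span(deps):
    """Length of the longest dependency chain (T_infinity)."""
    memo = {}

    def depth(v):
        if v not in memo:
            memo[v] = 1 + max((depth(d) for d in deps[v]), default=0)
        return memo[v]

    return max(depth(v) for v in deps)


# Hypothetical example: the 7 additions of a binary-tree sum of 8 values.
tree_sum_tasks = {
    "a1": [], "a2": [], "a3": [], "a4": [],
    "b1": ["a1", "a2"], "b2": ["a3", "a4"],
    "c": ["b1", "b2"],
}
t1, t_inf = len(tree_sum_tasks), span(tree_sum_tasks)
for p in (1, 2, 3, 4):
    tp = greedy_schedule(tree_sum_tasks, p)
    assert max(t1 / p, t_inf) <= tp <= (t1 - t_inf) / p + t_inf
```

The assertion inside the loop is exactly the sandwich $\max(T_1/p, T_\infty) \le T_p \le (T_1 - T_\infty)/p + T_\infty$ for this one example; it is a sanity check, not a proof.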


    PRAM

    Back to the PRAM:
      Shared random access memory, synchronous computation steps

    The PRAM model comes in variants:

    EREW (exclusive read, exclusive write):
      Concurrent memory access by multiple processors is not allowed.
      If two or more processors try to read from or write to the same memory cell concurrently, the behavior is not specified.

    CREW (concurrent read, exclusive write):
      Reading the same memory cell concurrently is OK.
      Two concurrent writes to the same cell lead to unspecified behavior.
      This is the first variant that was considered (already in the 70s).


    PRAM

    The PRAM model comes in variants:

    CRCW (concurrent read, concurrent write):
      Concurrent reads and writes are both OK.
      The behavior of concurrent writes has to be specified:

      Weak CRCW: concurrent write only OK if all processors write 0
      Common-mode CRCW: all processors need to write the same value
      Arbitrary-winner CRCW: adversary picks one of the values
      Priority CRCW: value of processor with highest ID is written
      Strong CRCW: largest (or smallest) value is written

    The given models are ordered in strength:

      weak ≤ common-mode ≤ arbitrary-winner ≤ priority ≤ strong
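
The five write rules can be compared side by side with a small sketch that resolves all writes aimed at a single memory cell in one step. The function `resolve_writes`, the mode strings, and the choice to raise `ValueError` for writes a model forbids are illustrative assumptions; a real PRAM simply leaves such behavior unspecified.

```python
def resolve_writes(writes, mode):
    """Resolve concurrent writes to ONE memory cell in a single step.

    writes: non-empty list of (processor_id, value) pairs, with unique IDs.
    mode:   "weak", "common", "arbitrary", "priority" or "strong".
    Returns the value that ends up in the cell.
    """
    if not writes:
        raise ValueError("no write attempted")
    values = [v for _, v in writes]
    if len(writes) == 1:
        return values[0]  # a single writer is fine in every model
    if mode == "weak":
        if all(v == 0 for v in values):
            return 0
        raise ValueError("weak CRCW: concurrent writes must all be 0")
    if mode == "common":
        if len(set(values)) == 1:
            return values[0]
        raise ValueError("common-mode CRCW: values differ")
    if mode == "arbitrary":
        # Any written value may win; a correct algorithm must cope with
        # every choice. This sketch just takes the first one.
        return values[0]
    if mode == "priority":
        return max(writes, key=lambda w: w[0])[1]  # highest processor ID
    if mode == "strong":
        return max(values)  # largest value wins
    raise ValueError(f"unknown mode {mode!r}")
```

For instance, with writes `[(1, 5), (3, 2), (2, 9)]` the priority rule stores 2 (processor 3 has the highest ID) while the strong rule stores 9.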


    Some Relations Between PRAM Models

    Theorem: A parallel computation that can be performed in time $t$, using $p$ processors on a strong CRCW machine, can also be performed in time $O(t \log p)$ using $p$ processors on an EREW machine.

      Each (parallel) step on the CRCW machine can be simulated by $O(\log p)$ steps on an EREW machine.

    Theorem: A parallel computation that can be performed in time $t$, using $p$ probabilistic processors on a strong CRCW machine, can also be performed in expected time $O(t \log p)$ using $O(p / \log p)$ processors on an arbitrary-winner CRCW machine.

      The same simulation turns out to be more efficient in this case.


    Some Relations Between PRAM Models

    Theorem: A computation that can be performed in time $t$, using $p$ processors on a strong CRCW machine, can also be performed in time $O(t)$ using $O(p^2)$ processors on a weak CRCW machine.

    Proof: Strong: largest value wins; weak: only concurrently writing 0 is OK.
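
The proof hint can be turned into a small sequential sketch of one standard construction: one virtual processor per ordered pair of writers, each of which only ever writes 0. The function name `strong_write_on_weak`, the flag array `wins`, and the tie-breaking by processor index are assumptions of this sketch and are not taken from the slides.

```python
def strong_write_on_weak(values):
    """Simulate one strong-CRCW write (largest value wins) to a single cell.

    values[i] is the value that processor i wants to write.
    The nested loops stand for p*p virtual processors acting in ONE step.
    """
    p = len(values)
    wins = [1] * p  # one flag cell per original processor, initialised to 1
    for i in range(p):
        for j in range(p):
            # Processor (i, j) knocks out i if j beats it (ties broken by
            # index). It only ever writes 0, which weak CRCW permits.
            if (values[j], j) > (values[i], i):
                wins[i] = 0
    # Exactly one flag survives, so the final write is exclusive.
    winners = [i for i in range(p) if wins[i] == 1]
    assert len(winners) == 1
    return values[winners[0]]
```

Each simulated step costs a constant number of weak-CRCW steps (initialise flags, knock out, write), which is where the $p^2$ processor count comes from.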


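    Computing the Maximum

    Theorem: If each value can be represented using $O(\log n)$ bits, the maximum of $n$ (integer) values can be computed in time $O(1)$ using $O(n)$ processors on a weak CRCW machine.

    Proof:

      First look at the $\frac{\log_2 n}{2}$ highest-order bits.
      The maximum value also has the maximum among those bits.
      There are only $\sqrt{n}$ possibilities for these bits.
      The maximum of the $\frac{\log_2 n}{2}$ highest-order bits can be computed in $O(1)$ time.
      For those values with the largest $\frac{\log_2 n}{2}$ highest-order bits, continue with the next block of $\frac{\log_2 n}{2}$ bits, and so on.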
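
The block-by-block filtering in this proof can be sketched sequentially as follows. The function `blockwise_max` and its parameters `bits` and `block` are hypothetical names; the block size is left as a free parameter here, and the inner loops only stand in for what the PRAM would do in a constant number of parallel steps.

```python
def blockwise_max(values, bits, block):
    """Maximum of non-negative integers below 2**bits, found by scanning
    blocks of `block` bits from the highest-order block downwards.

    Assumes values is non-empty and bits is a multiple of block.
    """
    assert bits % block == 0
    mask = (1 << block) - 1
    alive = list(values)  # candidates that can still be the maximum
    for shift in range(bits - block, -1, -block):
        # One flag per possible bit pattern. On the PRAM, every processor
        # clears absent[pattern] concurrently (only zeros are written).
        absent = [1] * (1 << block)
        for v in alive:
            absent[(v >> shift) & mask] = 0
        # Largest pattern that occurs. With few patterns, an all-pairs
        # comparison finds it in O(1) parallel time.
        best = max(k for k in range(1 << block) if absent[k] == 0)
        # Keep only the values whose current block equals the best pattern.
        alive = [v for v in alive if (v >> shift) & mask == best]
    return alive[0]
```

Because the values have $O(\log n)$ bits, only a constant number of blocks is processed, so the number of rounds is constant.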


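    Prefix Sums

    The following works for any associative binary operator $\oplus$:

      associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$

    All-Prefix-Sums: Given a sequence of $n$ values $a_1, \dots, a_n$, the all-prefix-sums operation w.r.t. $\oplus$ returns the sequence of prefix sums:

      $s_1, s_2, \dots, s_n = a_1,\; a_1 \oplus a_2,\; a_1 \oplus a_2 \oplus a_3,\; \dots,\; a_1 \oplus \dots \oplus a_n$

    Can be computed efficiently in parallel and turns out to be an important building block for designing parallel algorithms.

    Example: Operator: $+$, input: $a_1, \dots, a_8 = 3, 1, 7, 0, 4, 1, 6, 3$

      $s_1, \dots, s_8 = 3, 4, 11, 11, 15, 16, 22, 25$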
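
As a sequential reference for what the operation returns, Python's `itertools.accumulate` computes all prefix sums for any binary operator. The wrapper name `all_prefix_sums` is an assumption of this sketch; the input is the example sequence from this slide.

```python
from itertools import accumulate
import operator


def all_prefix_sums(values, op=operator.add):
    """Sequential reference: s_i = a_1 (op) a_2 (op) ... (op) a_i."""
    return list(accumulate(values, op))


a = [3, 1, 7, 0, 4, 1, 6, 3]
sums = all_prefix_sums(a)           # operator +
running_max = all_prefix_sums(a, max)  # max is associative as well
```

Here `sums` is `[3, 4, 11, 11, 15, 16, 22, 25]` and `running_max` is `[3, 3, 7, 7, 7, 7, 7, 7]`.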


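    Computing the Sum

    Let's first look at $s_n = a_1 \oplus a_2 \oplus \dots \oplus a_n$.

    Parallelize using a binary tree: pair up neighboring values and combine them level by level.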


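    Computing the Sum

    Lemma: The sum $s_n = a_1 \oplus a_2 \oplus \dots \oplus a_n$ can be computed in time $O(\log n)$ on an EREW PRAM. The total number of operations (total work) is $O(n)$.

    Proof: Combine the values along a balanced binary tree. Each of the $\lceil \log_2 n \rceil$ levels is one parallel step, the tree has $n - 1$ internal nodes, and no memory cell is accessed by two processors in the same step.

    Corollary: The sum $s_n$ can be computed in time $O(\log n)$ using $O(n / \log n)$ processors on an EREW PRAM.

    Proof: Follows from Brent's theorem ($T_1 = O(n)$, $T_\infty = O(\log n)$).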
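
The binary-tree sum can be sketched sequentially, counting rounds and operations so that the two quantities in the lemma become visible. The function name `tree_sum` and its return convention are assumptions of this sketch; each list comprehension corresponds to one parallel step in which all pairs are combined at once.

```python
import operator


def tree_sum(values, op=operator.add):
    """Combine a non-empty sequence pairwise, level by level.

    Returns (total, rounds, operations): rounds plays the role of the
    parallel time, operations the role of the total work.
    """
    level = list(values)
    rounds = 0
    operations = 0
    while len(level) > 1:
        # All pairs of one level are independent: one parallel step.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        operations += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])  # odd element moves up unchanged
        level = nxt
        rounds += 1
    return level[0], rounds, operations
```

For eight values this takes 3 rounds and 7 operations, matching $\lceil \log_2 n \rceil$ and $n - 1$.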


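    Computing the Prefix Sums

    For each node $v$ of the binary tree, define $r(v)$ as follows:

      $r(v)$ is the sum of the values $a_i$ at the leaves in all the left sub-trees of ancestors $u$ of $v$ such that $v$ is in the right sub-tree of $u$.

    For a leaf node $v$ holding value $a_i$: $s_i = r(v) + a_i$

    For the root node: $r(\mathrm{root}) = 0$

    For all other nodes $v$:

      $v$ is the left child of $u$: $r(v) = r(u)$

      $v$ is the right child of $u$: $r(v) = r(u) + S$
        ($u$ has left child $w$; $S$: sum of values in the sub-tree of $w$)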


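    Computing the Prefix Sums

    Leaf node $v$ holding value $a_i$: $s_i = r(v) + a_i$

    Root node: $r(\mathrm{root}) = 0$

    Node $v$ is the left child of $u$: $r(v) = r(u)$

    Node $v$ is the right child of $u$: $r(v) = r(u) + S$
      where $S$ = sum of values in the left sub-tree of $u$

    Algorithm to compute the values $r(v)$:

    1. Compute the sum of values in each sub-tree (bottom-up).
       Can be done in parallel time $O(\log n)$ with $O(n)$ total work.

    2. Compute the values $r(v)$ top-down from the root to the leaves:
       To compute the value $r(v)$, only $r(u)$ of the parent $u$ and the sum of the left sibling (if $v$ is a right child) are needed.
       Can be done in parallel time $O(\log n)$ with $O(n)$ total work.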
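
The two phases can be sketched on an array-based complete binary tree. This is a sequential illustration under stated assumptions: the input length is a power of two, the operator is `+` with identity 0, and the heap-style indexing (root at index 1, children of `v` at `2*v` and `2*v + 1`) as well as the names `prefix_sums_tree`, `S` and `r` are choices of this sketch. Nodes on the same tree level are independent of each other, which is what a PRAM would exploit.

```python
def prefix_sums_tree(a):
    """All prefix sums via a bottom-up and a top-down pass over a tree.

    Assumes len(a) is a power of two and the operator is + (identity 0).
    """
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power of two"

    # Phase 1 (bottom-up): S[v] = sum of the leaves in the sub-tree of v.
    S = [0] * (2 * n)
    S[n:] = a  # leaves occupy indices n .. 2n-1
    for v in range(n - 1, 0, -1):
        S[v] = S[2 * v] + S[2 * v + 1]

    # Phase 2 (top-down): r[v] = sum of everything left of v's sub-tree.
    r = [0] * (2 * n)  # r[1] = 0 for the root
    for v in range(2, 2 * n):  # parents have smaller indices than children
        u = v // 2
        if v % 2 == 0:
            r[v] = r[u]             # left child inherits r of the parent
        else:
            r[v] = r[u] + S[2 * u]  # right child adds the left sibling's sum

    # Leaf holding a_i: s_i = r(leaf) + a_i.
    return [r[n + i] + a[i] for i in range(n)]
```

Both loops touch each tree node once, so the total work is linear in $n$, and each tree level could be processed in a single parallel step.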


    Computing Prefix Sums

    Theorem: Given a sequence $a_1, \dots, a_n$ of $n$ values, all prefix sums $s_i = a_1 \oplus \dots \oplus a_i$ (for $1 \le i \le n$) can be computed in time $O(\log n)$ using $O(n / \log n)$ processors on an EREW PRAM.

    Proof:

      Computing the sums of all sub-trees can be done in parallel in time $O(\log n)$ using $O(n)$ total operations.
      The same is true for the top-down step to compute the $r(v)$.
      The theorem then follows from Brent's theorem:

      $T_1 = O(n), \qquad T_\infty = O(\log n)$

    Remark: This can be adapted to other parallel models and to different ways of storing the values (e.g., array or list).
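
Spelled out, the last step of the proof is a short calculation. It uses Brent's bound with work $T_1 = O(n)$ and span $T_\infty = O(\log n)$; the specific choice $p = \lceil n / \log n \rceil$ is made here for illustration.

```latex
\begin{align*}
T_p &\le \frac{T_1 - T_\infty}{p} + T_\infty
     \le \frac{T_1}{p} + T_\infty \\
    &= O\!\left(\frac{n}{p} + \log n\right)
     = O(\log n)
     \quad \text{for } p = \left\lceil \frac{n}{\log n} \right\rceil .
\end{align*}
```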


    Applying to Quicksort

    Theorem: On an EREW PRAM, using $p$ processors, randomized quicksort can be executed in time $T_p$ (in expectation and with high probability), where

    $T_p = O\!\left(\frac{n \log n}{p} + \log^2 n\right).$

    Proof:

    Remark: We get optimal (linear) speed-up w.r.t. the sequential algorithm for all $p = O(n / \log n)$.
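
One common way prefix sums enter quicksort is the partition step: a prefix sum over 0/1 flags tells every element its target position, so all elements can be moved independently. The sketch below is a sequential illustration of that idea and is not necessarily the construction used in the lecture; the function name `partition_with_prefix_sums` and its return convention are assumptions.

```python
from itertools import accumulate


def partition_with_prefix_sums(a, pivot):
    """Stable partition of a into (elements < pivot) + (elements >= pivot).

    Returns (partitioned_list, number_of_elements_below_pivot).
    """
    flags = [1 if x < pivot else 0 for x in a]
    below = list(accumulate(flags))  # below[i]: small elements in a[0..i]
    n_small = below[-1] if a else 0
    out = [None] * len(a)
    # Every iteration writes to a distinct cell, so on a PRAM all
    # iterations could run in one parallel step with exclusive writes.
    for i, x in enumerate(a):
        if flags[i]:
            out[below[i] - 1] = x
        else:
            not_below = (i + 1) - below[i]  # large elements in a[0..i]
            out[n_small + not_below - 1] = x
    return out, n_small
```

With input `[3, 1, 7, 0, 4, 1, 6, 3]` and pivot 4, the result is `[3, 1, 0, 1, 3, 7, 4, 6]` with 5 elements below the pivot.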


    Other Applications of Prefix Sums

    Prefix sums are a very powerful primitive for designing parallel algorithms, particularly when operators other than $+$ are used.

    Example applications:

      Lexical comparison of strings
      Add multi-precision numbers
      Evaluate polynomials
      Solve recurrences
      Radix sort / quicksort
      Search for regular expressions
      Implement some tree operations
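
As an illustration of an operator other than $+$, a first-order linear recurrence $x_i = a_i x_{i-1} + b_i$ can be solved with a prefix computation over affine maps, because composing such maps is associative. This is a sequential sketch of that idea; the names `compose` and `solve_recurrence` and the pair encoding are assumptions, and `itertools.accumulate` stands in for a parallel prefix routine.

```python
from itertools import accumulate


def compose(f, g):
    """Compose two affine maps: apply f first, then g.

    A pair (a, b) represents the map x -> a*x + b.
    """
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def solve_recurrence(coeffs, x0):
    """Return [x_1, ..., x_n] for x_i = a_i * x_{i-1} + b_i.

    coeffs is the list [(a_1, b_1), ..., (a_n, b_n)].
    """
    # The i-th prefix is the single affine map that takes x_0 to x_i.
    return [a * x0 + b for a, b in accumulate(coeffs, compose)]
```

For example, `solve_recurrence([(2, 1), (3, 0), (1, 5)], 1)` gives `[3, 9, 14]`, the same values obtained by iterating the recurrence directly.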