lec04-topdownparser

  • 8/7/2019 lec04-topdownparser

    Top-Down Parsing

    The parse tree is created top to bottom.

    Top-down parsers:

    Recursive-Descent Parsing
        Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
        It is a general parsing technique, but not widely used.
        Not efficient.

    Predictive Parsing
        No backtracking.
        Efficient.
        Needs a special form of grammars (LL(1) grammars).
        Recursive Predictive Parsing is a special form of Recursive-Descent parsing without backtracking.
        The Non-Recursive (Table-Driven) Predictive Parser is also known as the LL(1) parser.

    Recursive-Descent Parsing (uses Backtracking)

    Backtracking is needed.

    It tries to find the left-most derivation.

    S → aBc
    B → bc | b

    input: abc

    First attempt (B → bc):        Second attempt (B → b):

          S                              S
        / | \                          / | \
       a  B  c                        a  B  c
         / \                             |
        b   c                            b

    fails, backtrack                 succeeds
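The backtracking behaviour on this slide can be sketched in a few lines of Python. This is only an illustration, not the lecture's own code: the dictionary `GRAMMAR` and the helper names `parse` and `accepts` are assumptions made for the example. Each alternative of a non-terminal is tried in order, and when one alternative leads nowhere the parser falls back to the next one.

```python
# Grammar from the slide: S -> a B c,  B -> b c | b
# (GRAMMAR, parse and accepts are illustrative names, not from the lecture.)
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}


def parse(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Non-terminal: try each alternative in order. If an alternative
        # yields no usable position, the loop simply moves on to the next
        # alternative, which is the backtracking step.
        for alternative in GRAMMAR[head]:
            for mid in parse(alternative, text, pos):
                yield from parse(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must match the current input symbol.
        yield from parse(rest, text, pos + 1)


def accepts(text):
    """True if the whole input can be derived from S."""
    return any(end == len(text) for end in parse(["S"], text, 0))


print(accepts("abc"))   # B -> bc is tried first and fails, then B -> b succeeds
```

Note that this sketch would recurse forever on a left-recursive grammar, which is one reason left recursion is eliminated before top-down parsing.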

    Recursive Predictive Parser

    a grammar  →  (eliminate left recursion, left factor)  →  a grammar suitable for predictive parsing (an LL(1) grammar); no 100% guarantee.

    When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

    A → α1 | ... | αn        input: ... a .......    (a is the current token)

    Recursive Predictive Parser (example)

    stmt → if expr then stmt else stmt |
           while expr do stmt |
           begin stmt_list end

    When we are trying to rewrite the non-terminal stmt, if the current token is if, we have to choose the first production rule.

    When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.

    We eliminate the left recursion in the grammar and left factor it, but the result still may not be suitable for predictive parsing (it may not be an LL(1) grammar).

    Recursive Predictive Parsing

    Recursive-Descent parsing without backtracking.
    Each non-terminal corresponds to a procedure.

    Ex: A → aBb    (this is the only production rule for A)

    proc A {
        - match the current token with a, and move to the next token;
        - call B;
        - match the current token with b, and move to the next token;
    }

    Recursive Predictive Parsing (cont.)

    A → aBb | bAB

    proc A {
        case of the current token {
            a:  - match the current token with a, and move to the next token;
                - call B;
                - match the current token with b, and move to the next token;
            b:  - match the current token with b, and move to the next token;
                - call A;
                - call B;
        }
    }

    Recursive Predictive Parsing (cont.)

    When to apply ε-productions.

    A → aA | bB | ε

    If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.

    Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in the sentential forms).

    Recursive Predictive Parsing (Example)

    A → aBe | cBd | C
    B → bB | ε
    C → f

    proc A {
        case of the current token {
            a:  - match the current token with a, and move to the next token;
                - call B;
                - match the current token with e, and move to the next token;
            c:  - match the current token with c, and move to the next token;
                - call B;
                - match the current token with d, and move to the next token;
            f:  - call C                (f is the first set of C)
        }
    }

    proc B {
        case of the current token {
            b:    - match the current token with b, and move to the next token;
                  - call B
            e,d:  do nothing            (e and d are the follow set of B)
        }
    }

    proc C {
        match the current token with f, and move to the next token;
    }
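The three procedures above translate almost line by line into code. The following Python version is a minimal sketch, assuming single-character tokens and an end marker `$`; the class name `PredictiveParser` and the helpers `current`, `match` and `accepts` are illustrative names chosen for this example.

```python
class PredictiveParser:
    """Recursive predictive parser for A -> aBe | cBd | C, B -> bB | ε, C -> f."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Match the current token and move to the next token.
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":
            self.match("a"); self.B(); self.match("e")
        elif self.current == "c":
            self.match("c"); self.B(); self.match("d")
        elif self.current == "f":          # f is in the first set of C
            self.C()
        else:
            raise SyntaxError(f"unexpected {self.current!r} while parsing A")

    def B(self):
        if self.current == "b":
            self.match("b"); self.B()
        elif self.current in ("e", "d"):   # follow set of B: apply B -> ε
            pass
        else:
            raise SyntaxError(f"unexpected {self.current!r} while parsing B")

    def C(self):
        self.match("f")


def accepts(text):
    parser = PredictiveParser(text)
    try:
        parser.A()
        parser.match("$")                  # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts("abbe"), accepts("abd"))
```

No procedure ever undoes a choice: the current token alone selects the branch, which is what distinguishes this from the backtracking parser.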

    Non-Recursive Predictive Parsing -- LL(1) Parser

    Non-recursive predictive parsing is table-driven.
    It is a top-down parser.
    It is also known as the LL(1) parser.

                      input buffer
                           |
    stack  --  Non-recursive Predictive Parser  --  output
                           |
                     Parsing Table

    LL(1) Parser

    input buffer
        The string to be parsed. We will assume that its end is marked with a special symbol $.

    output
        A production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.

    stack
        Contains the grammar symbols.
        At the bottom of the stack, there is a special end-marker symbol $.
        Initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack).
        When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.

    parsing table
        A two-dimensional array M[A,a].
        Each row is a non-terminal symbol.
        Each column is a terminal symbol or the special symbol $.
        Each entry holds a production rule.

    LL(1) Parser -- Parser Actions

    The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.

    There are four possible parser actions.

    1. If X and a are both $: the parser halts (successful completion).

    2. If X and a are the same terminal symbol (different from $): the parser pops X from the stack, and moves to the next symbol in the input buffer.

    3. If X is a non-terminal: the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.

    4. None of the above: error.
       All empty entries in the parsing table are errors.
       If X is a terminal symbol different from a, this is also an error case.

    LL(1) Parser -- Example 1

    S → aBa
    B → bB | ε

    LL(1) Parsing Table

          a           b           $
    S     S → aBa
    B     B → ε       B → bB

    stack     input     output
    $S        abba$     S → aBa
    $aBa      abba$
    $aB       bba$      B → bB
    $aBb      bba$
    $aB       ba$       B → bB
    $aBb      ba$
    $aB       a$        B → ε
    $a        a$
    $         $         accept, successful completion
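The four parser actions and the trace for the input abba can be reproduced with a small driver loop. The Python below is a sketch under a few assumptions: tokens are single characters, the parsing table is a dictionary keyed by (non-terminal, terminal), an empty tuple stands for ε, and the names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative.

```python
# Parsing table for S -> aBa, B -> bB | ε; an empty right-hand side means ε.
TABLE = {
    ("S", "a"): ("a", "B", "a"),
    ("B", "a"): (),
    ("B", "b"): ("b", "B"),
}
NONTERMINALS = {"S", "B"}


def ll1_parse(text, start="S"):
    """Return the production rules output by the parser, in order."""
    tokens = list(text) + ["$"]
    stack = ["$", start]               # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "$" and current == "$":          # action 1: accept
            return output
        if top in NONTERMINALS:                     # action 3: expand
            if (top, current) not in TABLE:
                raise SyntaxError(f"no entry M[{top},{current}]")
            rhs = TABLE[(top, current)]
            stack.pop()
            stack.extend(reversed(rhs))             # push Yk ... Y1, so Y1 ends on top
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
        elif top == current:                        # action 2: match a terminal
            stack.pop()
            pos += 1
        else:                                       # action 4: error
            raise SyntaxError(f"expected {top!r}, found {current!r}")


print(ll1_parse("abba"))
```

The returned list is the sequence of output rules, which read top to bottom gives the left-most derivation of the input.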

    LL(1) Parser -- Example 1 (cont.)

    Outputs: S → aBa    B → bB    B → bB    B → ε

    Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

    parse tree

            S
          / | \
         a  B  a
           / \
          b   B
             / \
            b   B
                |
                ε

    Constructing LL(1) Parsing Tables

    Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

    FIRST(α) is the set of terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols.
        If α derives ε, then ε is also in FIRST(α).

    FOLLOW(A) is the set of terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.
        A terminal a is in FOLLOW(A) if S ⇒* αAaβ.
        $ is in FOLLOW(A) if S ⇒* αA.

    Benefits of FIRST() and FOLLOW()
        They can be used to prove the LL(k) characteristics of a grammar.
        They aid in the construction of the predictive parsing table.
        They provide selection information for recursive-descent parsers.

    Compute FIRST for Any String X

    If X is a terminal symbol: FIRST(X) = {X}.

    If X is a non-terminal symbol and X → ε is a production rule: ε is in FIRST(X).

    If X is a non-terminal symbol and X → Y1Y2..Yn is a production rule:
        if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j=1,...,i-1, then a is in FIRST(X);
        if ε is in all FIRST(Yj) for j=1,...,n, then ε is in FIRST(X).

    If X is ε: FIRST(X) = {ε}.

    FIRST Example

    E  → TE'
    E' → +TE' | ε
    T  → FT'
    T' → *FT' | ε
    F  → (E) | id

    FIRST(F)  = {(, id}        FIRST(TE')  = {(, id}
    FIRST(T') = {*, ε}         FIRST(+TE') = {+}
    FIRST(T)  = {(, id}        FIRST(ε)    = {ε}
    FIRST(E') = {+, ε}         FIRST(FT')  = {(, id}
    FIRST(E)  = {(, id}        FIRST(*FT') = {*}
                               FIRST(ε)    = {ε}
                               FIRST((E))  = {(}
                               FIRST(id)   = {id}
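The FIRST rules can be applied mechanically by repeating them until no set grows any more. The Python below is a sketch of that fixed-point iteration for the expression grammar of this example; the grammar encoding (a dictionary of lists, with the string "ε" for the empty string) and the names `first_of_string` and `compute_first` are assumptions made for illustration.

```python
EPS = "ε"
# Expression grammar from the example; E' and T' are the primed non-terminals.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given the FIRST sets found so far."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # this symbol cannot vanish, so stop here
    result.add(EPS)                # every symbol in the string can derive ε
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                 # repeat until nothing more can be added
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first()
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```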

    Compute FOLLOW (for non-terminals)

    If S is the start symbol: $ is in FOLLOW(S).

    If A → αBβ is a production rule: everything in FIRST(β) is in FOLLOW(B) except ε.

    If (A → αB is a production rule) or (A → αBβ is a production rule and ε is in FIRST(β)): everything in FOLLOW(A) is in FOLLOW(B).

    We apply these rules until nothing more can be added to any follow set.

    FOLLOW Example

    E  → TE'
    E' → +TE' | ε
    T  → FT'
    T' → *FT' | ε
    F  → (E) | id

    FOLLOW(E)  = { $, ) }
    FOLLOW(E') = { $, ) }
    FOLLOW(T)  = { +, ), $ }
    FOLLOW(T') = { +, ), $ }
    FOLLOW(F)  = { +, *, ), $ }
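The FOLLOW sets listed here can be checked by applying the three FOLLOW rules repeatedly until nothing changes. The sketch below does this in Python; to stay short it takes the FIRST sets of the non-terminals as given data (copied from the FIRST example) rather than recomputing them. The encoding and the names `first_of_string` and `compute_follow` are illustrative assumptions.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, taken as given from the FIRST example.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols; an empty string gives {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})      # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")                     # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in GRAMMAR:
                        continue               # FOLLOW is defined for non-terminals only
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}   # rule 2: FIRST(β) except ε
                    if EPS in beta_first:      # rule 3: β is empty or derives ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow()
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```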

    Constructing LL(1) Parsing Table -- Algorithm

    for each production rule A → α of a grammar G
        for each terminal a in FIRST(α)
            add A → α to M[A,a]
        if ε is in FIRST(α)
            for each terminal a in FOLLOW(A)
                add A → α to M[A,a]
        if ε is in FIRST(α) and $ is in FOLLOW(A)
            add A → α to M[A,$]

    All other undefined entries of the parsing table are error entries.

    Constructing LL(1) Parsing Table -- Example

    E  → TE'      FIRST(TE') = {(, id}     E → TE' into M[E,(] and M[E,id]
    E' → +TE'     FIRST(+TE') = {+}        E' → +TE' into M[E',+]
    E' → ε        FIRST(ε) = {ε}           none; but since ε is in FIRST(ε) and FOLLOW(E') = {$, )},
                                           E' → ε into M[E',$] and M[E',)]
    T  → FT'      FIRST(FT') = {(, id}     T → FT' into M[T,(] and M[T,id]
    T' → *FT'     FIRST(*FT') = {*}        T' → *FT' into M[T',*]
    T' → ε        FIRST(ε) = {ε}           none; but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +},
                                           T' → ε into M[T',$], M[T',)] and M[T',+]
    F  → (E)      FIRST((E)) = {(}         F → (E) into M[F,(]
    F  → id       FIRST(id) = {id}         F → id into M[F,id]
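The table-construction algorithm is short enough to write out directly. The following Python sketch fills the table for the expression grammar, taking the FIRST and FOLLOW sets from the earlier examples as given data; the names `PRODUCTIONS`, `first_of_string` and `build_table` are illustrative. Each table cell holds a list of rules, so that a multiply-defined entry would show up as a list with more than one element.

```python
EPS = "ε"
PRODUCTIONS = [
    ("E",  ("T", "E'")),
    ("E'", ("+", "T", "E'")),
    ("E'", (EPS,)),
    ("T",  ("F", "T'")),
    ("T'", ("*", "F", "T'")),
    ("T'", (EPS,)),
    ("F",  ("(", "E", ")")),
    ("F",  ("id",)),
]
# FIRST and FOLLOW sets copied from the earlier examples.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal (or ε) is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    table = {}
    for head, rhs in PRODUCTIONS:
        rhs_first = first_of_string(rhs)
        lookaheads = rhs_first - {EPS}       # every terminal in FIRST(α)
        if EPS in rhs_first:
            lookaheads |= FOLLOW[head]       # FOLLOW(A), including $ when present
        for terminal in lookaheads:
            table.setdefault((head, terminal), []).append(rhs)
    return table


M = build_table()
for (head, terminal), rules in sorted(M.items()):
    print(f"M[{head},{terminal}] =", [" ".join(r) for r in rules])
```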

    LL(1) Parser -- Example 2

    E  → TE'
    E' → +TE' | ε
    T  → FT'
    T' → *FT' | ε
    F  → (E) | id

          id          +             *             (           )          $
    E     E → TE'                                 E → TE'
    E'                E' → +TE'                               E' → ε     E' → ε
    T     T → FT'                                 T → FT'
    T'                T' → ε        T' → *FT'                 T' → ε     T' → ε
    F     F → id                                  F → (E)

    LL(1) Parser -- Example 2 (cont.)

    stack       input      output
    $E          id+id$     E → TE'
    $E'T        id+id$     T → FT'
    $E'T'F      id+id$     F → id
    $E'T'id     id+id$
    $E'T'       +id$       T' → ε
    $E'         +id$       E' → +TE'
    $E'T+       +id$
    $E'T        id$        T → FT'
    $E'T'F      id$        F → id
    $E'T'id     id$
    $E'T'       $          T' → ε
    $E'         $          E' → ε
    $           $          accept

    LL(1) Grammars

    A grammar whose parsing table has no multiply-defined entries is said to be an LL(1) grammar.

    LL(1):
        first L: input scanned from left to right
        second L: left-most derivation
        1: one input symbol used as a look-ahead symbol to determine the parser action

    An entry of a grammar's parsing table may contain more than one production rule. In this case, we say that the grammar is not an LL(1) grammar.

    A Grammar which is not LL(1)

    S → iCtSE | a        FOLLOW(S) = { $, e }
    E → eS | ε           FOLLOW(E) = { $, e }
    C → b                FOLLOW(C) = { t }

    FIRST(iCtSE) = {i}
    FIRST(a) = {a}
    FIRST(eS) = {e}
    FIRST(ε) = {ε}
    FIRST(b) = {b}

          a         b         e                  i              t      $
    S     S → a                                  S → iCtSE
    E                         E → eS, E → ε                            E → ε
    C               C → b

    Two production rules for M[E,e].
    Problem: ambiguity.
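The conflict in M[E,e] can also be found mechanically: build the table from the FIRST and FOLLOW sets listed on this slide and report every cell that receives more than one rule. The Python below is a small sketch of that check; the data layout and the function name `conflicts` are assumptions made for the example, and the FIRST set of each right-hand side is copied from the slide rather than computed.

```python
EPS = "ε"
# (head, right-hand side, FIRST of the right-hand side), as listed on the slide.
PRODUCTIONS = [
    ("S", "iCtSE", {"i"}),
    ("S", "a",     {"a"}),
    ("E", "eS",    {"e"}),
    ("E", EPS,     {EPS}),
    ("C", "b",     {"b"}),
]
FOLLOW = {"S": {"$", "e"}, "E": {"$", "e"}, "C": {"t"}}


def conflicts():
    """Return the multiply-defined cells of the LL(1) parsing table."""
    table = {}
    for head, rhs, rhs_first in PRODUCTIONS:
        lookaheads = rhs_first - {EPS}
        if EPS in rhs_first:
            lookaheads |= FOLLOW[head]     # an ε-rule goes under FOLLOW(head)
        for terminal in lookaheads:
            table.setdefault((head, terminal), []).append(f"{head} -> {rhs}")
    return {cell: rules for cell, rules in table.items() if len(rules) > 1}


print(conflicts())
```

An empty result would mean that no entry is multiply defined, i.e. that the grammar is LL(1); here the only reported cell is M[E,e].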

    Testing whether a Grammar is LL(1)

    Q.1 Consider the following grammar, test whether the grammar is LL(1) or not, and construct a predictive parsing table for it.

    S → AaAb | BbBa     FIRST(S) = { a, b }    FOLLOW(S) = { $ }
    A → ε               FIRST(A) = { ε }       FOLLOW(A) = { a, b }
    B → ε               FIRST(B) = { ε }       FOLLOW(B) = { a, b }

          a             b             $
    S     S → AaAb      S → BbBa
    A     A → ε         A → ε
    B     B → ε         B → ε

    No entry is multiply defined, so this grammar is LL(1).

    Testing whether a Grammar is LL(1)

    Q.2 Consider the following grammar, test whether the grammar is LL(1) or not, and construct a predictive parsing table for it.

    S → 1AB | ε       FIRST(S) = { 1, ε }    FOLLOW(S) = { $ }
    A → 1AC | 0C      FIRST(A) = { 1, 0 }    FOLLOW(A) = { 1, 0 }
    B → 0S            FIRST(B) = { 0 }       FOLLOW(B) = { $ }
    C → 1             FIRST(C) = { 1 }

          1      0      $
    S
    A
    B
    C

    A Grammar which is not LL(1) (cont.)

    What do we have to do if the resulting parsing table contains multiply-defined entries?
        If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
        If the grammar is not left factored, we have to left factor the grammar.
        If the (new grammar's) parsing table still contains multiply-defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.

    A left-recursive grammar cannot be an LL(1) grammar.
        A → Aα | β
        Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
        If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).

    A grammar that is not left factored cannot be an LL(1) grammar.
        A → αβ1 | αβ2
        Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).

    An ambiguous grammar cannot be an LL(1) grammar.

    Properties of LL(1) Grammars

    A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:

    1. α and β cannot both derive strings starting with the same terminal.

    2. At most one of α and β can derive ε.

    3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).

    Error Recovery in Predictive Parsing

    An error may occur in predictive parsing (LL(1) parsing):
        if the terminal symbol on the top of the stack does not match the current input symbol;
        if the top of the stack is a non-terminal A, the current input symbol is a, and the parsing table entry M[A,a] is empty.

    What should the parser do in an error case?
        The parser should be able to give an error message (as meaningful an error message as possible).
        It should recover from that error case, and it should be able to continue parsing the rest of the input.

    Error Recovery Techniques

    Panic-Mode Error Recovery
        Skipping the input symbols until a synchronizing token is found.

    Phrase-Level Error Recovery
        Each empty entry in the parsing table is filled with a pointer to a specific error routine to take care of that error case.

    Error-Productions
        If we have a good idea of the common errors that might be encountered, we can augment the grammar with productions that generate erroneous constructs.
        When an error production is used by the parser, we can generate appropriate error diagnostics.
        Since it is almost impossible to know all the errors that can be made by programmers, this method is not practical.

    Global-Correction
        Ideally, we would like a compiler to make as few changes as possible in processing incorrect inputs.
        We have to globally analyze the input to find the error.
        This is an expensive method, and it is not used in practice.

    Panic-Mode Error Recovery in LL(1) Parsing

    In panic-mode error recovery, we skip all the input symbols until a synchronizing token is found.

    What is the synchronizing token?
        All the terminal symbols in the follow set of a non-terminal can be used as a synchronizing token set for that non-terminal.

    So, a simple panic-mode error recovery for LL(1) parsing:
        All the empty entries are marked as sync to indicate that the parser will skip all the input symbols until it finds a symbol in the follow set of the non-terminal A which is on the top of the stack. Then the parser will pop that non-terminal A from the stack. The parsing continues from that state.
        To handle unmatched terminal symbols, the parser pops the unmatched terminal symbol from the stack and issues an error message saying that the unmatched terminal was inserted.

    Panic-Mode Error Recovery - Example

    S → AbS | e | ε
    A → a | cAd

    FOLLOW(S) = {$}
    FOLLOW(A) = {b, d}

          a            b        c            d        e         $
    S     S → AbS      sync     S → AbS      sync     S → e     S → ε
    A     A → a        sync     A → cAd      sync     sync      sync

    input: aab

    stack     input     output
    $S        aab$      S → AbS
    $SbA      aab$      A → a
    $Sba      aab$
    $Sb       ab$       Error: missing b, inserted
    $S        ab$       S → AbS
    $SbA      ab$       A → a
    $Sba      ab$
    $Sb       b$
    $S        $         S → ε
    $         $         accept

    input: ceadb

    stack     input     output
    $S        ceadb$    S → AbS
    $SbA      ceadb$    A → cAd
    $SbdAc    ceadb$
    $SbdA     eadb$     Error: unexpected e (illegal A)
                        (Remove all input tokens until first b or d, pop A)
    $Sbd      db$
    $Sb       b$
    $S        $         S → ε
    $         $         accept
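Both traces can be reproduced by adding the two recovery rules to the LL(1) driver loop. The Python below is a sketch under stated assumptions: tokens are single characters, the table holds only the non-sync entries (a missing entry is treated as sync), and the names `TABLE`, `FOLLOW` and `parse_with_recovery` are illustrative. The extra branch for leftover input after the stack is emptied is not on the slide; it is added here only so that the loop always terminates.

```python
# Non-sync entries of the parsing table; an empty list stands for ε.
TABLE = {
    ("S", "a"): ["A", "b", "S"], ("S", "c"): ["A", "b", "S"],
    ("S", "e"): ["e"],           ("S", "$"): [],
    ("A", "a"): ["a"],           ("A", "c"): ["c", "A", "d"],
}
FOLLOW = {"S": {"$"}, "A": {"b", "d"}}     # also serves as the set of non-terminals


def parse_with_recovery(text):
    """Parse `text` and return the list of error messages that were issued."""
    tokens = list(text) + ["$"]
    stack, pos, errors = ["$", "S"], 0, []
    while stack[-1] != "$" or tokens[pos] != "$":
        top, current = stack[-1], tokens[pos]
        if top in FOLLOW:                              # non-terminal on top
            rhs = TABLE.get((top, current))
            if rhs is not None:
                stack.pop()
                stack.extend(reversed(rhs))
            else:                                      # sync entry
                errors.append(f"unexpected {current} (illegal {top})")
                # Skip input until a symbol in FOLLOW(top), then pop top.
                while tokens[pos] not in FOLLOW[top] and tokens[pos] != "$":
                    pos += 1
                stack.pop()
        elif top == current:                           # matching terminal
            stack.pop()
            pos += 1
        elif top == "$":                               # assumption: drop leftover input
            errors.append(f"unexpected {current} after the end of the parse")
            pos += 1
        else:                                          # unmatched terminal on the stack
            errors.append(f"missing {top}, inserted")
            stack.pop()
    return errors


print(parse_with_recovery("aab"))
print(parse_with_recovery("ceadb"))
```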

    Phrase-Level Error Recovery

    Each empty entry in the parsing table is filled with a pointer to a special error routine which will take care of that error case.

    These error routines may:
        change, insert, or delete input symbols;
        issue appropriate error messages;
        pop items from the stack.

    We should be careful when we design these error routines, because we may put the parser into an infinite loop.