MELJUN CORTES Automata Theory (Automata12)

22
CSC 3130: Automata theory and formal languages Parsers for programming languages MELJUN P. CORTES, MBA,MPA,BSCS,ACS MELJUN P. CORTES, MBA,MPA,BSCS,ACS MELJUN CORTES MELJUN CORTES

Transcript of MELJUN CORTES Automata Theory (Automata12)

Page 1: MELJUN CORTES Automata Theory (Automata12)

CSC 3130: Automata theory and formal languages

Parsers for programming languages

MELJUN P. CORTES, MBA,MPA,BSCS,ACSMELJUN P. CORTES, MBA,MPA,BSCS,ACS

MELJUN CORTESMELJUN CORTES

Page 2: MELJUN CORTES Automata Theory (Automata12)

CFG of the java programming language

Identifier:IDENTIFIER

QualifiedIdentifier:Identifier { . Identifier }

Literal:IntegerLiteral FloatingPointLiteral CharacterLiteral StringLiteral BooleanLiteralNullLiteral

Expression: Expression1 [AssignmentOperator Expression1]]

AssignmentOperator: = += -= *= /= &= |=

from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

Page 3: MELJUN CORTES Automata Theory (Automata12)

Parsing java programsclass Point2d { /* The X and Y coordinates of the point--instance variables */ private double x; private double y; private boolean debug; // A trick to help with debugging

public Point2d (double px, double py) { // Constructorx = px;y = py;

debug = false; // turn off debugging }

public Point2d () { // Default constructorthis (0.0, 0.0); // Invokes 2 parameter Point2D constructor

} // Note that a this() invocation must be the BEGINNING of // statement body of constructor

public Point2d (Point2d pt) { // Another consructorx = pt.getX();y = pt.getY();

}

}…

Simple java program: about 1000 symbols

Page 4: MELJUN CORTES Automata Theory (Automata12)

Parsing algorithms• How long would it take to parse this?

• Can we parse faster?• No! CYK is the fastest known general-

purposeparsing algorithm

exhaustive algorithm about 1080 years(longer than life of universe)

CYK algorithm about 1 week!

Page 5: MELJUN CORTES Automata Theory (Automata12)

Another way of thinking

Scientist: Find an algorithm thatcan parse strings inany grammar

Engineer: Design your grammar so it has a very fastparsing algorithm

Page 6: MELJUN CORTES Automata Theory (Automata12)

An exampleS Tc(1)

T TA(2) | A(3)

A aTb(4) | ab(5)

input: abaabbc

Stack InputaabATTaTaaTaabTaATaTTaTbTATTcS

abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbcccc

Actionshiftshiftreduce (5)reduce (3)shiftshiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1) aa bb

A

a b

A

c

TT

TA

S

Page 7: MELJUN CORTES Automata Theory (Automata12)

Items

S •TcS T•cS Tc•

T •TAT T•AT TA•

T •AT A•

A •aTbA a•TbA aT•bA aTb•

A •abA a•bA ab•

S Tc(1) T A(3)T TA(2) A aTb(4)A ab(5)

Stack InputaabATTa

abaabbcbaabbcaabbcaabbcaabbcabbc

Actionshiftshiftreduce (5)reduce (3)shiftshift

••••••

Idea of parsing algorithm:Try to match complete items to top of stack

Page 8: MELJUN CORTES Automata Theory (Automata12)

Some terminologyS Tc(1)

T TA(2) | A(3)

A aTb(4) | ab(5)

input: abaabbc

Stack InputaabATTaTaaTaabTaATaTTaTbTATTcS

abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbcccc

Actionshiftshiftreduce (5)reduce (3)shift shiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1)

handle

valid items:a•Tb, a•b

valid items:T•a, T•c, aT•b

Page 9: MELJUN CORTES Automata Theory (Automata12)

Outline of LR(0) parsing algorithm• As the string is being read, it is pushed on a

stack

• Algorithm keeps track of all valid items

• Algorithm can perform two actions:no complete

itemis viable

shift reduce

there is one valid item,and it is complete

Page 10: MELJUN CORTES Automata Theory (Automata12)

Running the algorithmStack Input

S

S

SRSR

a

aa

aabaAaAbA

aabbabb

bb

bb

A Valid ItemsA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•

A aAb | ab A aAb aabb

Page 11: MELJUN CORTES Automata Theory (Automata12)

Running the algorithmStack Input

S

S

SRSR

a

aa

aabaAaAbA

aabbabb

bb

bb

A Valid ItemsA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•

A aAb | ab A aAb aabb

Page 12: MELJUN CORTES Automata Theory (Automata12)

How to update viable items• Initial set of valid items

• Updating valid items on “shift b”

– After these updates, for every valid item A •C andproduction C •, we also add

as a valid item

S • for every production S

A •b A b•is updated toA •X disappears if X ≠ b

C •a, b: terminalsA, B: variablesX, Y: mixed symbols: mixed strings

notation

Page 13: MELJUN CORTES Automata Theory (Automata12)

How to update viable items• Updating valid items on “reduce to B”

– First, we backtrack to viable items before reduce

– Then, we apply same rules as for “shift B” (as if B were a terminal)A •B A B•is updated to

A •X disappears if X ≠ B

C • is added for every valid item A •C and production C •

Page 14: MELJUN CORTES Automata Theory (Automata12)

Viable item updates by NFA• States of NFA will be items (plus a start state q0)• For every item S •we have a transition

• For every item A •X we have a transition

• For every item A •C and production C •

S •q0

A X•XA •X

C •A •C

Page 15: MELJUN CORTES Automata Theory (Automata12)

Example

A aAb | ab

A •aAb A a•Ab A aA•b

A aAb•

A •ab A a•b A ab•

q0

a

a b

b

A

Page 16: MELJUN CORTES Automata Theory (Automata12)

Convert NFA to DFA

A •aAbA •ab

A a•AbA a•bA •aAbA •ab

A aA•b

A aAb•

A ab•

a

b

bAa

12

3

4

5

states correspond to sets of valid itemstransitions are labeled by variables / terminals

die

Page 17: MELJUN CORTES Automata Theory (Automata12)

Attempt at parsing with DFAStack Input

S

S

SR

a

aa

aabaA

aabbabb

bb

bb

A DFA stateA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•b

A aAb | ab A aAb aabb

12

2

3?

Page 18: MELJUN CORTES Automata Theory (Automata12)

Remember the state in stack!Stack Input

S

S

SRSR

11a2

1a2a2

1a2a2b31a2A41a2A4b51A

aabbabb

bb

bb

A DFA stateA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•

A aAb | ab A aAb aabb

12

2

345

Page 19: MELJUN CORTES Automata Theory (Automata12)

LR(0) grammars and deterministic PDAs• The parsing procedure can be implemented by a

deterministic pushdown automaton

• A PDA is deterministic if in every state there is atmost one possible transition – for every input symbol and pop symbol, including

• Example: PDA for w#wR is deterministic, but PDA forwwR is not

Page 20: MELJUN CORTES Automata Theory (Automata12)

LR(0) grammars and deterministic PDAs• Not every PDA can be made deterministic

• Since PDAs are equivalent to CFLs, LR(0) parsing algorithm must fail for some CFLs!

• When does LR(0) parsing algorithm fail?

Page 21: MELJUN CORTES Automata Theory (Automata12)

Outline of LR(0) parsing algorithm• Algorithm can perform two actions:

• What if:

no complete item

is valid

there is one valid item,and it is complete

shift (S) reduce (R)

some valid items

complete, some not

more than one valid

complete itemS / R conflict R / R conflict

Page 22: MELJUN CORTES Automata Theory (Automata12)

context-free grammarsparse using CYK algorithm (slow)

LR(∞) grammars…

Hierarchy of context-free grammars

LR(1) grammarsLR(0) grammarsparse using LR(0) algorithm

javaperl

python…

to be continued…