MELJUN CORTES Automata Theory (Automata12)
-
Upload
meljun-cortes -
Category
Technology
-
view
203 -
download
1
Transcript of MELJUN CORTES Automata Theory (Automata12)
CSC 3130: Automata theory and formal languages
Parsers for programming languages
MELJUN P. CORTES, MBA,MPA,BSCS,ACSMELJUN P. CORTES, MBA,MPA,BSCS,ACS
MELJUN CORTESMELJUN CORTES
CFG of the java programming language
Identifier:IDENTIFIER
QualifiedIdentifier:Identifier { . Identifier }
Literal:IntegerLiteral FloatingPointLiteral CharacterLiteral StringLiteral BooleanLiteralNullLiteral
Expression: Expression1 [AssignmentOperator Expression1]]
AssignmentOperator: = += -= *= /= &= |=
from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
…
Parsing java programsclass Point2d { /* The X and Y coordinates of the point--instance variables */ private double x; private double y; private boolean debug; // A trick to help with debugging
public Point2d (double px, double py) { // Constructorx = px;y = py;
debug = false; // turn off debugging }
public Point2d () { // Default constructorthis (0.0, 0.0); // Invokes 2 parameter Point2D constructor
} // Note that a this() invocation must be the BEGINNING of // statement body of constructor
public Point2d (Point2d pt) { // Another consructorx = pt.getX();y = pt.getY();
}
}…
Simple java program: about 1000 symbols
Parsing algorithms• How long would it take to parse this?
• Can we parse faster?• No! CYK is the fastest known general-
purposeparsing algorithm
exhaustive algorithm about 1080 years(longer than life of universe)
CYK algorithm about 1 week!
Another way of thinking
Scientist: Find an algorithm thatcan parse strings inany grammar
Engineer: Design your grammar so it has a very fastparsing algorithm
An exampleS Tc(1)
T TA(2) | A(3)
A aTb(4) | ab(5)
input: abaabbc
Stack InputaabATTaTaaTaabTaATaTTaTbTATTcS
abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbcccc
Actionshiftshiftreduce (5)reduce (3)shiftshiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1) aa bb
A
a b
A
c
TT
TA
S
Items
S •TcS T•cS Tc•
T •TAT T•AT TA•
T •AT A•
A •aTbA a•TbA aT•bA aTb•
A •abA a•bA ab•
S Tc(1) T A(3)T TA(2) A aTb(4)A ab(5)
Stack InputaabATTa
abaabbcbaabbcaabbcaabbcaabbcabbc
Actionshiftshiftreduce (5)reduce (3)shiftshift
••••••
Idea of parsing algorithm:Try to match complete items to top of stack
Some terminologyS Tc(1)
T TA(2) | A(3)
A aTb(4) | ab(5)
input: abaabbc
Stack InputaabATTaTaaTaabTaATaTTaTbTATTcS
abaabbcbaabbcaabbcaabbcaabbcabbcbbcbcbcbcccc
Actionshiftshiftreduce (5)reduce (3)shift shiftshiftreduce (5)reduce (3)shiftreduce (4)reduce (2)shiftreduce (1)
handle
valid items:a•Tb, a•b
valid items:T•a, T•c, aT•b
Outline of LR(0) parsing algorithm• As the string is being read, it is pushed on a
stack
• Algorithm keeps track of all valid items
• Algorithm can perform two actions:no complete
itemis viable
shift reduce
there is one valid item,and it is complete
Running the algorithmStack Input
S
S
SRSR
a
aa
aabaAaAbA
aabbabb
bb
bb
A Valid ItemsA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•
A aAb | ab A aAb aabb
Running the algorithmStack Input
S
S
SRSR
a
aa
aabaAaAbA
aabbabb
bb
bb
A Valid ItemsA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•
A aAb | ab A aAb aabb
How to update viable items• Initial set of valid items
• Updating valid items on “shift b”
– After these updates, for every valid item A •C andproduction C •, we also add
as a valid item
S • for every production S
A •b A b•is updated toA •X disappears if X ≠ b
C •a, b: terminalsA, B: variablesX, Y: mixed symbols: mixed strings
notation
How to update viable items• Updating valid items on “reduce to B”
– First, we backtrack to viable items before reduce
– Then, we apply same rules as for “shift B” (as if B were a terminal)A •B A B•is updated to
A •X disappears if X ≠ B
C • is added for every valid item A •C and production C •
Viable item updates by NFA• States of NFA will be items (plus a start state q0)• For every item S •we have a transition
• For every item A •X we have a transition
• For every item A •C and production C •
S •q0
A X•XA •X
C •A •C
Example
A aAb | ab
A •aAb A a•Ab A aA•b
A aAb•
A •ab A a•b A ab•
q0
a
a b
b
A
Convert NFA to DFA
A •aAbA •ab
A a•AbA a•bA •aAbA •ab
A aA•b
A aAb•
A ab•
a
b
bAa
12
3
4
5
states correspond to sets of valid itemstransitions are labeled by variables / terminals
die
Attempt at parsing with DFAStack Input
S
S
SR
a
aa
aabaA
aabbabb
bb
bb
A DFA stateA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•b
A aAb | ab A aAb aabb
12
2
3?
Remember the state in stack!Stack Input
S
S
SRSR
11a2
1a2a2
1a2a2b31a2A41a2A4b51A
aabbabb
bb
bb
A DFA stateA •aAb A •ab A a•Ab A a•bA •aAb A •ab A a•Ab A a•bA •aAb A •abA ab•A aA•bA aAb•
A aAb | ab A aAb aabb
12
2
345
LR(0) grammars and deterministic PDAs• The parsing procedure can be implemented by a
deterministic pushdown automaton
• A PDA is deterministic if in every state there is atmost one possible transition – for every input symbol and pop symbol, including
• Example: PDA for w#wR is deterministic, but PDA forwwR is not
LR(0) grammars and deterministic PDAs• Not every PDA can be made deterministic
• Since PDAs are equivalent to CFLs, LR(0) parsing algorithm must fail for some CFLs!
• When does LR(0) parsing algorithm fail?
Outline of LR(0) parsing algorithm• Algorithm can perform two actions:
• What if:
no complete item
is valid
there is one valid item,and it is complete
shift (S) reduce (R)
some valid items
complete, some not
more than one valid
complete itemS / R conflict R / R conflict
context-free grammarsparse using CYK algorithm (slow)
LR(∞) grammars…
Hierarchy of context-free grammars
LR(1) grammarsLR(0) grammarsparse using LR(0) algorithm
javaperl
python…
to be continued…