Predictive Parsing
The goal of predictive parsing is to construct a top-down parser that never backtracks. To do so, we must transform a grammar in two ways:
- eliminate left recursion, and
- perform left factoring.
These rules eliminate most common causes for backtracking although they do not guarantee a completely backtrack-free parsing (called LL(1) as we will see later).
Consider this grammar:
A ::= A a
| b
It recognizes the regular expression ba*. The problem is that if we use the first production for top-down derivation, we will fall into an infinite derivation chain. This is called left recursion. But how else can you express ba*? Here is an alternative way:
A ::= b A'
A' ::= a A'
|
where the third production is an empty production (ie, it is A’ ::= ). That is, A’ parses the RE a*. Even though this CFG is recursive, it is not left recursive. In general, for each nonterminal X, we partition the productions for X into two groups: one that contains the left recursive productions, and the other with the rest. Suppose that the first group is:
X ::= X a1
...
X ::= X an
while the second group is:
X ::= b1
...
X ::= bm
where a, b are symbol sequences. Then we eliminate the left recursion by rewriting these rules into:
X ::= b1 X'
...
X ::= bm X'
X' ::= a1 X'
...
X' ::= an X'
X' ::=
For example, the CFG G1 is transformed into:
E ::= T E'
E' ::= + T E'
| - T E'
|
T ::= F T'
T' ::= * F T'
| / F T'
|
F ::= num
| id
Suppose now that we have a number of productions for X that have a common prefix in their rhs (but without any left recursion):
X ::= a b1
...
X ::= a bn
We factor out the common prefix as follows:
X ::= a X'
X' ::= b1
...
X' ::= bn
This is called left factoring and it helps predict which rule to use without backtracking. For example, the rule from our right associative grammar G2:
E ::= T + E
| T - E
| T
is translated into:
E ::= T E'
E' ::= + E
| - E
|
As another example, let L be the language of all regular expressions over the alphabet
= {a, b}. That is, L = {“
“,“a“,“b“,“a*”,“b*”,“a| b“,“(a| b)”,…}. For example, the string “a(a| b)*| b*” is a member of L. There is no RE that captures the syntax of all REs. Consider for example the RE (( … (a) … )), which is equivalent to the language (na)n for all n. This represents a valid RE but there is no RE that can capture its syntax. A context-free grammar that recognizes L is:
|
R |
: : = |
R R |
|
|
| |
R “|” R |
|
|
| |
R * |
|
|
| |
( R ) |
|
|
| |
a |
|
|
| |
b |
|
|
| |
“ |
after elimination of left recursion, this grammar becomes:
|
R |
: : = |
( R ) R’ |
|
|
| |
a R’ |
|
|
| |
b R’ |
|
|
| |
“ |
|
R’ |
: : = |
R R’ |
|
|
| |
“|” R R’ |
|
|
| |
* R’ |
|
|
| |
|
Posted in Computer Science, Information Technology, Compiler Design, Compiler Design |
