Browse Prior Art Database

Profile driven approach for Efficient parsing of Ambiguous Grammar

IP.com Disclosure Number: IPCOM000191310D
Original Publication Date: 2009-Dec-29
Included in the Prior Art Database: 2009-Dec-29
Document File: 4 page(s) / 81K

Publishing Venue

IBM

Abstract

With the popularity of new Dynamic scripting Languages like PhP, Ruby etc, and the freedom the languages give to the programmers in terms of the grammar and usage, language grammars are tending towards being more ambiguous and non-deterministic. A context-free grammar that derives the same word by different derivation trees, or equivalently by different derivation sequences is said to be ambiguous. Inherently Ambiguous Languages are those which have a grammar which cannot be converted into an unambiguous notation. Ambiguous grammar can no longer be parsed by the deterministic parsers without having any shift-reduce or reduce-reduce conflicts. Presence of these conflicts indicates the possibility of misinterpreting certain sequence of operations as a totally different sequence.. This is mainly because for a non-deterministic and ambiguous grammar there are multiple outgoing states for the same set of inputs. There are quite a few methods to handle this. The most popular ones being a> Specifying associativity and precedence in the grammar. b> Following some sort of backtracking algorithm. >>>> http://contraintes.inria.fr/~ddr/camlp5/doc/htmlc/bparsers.html

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 4

Profile driven approach for Efficient parsing of Ambiguous Grammar

A grammar is said to be ambiguous if some string in the language can be generated in more than one way. This implies that it has more than one parse tree representation. The following grammar is said to be ambiguous since it can generate 2

parse tree representation.

A Finite State Machine (FSM) representation of a grammar would have

A start state (A filled Circle) .

A set of transitions from one defined state to another represented by an edge. The edge is associated with an input that leads to a state transition. (An arrow)

A definite set of final or accepting states. (A circle within another)

An ambiguous grammar could have multiple outgoing edge corresponding to a single state and a single input sequence. So at any stage there is lack of clarity as to which edge needs to be taken for a particular input unless the parser is aware of what are the subsequent inputs.

For example,


Consider a language with the following grammar
E -> X Y
X -> a b | a
Y -> b
where X and Y are non-terminals and a,b are terminals. Clearly the presence of 2 alternate outgoing edge from the start state for the same input results in ambiguity.

b

a

a

Let us now assume that the input is "a b". If we assume no prioritization of the outgoing paths, then for the given input and the language specification we reduce X to "a b" to enter the final state and then note that Y cannot be reduced which results in Error state. So we need to get back to the previous safe state, which happens to be the start start in this case and then take the alternate outgoing path which reduces X to "a". Then we begin reducing Y into "b" (which is unambiguous) and this results in the proper parsing. Here, we noted that we had to do a

1

arg -> arg + arg | arg - arg | a


If the input string is a + a - a then it can easily be visualized that the above grammar can help generate 2

parse tree representation.

Now let us look at the graphical representation of any grammar.

Page 2 of 4

backtracking to the last safe state and then re-trigger processing on the alternate outgoing path to

continue pars...