Browse Prior Art Database

Table-Driven Parsing Algorithm

IP.com Disclosure Number: IPCOM000100494D
Original Publication Date: 1990-Apr-01
Included in the Prior Art Database: 2005-Mar-15
Document File: 8 page(s) / 277K

Publishing Venue

IBM

Related People

Carlock, JR: AUTHOR [+3]

Abstract

A method is disclosed for processing strings of tokens, called terminals, and composite tokens called non-terminals. This method is table driven; a change in the grammar syntax requires, at most, minor changes in executable program code.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 27% of the total text.

Table-Driven Parsing Algorithm

       A method is disclosed for processing strings of tokens,
called terminals, and composite tokens called non-terminals. This
method is table driven; a change in the grammar syntax requires, at
most, minor changes in executable program code.

      A table-switching mechanism is at the heart of this disclosure.
 Without this switching mechanism, a syntax must be implemented with
one (possibly huge) parsing table. Using a set of smaller tables
eases implementation in a memory-constrained system, where tables,
other than the current one, could be kept on external storage media
and read into memory only when required.

      The particular arrangement of information into parsing cells is
also a point of disclosure.

      This invention relates to computer programs that must accept
and process data that is expressed as a "grammar," or Backus-Naur
form description (BNF).  Compilers and text processors are examples
of such programs.  The algorithm described can be used to parse any
syntax that can be expressed with BNF.

      A method is required for reading a string of tokens, verifying
that the sequence of the tokens conforms to a defined syntax,
executing an action based on each token and the sequence of
previously-received tokens, and recognizing when a sequence of tokens
was in fact a non- terminal - a sequence of tokens that can be
logically treated as one token.

      An algorithm is presented which recognizes valid and invalid
strings in a syntax, and drive program components to act on the
terminals and non-terminals received.  The algorithm is based on two
logical parts; a driver routine called the Parsing Driver, and a set
of tables called Parsing Tables.

      The key steps in building the parsing mechanism are:
           1.   Choose non-terminal and string delimiters.
           2.   Draw state machines.
           3.   Build parsing tables.

      The end of a complete string, or of a non-terminal, can be
detected in a number of ways.  The simplest approach is to add
special symbols to the syntax to indicate the end of a string and the
end of a non-terminal.  However, the addition of special symbols for
this purpose is not required by the algorithm.

      Fig. 1 shows an example grammar G.  Note that an 'OendJ' token
"/" indicates the end of a string in G.  The 'OsubendJ' token ';'
indicates the end of a non-terminal that is embedded in another
string, such as 'uuuu' in 'abcuuuux.'

      Some valid string in G are:
           a;/
           a;u;/
           a;u;x;/
           a;y;/
           abc;uuuu;x;/
           abc;uuuuvwvw;zxyy;/

      String parsing in computer programs is usually done with a
finite state machine construct.  A state machine for recognizing a
string in G is illustrated in Fig. 2.

      While in S(0) state, the machine...