Parsing or Tokenizing Tables Using Finite State Machines to Direct Program Actions As Well As Program Control Flow

IP.com Disclosure Number: IPCOM000034214D
Original Publication Date: 1989-Jan-01
Included in the Prior Art Database: 2005-Jan-27
Document File: 4 page(s) / 41K

Publishing Venue

IBM

Related People

Meissner, GL: AUTHOR

Abstract

This algorithm is used to implement a tokenizing routine and a parsing routine. A tokenizing routine is one that scans a line of input (usually text or commands) and breaks it up into generic words. For example, it may know what numbers are, what special characters are, etc. A parsing routine is one that interprets the words a tokenizing routine returns in a given context to extract meaning and derive a complete command. It will normally return this information to the caller, which will execute the command requested. This design is based upon the finite state machine concept, but extends it slightly by using two tables together: one table describes the state machine flow, while the other describes the specific parsing/tokenizing actions to be taken with the data.


Parsing or Tokenizing Tables Using Finite State Machines to Direct Program Actions As Well As Program Control Flow

This algorithm is used to implement a tokenizing routine and a parsing routine. A tokenizing routine is one that scans a line of input (usually text or commands) and breaks it up into generic words. For example, it may know what numbers are, what special characters are, etc. A parsing routine is one that interprets the words a tokenizing routine returns in a given context to extract meaning and derive a complete command. It will normally return this information to the caller, which will execute the command requested. This design is based upon the finite state machine concept, but extends it slightly by using two tables together: one table describes the state machine flow, while the other describes the specific parsing/tokenizing actions to be taken with the data.

Discussion of Method

Tokenizing routines break a command line up into words and identify the type of each word to the calling routine, which is usually a parsing routine. In many widely different command languages the token types used may be similar. For example, a token could be a complete word (INSERT, DELETE, or ALL), a number (345, -45), a special set of characters (=, +, *), or any other word that is useful in the particular syntax of the language being deciphered.

Parsers are less general. They understand the language being parsed at a syntactic level. While a tokenizer knows how to identify different words, the parser ensures that the words make sense in combination. A good analogy is that a tokenizer acts as a spelling checker, while a parser checks that the construction of a sentence makes sense. In either case a finite state machine is a good way to implement the routine.

Normally, a finite state machine can be used to check the validity of a statement in the command language. With a small addition, the finite state machine can also be used to decode, store, or perform some action or function on the information from the command as it is parsed. It does this by allowing the programmer to code specific actions (subroutines) that will be executed as the program moves through the state diagram. In this way it is possible for a parsing routine or a tokenizing routine to store intermediate results and do other useful work as the commands are read.

One use for this is to have the parser fill in the fields of a control block with the information as it is parsed. Once the parsing is complete, not only has the statement been validated from a syntactic point of view, but it has also been pulled apart and the information inside placed into a convenient form for the caller. All this has the advantage of making the code simpler and of checkpointing the results during processing, so that error detection and program debugging become easier. It also tends to simplify each subroutine, as there is no other work to do once the single pass of the parser is complete....
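The disclosure as preserved here includes no source code, so the following C sketch is purely illustrative of the two-table arrangement described above: one table supplies the next state for each (state, input class) pair (the control flow of the machine), and a parallel table supplies an action code that is executed as the machine moves. The states, character classes, action names, and the sample command line are assumptions made for this example, not taken from the original routine.

    /*
     * Illustrative two-table tokenizer: next_state[][] directs control
     * flow, do_action[][] names the work to perform for the same
     * (state, class) pair.  All identifiers are hypothetical.
     */
    #include <ctype.h>
    #include <stdio.h>

    enum state  { S_START, S_WORD, S_NUM, N_STATES };
    enum cclass { C_ALPHA, C_DIGIT, C_SPACE, C_OTHER, C_END, N_CLASSES };
    enum action { A_SKIP, A_BEGIN, A_APPEND, A_END, A_CHAR, A_END_CHAR };

    /* Table 1: state-machine flow -- next state for (state, class). */
    static const enum state next_state[N_STATES][N_CLASSES] = {
        /*            ALPHA   DIGIT   SPACE    OTHER    END      */
        /* START */ { S_WORD, S_NUM,  S_START, S_START, S_START },
        /* WORD  */ { S_WORD, S_WORD, S_START, S_START, S_START },
        /* NUM   */ { S_NUM,  S_NUM,  S_START, S_START, S_START },
    };

    /* Table 2: action to perform for the same (state, class) pair. */
    static const enum action do_action[N_STATES][N_CLASSES] = {
        /*            ALPHA     DIGIT     SPACE   OTHER       END    */
        /* START */ { A_BEGIN,  A_BEGIN,  A_SKIP, A_CHAR,     A_SKIP },
        /* WORD  */ { A_APPEND, A_APPEND, A_END,  A_END_CHAR, A_END  },
        /* NUM   */ { A_APPEND, A_APPEND, A_END,  A_END_CHAR, A_END  },
    };

    static enum cclass classify(int c)
    {
        if (c == '\0')  return C_END;
        if (isalpha(c)) return C_ALPHA;
        if (isdigit(c)) return C_DIGIT;
        if (isspace(c)) return C_SPACE;
        return C_OTHER;
    }

    static void emit(const char *type, const char *text)
    {
        printf("%-8s %s\n", type, text);
    }

    static void tokenize(const char *line)
    {
        enum state st = S_START;
        char buf[128];
        size_t len = 0;
        const char *p = line;

        for (;;) {
            int c = (unsigned char)*p;
            enum cclass cl  = classify(c);
            enum action act = do_action[st][cl];   /* what to do        */
            enum state  nxt = next_state[st][cl];  /* where to go next  */

            switch (act) {                         /* perform the coded action */
            case A_SKIP:
                break;
            case A_BEGIN:
                len = 0;
                buf[len++] = (char)c;
                break;
            case A_APPEND:
                if (len < sizeof buf - 1)
                    buf[len++] = (char)c;
                break;
            case A_END:
            case A_END_CHAR:
                buf[len] = '\0';
                emit(st == S_NUM ? "NUMBER" : "WORD", buf);
                len = 0;
                if (act == A_END_CHAR) {
                    char s[2] = { (char)c, '\0' };
                    emit("SPECIAL", s);            /* special char is its own token */
                }
                break;
            case A_CHAR: {
                char s[2] = { (char)c, '\0' };
                emit("SPECIAL", s);
                break;
            }
            }

            if (cl == C_END)                       /* end of the input line */
                break;
            st = nxt;
            p++;
        }
    }

    int main(void)
    {
        tokenize("INSERT 345 = ALL * 7");
        return 0;
    }

A parsing routine built the same way would use the same pair of tables over the token stream rather than over characters; its action subroutines would then have side effects such as filling in the fields of a caller-supplied control block as each token is accepted, which is the use described above.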