Browse Prior Art Database

Parsing for Structured Editors

IP.com Disclosure Number: IPCOM000051480D
Original Publication Date: 1981-Jan-01
Included in the Prior Art Database: 2005-Feb-10
Document File: 3 page(s) / 17K

Publishing Venue

IBM

Related People

Wegman, M: AUTHOR

Abstract

The algorithm described herein shows how to re-parse a string after another string has been inserted into it in time within a log factor of the optimally possible, under certain restrictions.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 33% of the total text.

Page 1 of 3

Parsing for Structured Editors

The algorithm described herein shows how to re-parse a string after another string has been inserted into it in time within a log factor of the optimally possible, under certain restrictions.

First, shift-reduce parsing will be reviewed. A shift-reduce parser has a stack and an input. The symbols in the input are called terminals. Non-terminals are the symbols which are the left hand sides of the productions of the grammar. The symbols on the stack may be thought of as having two parts: (1) a terminal or non-terminal and (2) some extra state information. Depending on the symbol on the top of the stack and the k look-ahead symbols at the beginning of the input, the parser either shifts or reduces. By shifting, it is meant that the first character of the input is taken away from the input and placed on the stack along with some extra state information. That state information is determined by the same information used to choose between shifting or reducing. By reducing is meant that several symbols are taken off the top of the stack and replaced by a single non-terminal, which is the left hand side of a production whose right hand side is the symbols taken off the stack. The production is determined in the same way the choice between shifting or reducing is made.

However, while this information is sufficient to determine what non-terminal those symbols reduce to, it is not sufficient to determine the extra state information. To do that, the symbol remaining on the top of the stack must be examined after the other symbols are removed. It is considered that this reduce consists of two operations: (1) the symbols are taken of the stack, and a non- terminal is placed in the front of the input, and (2) that non-terminal is shifted back onto the stack as if it were part of the input. The action of shifting a symbol and placing it plus the extra state information on the stack is sometimes referred to as reading the symbol. It thus becomes possible to read non-terminals as well as terminals.

The parser must be able to skip steps and yet avoid creating erroneous parse trees. LR (bottom-up) parsers usually have the property that they cannot read past an error. Another way of describing this is that if an LR parser has read up to a certain point and has not declared an error, then the string of symbols it has read form a prefix of some valid string in the language. It is desired to extend this inability to read past an error to the reading of a non-terminal. There must be several ways of parsing the prefix of a valid string, and the look-ahead is used to tell the parser which it should choose. Thus, it may be that there are states in the parser which cannot read a non-terminal when a certain terminal symbol is the symbol beyond that non-terminal. This may be because, with that look-ahead symbol, the non-terminal to be read could never be produced. It is assumed herein that such a read causes an error, even though...