Browse Prior Art Database

Single-Threaded Buffer Management Method for Lexical Analyzers

IP.com Disclosure Number: IPCOM000113925D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 108K

Publishing Venue

IBM

Related People

Feriozi, DT: AUTHOR

Abstract

Disclosed is the use, in a parsing application within a single-thread environment such a PC-DOS, of a single buffer having an incomplete lexeme string copied to the beginning of the buffer, before data is read in the buffer starting at the end of the lexeme string.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Single-Threaded Buffer Management Method for Lexical Analyzers

      Disclosed is the use, in a parsing application within a
single-thread environment such a PC-DOS, of a single buffer having an
incomplete lexeme string copied to the beginning of the buffer,
before data is read in the buffer starting at the end of the lexeme
string.

      Fig. 1 is a block diagram of a typical parsing application,
such as a compiler or natural language translator, which is dependent
on a lexical analysis of a raw input stream of characters to produce
a stream of tokens to be used by the parser 10.  Since the lexical
analyzer 12 deals with the input at its lowest level, this analyzer
12 must be as efficient as possible.  A very efficient lexical
analyzer will noticeably improve the overall throughput of the
parsing application.  An inefficient lexical analyzer forms a
performance bottleneck that cannot be overcome, even as the best
parsing algorithms available are used.  It is for this reason that
much of the performance oriented research related to parsing
applications centers on the lexical analysis phase of the parser.
For best performance, low level I/0 buffer management is required by
the lexical analyzer.

      Fig. 2 is a block diagram showing the most common method used
to achieve I/O buffer management.  A pair of contiguous buffers 14
and 16 are used along with start and end lexeme pointers 18 and 20.
These two buffers are filled alternately, one after the other,
switching between them as the end of the previous buffer is reached.
The end-of-lexeme pointer may wrap around from the second buffer to
the first one, causing the lexeme to consist of two disjointed parts
that are difficult to manipulate in comparison or copy operations.
Providing easy manipulation is important because about half of the
lexemes encountered by a compiler are identifiers that must be
compared with a symbol table entry, or possibly that must be saved in
the symbol table.

      If a single buffer is used, there must be some way to handle
the case where a lexeme crosses the end of the buffer.  The presently
disclosed approach is to simply copy the incomplete lexeme string to
the beginning of the buffer, and then to read in the next buffer of
information, starting at the end of that lexeme string.  This method
requires an overflow area at the end of the buffer that is large
enough to handle the largest anticipated lexeme.  Thus, when the end
of the current buffer is reached, a current incomplete lexeme is
copied to the beginning of the buffer area.  Next, the next buffer of
information is read in, starting at the end of the current lexeme.  A
null character is appended to the end of the buffer to act as a
sentinel.  Then the scan of the current lexeme is again started, at
the beginning of the buffer area.

      The null character is used as a sentinel so that the end of
buffer condition can be detected.  This allows the end of the buffer
to flo...