Browse Prior Art Database

Multi-Threaded Buffer Management Method for Lexical Analyzers

IP.com Disclosure Number: IPCOM000113954D
Original Publication Date: 1994-Oct-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 122K

Publishing Venue

IBM

Related People

Feriozi, DT: AUTHOR

Abstract

Disclosed is the use, in a parsing application within a multi-tasking environment, such as OS/2*, of overlapping I/O, so that a first thread processes data in a first buffer while a second thread is filling the next buffer. A specific method is described for handling the situation occurring when a lexeme crosses the end of a buffer.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 51% of the total text.

Multi-Threaded Buffer Management Method for Lexical Analyzers

      Disclosed is the use, in a parsing application within a
multi-tasking environment, such as OS/2*, of overlapping I/O, so that
a first thread processes data in a first buffer while a second thread
is filling the next buffer.  A specific method is described for
handling
the situation occurring when a lexeme crosses the end of a buffer.

      Fig. 1 is a block diagram of a typical parsing application,
such as a compiler or natural language translator, which is dependent
on a lexical analysis of a raw input stream of characters to produce
a stream of tokens to be used by the parser 10.  Since the lexical
analyzer 12 deals with the input at its lowest level, this analyzer
12 must be as efficient as possible.  A very efficient lexical
analyzer will noticeably improve the overall throughput of the
parsing application.  An inefficient lexical analyzer forms a
performance bottleneck that cannot be overcome, even as the best
parsing algorithms available are used.  It is for this reason that
much of the performance oriented research related to parsing
applications centers on the lexical analysis phase of the parser.
For best performance, low level I/O buffer management is required by
the lexical analyzer.

      Fig. 2 is a block diagram showing the most common method used
to achieve I/O buffer management.  A pair of contiguous buffers 14
and 16 are used, along with start and end lexeme pointers 18 and 20.
These two buffers are filled alternately, one after the other as the
end of the previous buffer is reached.  The end-of-lexeme pointer may
wrap around from the second buffer to the first one, causing the
lexeme to consist of two disjointed parts that are difficult to
manipulate in comparison or copy operations.  Providing easy
manipulation is important because about half of the lexemes
encountered by a compiler are identifiers that must be compared with
a symbol table entry, or possibly that must be saved in the symbol
table.

      In a multi-tasking environment like OS/2, overlapping I/0 can
be used to improve throughput significantly.  While the first thread
is processing the data in the first buffer, a second thread can be
filling the next buffer, so that data in the next buffer will be
ready to be processed as soon as processing of the first buffer is
completed.  Alternating between buffers can be handled in a number of
ways, with one way being the use of pointers that can be updated to
point to the physical buffer in current use by each of the threads.

      If alternating buffers are used, there must be some way to
handle the case where a lexeme crosses the end of the buffer.  The
approach presently disclosed is to copy the incomplete lexeme string
to the prefix area of the next buffer, and then to read in the next
buffer, starting at the end of that lexeme string.  This method
requires a prefix area at the beginning of the buffers that is lar...