Language Processing Language Character Class Support

IP.com Disclosure Number: IPCOM000109575D
Original Publication Date: 1992-Sep-01
Included in the Prior Art Database: 2005-Mar-24
Document File: 2 page(s) / 115K

Publishing Venue

IBM

Related People

Hidalgo, DS: AUTHOR

Abstract

Disclosed is a feature of the LANGUAGE PROCESSING LANGUAGE (LPL) that supports the definition of abstract character sets, or classes, used in the lexical analysis performed by the lexical scanners that are mechanically generated from a Language Definition File. The mechanism behind these abstract character sets provides an efficient mechanical implementation of a common state determination technique used in lexical analysis, and combines that technique with support for user selection of predefined character codepages, thus providing abstracted National Language Support for formal languages implemented through LPL.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      The design of lexical scanners for use in mechanical language
translation or in other types of stream data processing requires that
each input character code be identified as quickly as possible as
belonging to one of several possible syntactic units, or tokens.  Within
the context of a finite state automaton, which is the approach most
commonly used to implement lexical scanners and the approach used
by LPL, this action is referred to as "state determination", and a
state determination technique is what carries it out.  Since
every input code must pass through the same state determination
technique, an efficient implementation of that technique is
imperative from a performance viewpoint.
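
      One common way to keep state determination cheap is to classify
each input code with a single table lookup and then index a transition
table by (state, class).  The sketch below illustrates that general
idea only; the disclosure does not reproduce LPL's generated code, so
the class names, state names, tables, and the scan function are all
hypothetical.

```python
import string

# Hypothetical character classes; the names are illustrative, not LPL's.
OTHER, LETTER, DIGIT, SPACE = range(4)

# One 256-entry table maps each input code to its class in one index step.
CLASS_OF = [OTHER] * 256
for code in string.ascii_letters.encode("ascii"):
    CLASS_OF[code] = LETTER
for code in string.digits.encode("ascii"):
    CLASS_OF[code] = DIGIT
for code in b" \t\r\n":
    CLASS_OF[code] = SPACE

# States of a toy automaton that recognizes identifiers and integers.
START, IN_IDENT, IN_NUMBER, STOP = range(4)

# NEXT_STATE[state][char_class] is the following state; STOP ends a token.
NEXT_STATE = [
    # OTHER LETTER    DIGIT      SPACE
    [STOP,  IN_IDENT, IN_NUMBER, START],  # START (skips white space)
    [STOP,  IN_IDENT, IN_IDENT,  STOP],   # IN_IDENT
    [STOP,  STOP,     IN_NUMBER, STOP],   # IN_NUMBER
]


def scan(data: bytes):
    """Split data into (kind, text) tokens with two lookups per input code."""
    tokens = []
    pos = 0
    while pos < len(data):
        state = START
        begin = pos
        while pos < len(data):
            following = NEXT_STATE[state][CLASS_OF[data[pos]]]
            if following == STOP:
                break
            if following == START:
                begin = pos + 1  # still skipping white space
            state = following
            pos += 1
        if state == START:
            if pos < len(data):
                # An unclassified code seen in START becomes its own token.
                tokens.append(("other", data[pos:pos + 1]))
                pos += 1
        else:
            kind = "ident" if state == IN_IDENT else "number"
            tokens.append((kind, data[begin:pos]))
    return tokens


print(scan(b"x1 = 42"))
```

      Because the per-code work is two array index operations, its cost
does not grow with the number of characters in a class, which is the
property that matters when every input code takes the same path.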

      This invention includes syntactic facilities to define
abstracted character sets that can be used to define lexical rules or
tokens, and to declare the specific character codepages that are to
be supported by the input language.  The syntax diagrams in the
figures describe these facilities.  Also included in this invention
is the method for implementing the state determination technique in
an efficient manner that comb...
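
      As a rough illustration of how abstract character sets and
codepage selection can work together, the sketch below states each
class over characters rather than numeric codes and derives one
256-entry lookup table per supported codepage.  It is an assumption-
laden example, not LPL syntax: the class names, the
build_class_table helper, and the choice of the cp437 and cp037
codepages are all hypothetical.

```python
# Abstract classes are stated over characters, not over numeric codes.
# Names and members are hypothetical; they are not taken from LPL.
ABSTRACT_CLASSES = {
    "letter": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "digit": "0123456789",
    "blank": " ",
}


def build_class_table(codepage: str) -> list:
    """Return a 256-entry table mapping each code in codepage to a class."""
    table = ["other"] * 256
    for name, members in ABSTRACT_CLASSES.items():
        for ch in members:
            encoded = ch.encode(codepage)
            if len(encoded) == 1:  # only single-byte codes fit this table
                table[encoded[0]] = name
    return table


# One table per declared codepage; the scanner picks one at start-up.
TABLES = {cp: build_class_table(cp) for cp in ("cp437", "cp037")}

# The same abstract class resolves to different codes in each codepage.
for cp in TABLES:
    code = "A".encode(cp)[0]
    print(cp, hex(code), TABLES[cp][code])
```

      The letter "A" has code 0x41 in cp437 and 0xC1 in cp037, yet both
tables report it as "letter", so lexical rules written against the
abstract class need not change when a different codepage is selected.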