A Semantic Scanner for Understanding Programs with Comments

IP.com Disclosure Number: IPCOM000106142D
Original Publication Date: 1993-Oct-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 2 page(s) / 77K

Publishing Venue

IBM

Related People

D. Gangopadhyay and W. Zadrozny: AUTHOR

Abstract

Disclosed is a method for automated identification of the functions of fragments of software. The method uses a combination of program analysis, cliche recognition and natural language techniques. Given a program with comments, the disclosed algorithm returns as its output a semantic description of the software function; i.e., a description in terms of the program's purpose or meaning, expressed in the vocabulary used to describe the particular application of the program. This analysis gives an understanding of the semantics of the program well beyond that obtained by syntactic analysis alone.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

A Semantic Scanner for Understanding Programs with Comments

      Disclosed is a method for automated identification of the
functions of fragments of software.  The method uses a combination of
program analysis, cliche recognition and natural language techniques.
Given a program with comments, the disclosed algorithm returns as its
output a semantic description of the software function; i.e., a
description in terms of the program's purpose or meaning, expressed in
the vocabulary used to describe the particular application of the
program.  This analysis gives an understanding of the semantics of the
program well beyond that obtained by syntactic analysis alone.

      The invention uses a method called semantic abstraction to
derive a semantic meaning from a fragment of code.  The steps of
semantic abstraction are described below.

      First, the program syntax is analyzed using the standard parsing
techniques used in programming language compilers, resulting in a
program dependence graph (i.e., a representation of control and
data-flow dependencies among program identifiers).  This breaks the
program syntax down into groupings that include program identifiers
(such as program variables,...) and program connectives (such as
"if", "then", "do", ";", etc.).  At this point, however, there is no
semantic understanding of the specific application of the program.
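
      The parsing step can be illustrated with a minimal sketch.  It
is not the disclosed implementation: it assumes Python as the analyzed
language and uses Python's standard ast module as the parser, and the
names dependence_edges, names_in and FRAGMENT are hypothetical.  It
records, for each assigned identifier, the identifiers it reads (data
flow) and the identifiers in any enclosing "if" or "while" condition
(control flow).

```python
import ast

def names_in(node):
    """Collect every identifier that appears under an AST node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}

def dependence_edges(source):
    """Return (data, control) dependency maps for a code fragment.

    data[x]    = identifiers read when x is assigned
    control[x] = identifiers in conditions that guard an assignment to x
    This is a simplified stand-in for a program dependence graph.
    """
    tree = ast.parse(source)
    data, control = {}, {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    data.setdefault(target.id, set()).update(names_in(node.value))
        elif isinstance(node, (ast.If, ast.While)):
            guards = names_in(node.test)
            for stmt in node.body + node.orelse:
                for inner in ast.walk(stmt):
                    if isinstance(inner, ast.Assign):
                        for target in inner.targets:
                            if isinstance(target, ast.Name):
                                control.setdefault(target.id, set()).update(guards)
    return data, control

# Hypothetical fragment used only for illustration.
FRAGMENT = (
    "total = price * qty\n"
    "if total > limit:\n"
    "    discount = total * rate\n"
)

data, control = dependence_edges(FRAGMENT)
```

      Here data maps "discount" to the identifiers "total" and "rate",
while control maps it to "total" and "limit".  As the text notes, the
result is purely syntactic: nothing in it says what a discount is.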

      The next step is to perform semantic abstraction of each
syntactic grouping.  This is done by correlating each grouping with
an action term and a set of object terms.  These action and object
terms are taken from the vocabulary used to describe the program
application.
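
      One way to picture the result of this step is as a small record
pairing an action term with object terms.  The sketch below is an
assumption for illustration only: the SemanticAbstraction type, the
OBJECT_TERMS lookup table and the abstract_grouping function are
hypothetical, and the action term is simply passed in rather than
derived, since the way it is determined is described separately.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticAbstraction:
    """A syntactic grouping described in application vocabulary."""
    action: str      # verb phrase, e.g. "compute"
    objects: tuple   # object terms, e.g. ("order total", "unit price")

# Hypothetical application vocabulary: identifier -> object term.
OBJECT_TERMS = {
    "total": "order total",
    "price": "unit price",
    "qty": "quantity",
}

def abstract_grouping(identifiers, action):
    """Correlate a grouping's identifiers with object terms.

    Identifiers with no entry in the vocabulary are skipped; the
    action term is supplied by the caller (an assumption of this sketch).
    """
    objects = tuple(
        OBJECT_TERMS[name] for name in identifiers if name in OBJECT_TERMS
    )
    return SemanticAbstraction(action=action, objects=objects)

# A grouping such as "total = price * qty", with "tmp" as an unknown name.
result = abstract_grouping(["total", "price", "qty", "tmp"], "compute")
```

      The record keeps the action and object terms separate so that
each can be filled in by a different technique.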

      The action terms are verb phrases and are determined by the
following cliche recognition ste...