
Automated Abstraction of Source Code for Structured Analysis

IP.com Disclosure Number: IPCOM000114321D
Original Publication Date: 1994-Dec-01
Included in the Prior Art Database: 2005-Mar-28
Document File: 4 page(s) / 127K

Publishing Venue

IBM

Related People

OHare, AB: AUTHOR [+2]

Abstract

An automated reverse engineering system is disclosed that provides a high level of integration with a Computer Aided Software Engineering (CASE) tool. Specifically, legacy code is transformed into abstractions within a Structured Analysis methodology. The abstractions are based on data-flow diagrams, state transition diagrams, and entity-relationship data models. Because the resulting abstractions can be browsed and modified within a CASE tool environment, a broad range of software engineering activities is supported, including program understanding, reengineering, and redocumentation. In addition, diagram complexity is reduced through the application of control partitioning: an algorithmic technique for managing complexity by partitioning source code modules into smaller yet semantically coherent units. This approach also preserves the information content of the original source code.

      The Figure shows the basic organization of the reverse
engineering system.  The inputs include the source code to be
analyzed and data on existing objects extracted from the current
design database in the CASE tool.  The source code to be analyzed
must be syntactically correct, and the tool must have access to any
libraries of include files or macro definitions that it references.
As a rule, the source code should be compilable, though it need not be
linkable.  The output of the reverse engineering system is a data set
that, once loaded into the CASE tool, constitutes a comprehensive
model of the analyzed source code.  This model is cast in terms of the
abstractions and concepts common to most development methodologies
based on Structured Analysis for real-time systems (SA/RT).
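The organization described above can be sketched as a skeleton pipeline. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Inputs`, `ModelDataSet`, `unresolved_includes`, and `reverse_engineer` are hypothetical, the legacy sources are assumed to use C-style `#include` directives, and the actual analysis stages are reduced to a placeholder. It shows only the two inputs, the precondition that every referenced include file be available, and the single output data set.

```python
import re
from dataclasses import dataclass, field

# All names here are hypothetical; the disclosure does not specify data formats.

@dataclass
class Inputs:
    sources: dict[str, str]          # file name -> source text to be analyzed
    include_library: dict[str, str]  # include/macro files available to the tool
    existing_objects: set[str]       # names already in the CASE design database
                                     # (consumed by the name-conflict step, not shown)

@dataclass
class ModelDataSet:
    objects: list[str] = field(default_factory=list)  # SA/RT objects for the CASE tool

# Assumes C-style include directives: #include "x.h" or #include <x.h>
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def unresolved_includes(inputs: Inputs) -> set[str]:
    """Return referenced include files that the tool cannot access."""
    referenced: set[str] = set()
    for text in inputs.sources.values():
        referenced.update(INCLUDE_RE.findall(text))
    return referenced - set(inputs.include_library)

def reverse_engineer(inputs: Inputs) -> ModelDataSet:
    """Check preconditions, then produce the data set to load into the CASE tool."""
    missing = unresolved_includes(inputs)
    if missing:
        raise ValueError(f"missing include files: {sorted(missing)}")
    model = ModelDataSet()
    for name in sorted(inputs.sources):
        # Placeholder for the real stages: preprocessing, syntactic and semantic
        # analysis, control partitioning, and mapping to SA/RT abstractions.
        model.objects.append(f"process:{name}")
    return model
```

Rejecting the run up front mirrors the stated requirement that all referenced include and macro libraries be accessible; linkability is deliberately not checked.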

      The first step in the reverse engineering process involves
accessing the repository of the CASE tool to obtain information about
previously defined objects.  The primary purpose of this information
is to avoid name conflicts with objects that already exist in the
CASE tool.  It is also necessary in order to support incremental
reverse engineering.  Incremental reverse engineering is the ability
to process different modules of a software system at different times
(as opposed to all at the same time).  This also allows the user to
reverse engineer subsets of the source code.
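One simple way to avoid name conflicts while still supporting incremental runs is sketched below. The numeric-suffix renaming scheme and the function name `assign_unique_names` are assumptions for illustration; the disclosure does not say how conflicts are resolved. Because the set of known names is updated in place, a later run over a different module sees the names taken by earlier runs.

```python
def assign_unique_names(candidates: list[str], existing: set[str]) -> list[str]:
    """Return a conflict-free name for each proposed object name, in order.

    `existing` holds names already defined in the CASE repository (hypothetical
    representation). It is updated in place, so a later incremental run over a
    different module will not reuse names assigned here.
    """
    assigned = []
    for name in candidates:
        unique, n = name, 1
        # Assumed scheme: append _2, _3, ... until the name is free.
        while unique in existing:
            n += 1
            unique = f"{name}_{n}"
        existing.add(unique)
        assigned.append(unique)
    return assigned
```

For example, if `read_input` already exists in the repository, processing a module that defines `read_input` and `write_report` yields `read_input_2` and `write_report`; a second module that also defines both then receives `read_input_3` and `write_report_2`.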

      The next few stages of processing are analogous to those used
in conventional compilers: preprocessing, syntactic analysis, and
semantic analysis of the source code.  At this point the system
produces equivalent intermediate code in a canonical form, which is
then analyzed for control partitions.  Control partitions are
semantically coherent collections of the source code based on an
analysis of control flow (including procedure calls).  This is a
deterministic process that successively abstracts sequences of source
code into new partitions until a specific level of control flow
comple...
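Control partitioning can be illustrated with a small sketch. The text breaks off before naming its complexity measure or its extraction rule, so this is only one possible deterministic scheme, not the disclosed algorithm. It assumes McCabe's cyclomatic complexity (decisions + 1) as the measure, a hypothetical tree-shaped intermediate code in which a statement is either a string or a `Branch`, and the rule "extract the largest compound statement that already fits the limit, first one wins on ties". Extracted code is replaced by a `call` statement, so no statement is lost.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    """A structured decision (e.g. IF or WHILE) in the hypothetical intermediate code."""
    kind: str
    cond: str
    body: list = field(default_factory=list)  # nested statements: str or Branch

def decisions(stmts: list) -> int:
    """Count decision nodes in a statement list, including nested ones."""
    return sum(1 + decisions(s.body) for s in stmts if isinstance(s, Branch))

def complexity(stmts: list) -> int:
    """Cyclomatic complexity of single-entry, single-exit structured code."""
    return decisions(stmts) + 1

def _candidates(stmts: list):
    """Yield (parent_list, index, branch) for every Branch, in pre-order."""
    for i, s in enumerate(stmts):
        if isinstance(s, Branch):
            yield stmts, i, s
            yield from _candidates(s.body)

def control_partition(name: str, stmts: list, limit: int) -> dict[str, list]:
    """Split `stmts` into partitions whose complexity does not exceed `limit`.

    Returns partition name -> statement list. Mutates `stmts`: each extracted
    compound statement is replaced by a 'call <partition>' statement.
    """
    if limit < 2:
        raise ValueError("limit must be at least 2 (one decision per partition)")
    partitions: dict[str, list] = {}
    while complexity(stmts) > limit:
        # An innermost Branch has complexity 2 <= limit, so a candidate exists.
        best = None
        for parent, i, b in _candidates(stmts):
            c = complexity([b])
            if c <= limit and (best is None or c > best[0]):
                best = (c, parent, i, b)
        _, parent, i, b = best
        part = f"{name}_p{len(partitions) + 1}"
        partitions[part] = [b]
        parent[i] = f"call {part}"
    partitions[name] = stmts
    return partitions
```

Each iteration moves at least one decision node out of the module, so the loop terminates, and every resulting partition satisfies the assumed limit. A different measure or extraction rule would give different partitions.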