Determining Send/Receive for Large Number of CSRs

IP.com Disclosure Number: IPCOM000105785D
Original Publication Date: 1993-Sep-01
Included in the Prior Art Database: 2005-Mar-20
Document File: 4 page(s) / 189K

Publishing Venue

IBM

Related People

Ekanadham, K: AUTHOR [+2]

Abstract

Given a sequential program, the task of creating a set of CSRs that execute the program correctly requires that memory accesses to shared data be coordinated using SEND/WAIT&RECEIVE operations. If the number of CSRs is large, sampling the SEND/RECEIVE requirements of a representative subset can be used to determine the overall requirements. Given a decomposition of a sequential program into n-segments, it is possible to analyze the performance of these n-segments as CSRs on an n-way multiprocessor equipped with a hardware monitor that uses memory accesses to identify the data sharing that would require SEND/RECEIVE support in a Distributed Memory System (DMS).

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 25% of the total text.

Determining Send/Receive for Large Number of CSRs

      Given a sequential program, the task of creating a set of CSRs
that execute the program correctly requires that memory accesses to
shared data be coordinated using SEND/WAIT&RECEIVE operations.  If
the number of CSRs is large, sampling the SEND/RECEIVE requirements
of a representative subset can be used to determine the overall
requirements.  Given a decomposition of a sequential program into
n-segments, it is possible to analyze the performance of these
n-segments as CSRs on an n-way multiprocessor equipped with a
hardware monitor that uses memory accesses to identify the data
sharing that would require SEND/RECEIVE support in a Distributed
Memory System (DMS).
If we subdivided each of these n-segments in half, giving
2n-segments, little could be said quantitatively about extrapolating
the old SEND/RECEIVE traffic to the new SEND/RECEIVE traffic, other
than that the traffic would increase.  In Scalable Massively
Parallel systems the number of CSRs is much larger than the number
of processors available in a tightly coupled shared-memory system,
so a new technique is required for extrapolation.  By partitioning
the output of the program, and using that partition to specify
ownership, DATA PARALLELIZATION defines the CSRs.  A small group of
such CSRs, each with its own ownership, runs on separate processors
against the remainder of the ownership, lumped into a single CSR
running on one processor, in a monitored tightly coupled
shared-memory system.  The results can be tested for homogeneity and
then extrapolated to a number of CSRs much larger than the number of
processors in the tightly coupled system.
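The hardware monitor's role of inferring SEND/RECEIVE requirements from memory accesses can be sketched in software. The following Python fragment (all names are illustrative assumptions, not from the original disclosure) flags a SEND/RECEIVE pair whenever one segment reads an address last written by a different segment:

```python
def find_send_receive_pairs(trace):
    """trace: list of (segment_id, op, address) with op in {'R', 'W'}.

    A SEND/RECEIVE is required whenever a segment reads an address
    last written by a different segment (cross-segment data sharing).
    Returns a set of (sender, receiver) segment pairs.
    """
    last_writer = {}   # address -> segment that last wrote it
    pairs = set()
    for seg, op, addr in trace:
        if op == 'W':
            last_writer[addr] = seg
        elif op == 'R':
            writer = last_writer.get(addr)
            if writer is not None and writer != seg:
                pairs.add((writer, seg))
    return pairs

trace = [
    (0, 'W', 100),  # segment 0 produces a value at address 100
    (1, 'R', 100),  # segment 1 consumes it: SEND 0 -> 1 required
    (1, 'W', 200),
    (2, 'R', 200),  # SEND 1 -> 2 required
    (2, 'R', 300),  # never written by another segment: no SEND
]
print(find_send_receive_pairs(trace))  # {(0, 1), (1, 2)}
```

Running such an analysis on a small sample group of CSRs, with the rest lumped into one CSR, yields the per-sample traffic that the homogeneity test would then extrapolate.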

      There are two distinct types of parallelism which can be
categorized as Coarse Grained (CG) parallelism and Fine Grained (FG)
parallelism.  Fine-grained parallelism operates at the instruction
level, partitioning a putative instruction stream that has a single
logical register file and a single memory hierarchy among several
processor elements.  As such, fine-grained parallelism allows
successive instructions to be executed in parallel and requires that
the result of such executions conform to a RUBRIC OF SEQUENTIAL
CORRECTNESS.  Another implication of this is that the memory
hierarchy that supports fine-grained parallelism is common to all
processor elements that share the same putative instruction stream.
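The rubric of sequential correctness for executing successive instructions in parallel can be illustrated with Bernstein's conditions, a standard dependence test (not named in the original text): two instructions may run in parallel only if neither writes data the other reads or writes.

```python
# Illustrative sketch (assumed, not from the disclosure): Bernstein's
# conditions for deciding whether two successive instructions may
# execute in parallel while preserving sequential correctness.
def independent(reads1, writes1, reads2, writes2):
    """True if the two instructions have no data dependence."""
    w1, w2 = set(writes1), set(writes2)
    return (w1.isdisjoint(reads2)      # no flow dependence
            and w2.isdisjoint(reads1)  # no anti-dependence
            and w1.isdisjoint(w2))     # no output dependence

# a = 1 + 2 and b = 3 * 4 touch disjoint data: parallel-safe
print(independent([], ['a'], [], ['b']))          # True
# c = a + b reads what the first instruction writes: must wait
print(independent([], ['a'], ['a', 'b'], ['c']))  # False
```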

      The basic computational entity within coarse-grained
parallelism is a THREAD, which is given a name.  Each THREAD
comprises a sequence of steps (beads), each of which is one of the
following types:

1.  Compute Step (Using Local Memory/Registers)
2.  Conditional Fork and Thread(Name) Creation
3.  Send Buffer to Name
4.  Wait & Receive Buffer

These threads are called CSRs because of the compute-send-receive
aspect of their structure.  The definition of the COMPUTE-STEP
involves a long sequence of instructions that operate within the
c...
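As a minimal sketch of the four bead types, Python threads and queues can stand in for named CSRs and their message buffers (all names here are assumptions for illustration, not part of the disclosure):

```python
import threading
import queue

mailboxes = {}  # thread name -> queue acting as its receive buffer

def spawn(name, body):
    """Bead type 2: fork and Thread(Name) creation."""
    mailboxes[name] = queue.Queue()
    t = threading.Thread(target=body, name=name)
    t.start()
    return t

def send(name, buf):
    """Bead type 3: send a buffer to the named thread."""
    mailboxes[name].put(buf)

def wait_receive(name):
    """Bead type 4: wait and receive a buffer (blocks until one arrives)."""
    return mailboxes[name].get()

results = []

def consumer():
    buf = wait_receive('consumer')   # blocks until the producer sends
    results.append(buf * 2)          # bead type 1: compute on local data

def producer():
    x = 21                           # bead type 1: compute step
    send('consumer', x)              # bead type 3: send to named thread

c = spawn('consumer', consumer)      # create the receiver first
p = spawn('producer', producer)
p.join(); c.join()
print(results)  # [42]
```

The blocking `get` models the WAIT&RECEIVE coordination that the disclosure requires for shared data crossing CSR boundaries.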