
Arrangement to Speed-Up Parallel-Processor Operation

IP.com Disclosure Number: IPCOM000038642D
Original Publication Date: 1987-Feb-01
Included in the Prior Art Database: 2005-Jan-31
Document File: 4 page(s) / 81K

Publishing Venue

IBM

Related People

So, K: AUTHOR [+2]

Abstract

A concept is described herein that allows the execution of parallel programs containing serial sections to be notably sped up, improving the performance of Parallel Processing Systems (PPSs) well beyond that obtainable with conventional parallel-processing techniques. The concept is realized by a hardware arrangement that serves as a basis for compilers to automatically generate object code allowing the Processing Elements (PEs) in a PPS to operate in the designated way. The principal idea is a hardware arrangement that enables the PEs idling during the execution of a serial section to start work on the coming parallel section as early as possible, thus using the would-be-idle time and shortening the overall processing time.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 37% of the total text.



The hardware and software required to implement the proposed concept are described below. Hardware capabilities (in the form of new instructions) that need to be introduced into the PEs include:

(1) "Parallel-Operation Alert" (PA)
(2) "Parallel-Operation Begin" (PB)
(3) "Instructions Update and Execute" (IUE)
(4) "Dependency Release" (DR)

The PA instruction, which is inserted automatically by the compiler, marks the beginning of the serial initiating-phase preceding a parallel section. Its operand fields contain parameters specifying information items about the coming parallel program-section (such as its start-address and length, and the expected degrees of parallelism and loop-nesting). On encountering a PA instruction, the PE executing the serial phase (say, PE0) "broadcasts a global alert" to the other PEs, informing them of the imminent arrival of, and the relevant information about, the coming parallel section.
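The disclosure does not give an encoding for PA, but the mechanism can be pictured concretely. The following self-contained C sketch is illustrative only: the names pa_descriptor, alert_mailbox and parallel_operation_alert are assumptions, standing in for the operand fields the PA instruction is said to carry and for the global-alert hardware that makes them visible to the other PEs.

/* Hypothetical model (not from the disclosure) of the PA broadcast: PE0 fills
 * in a descriptor for the coming parallel section and posts it to a shared
 * "alert" mailbox that the idling PEs can inspect.                            */
#include <stdint.h>
#include <stdio.h>

#define NUM_PES 4

/* Information items about the coming parallel section; the layout is assumed. */
struct pa_descriptor {
    uint32_t section_start;   /* start-address of the parallel program-section */
    uint32_t section_length;  /* length of the parallel program-section        */
    uint16_t parallelism;     /* expected degree of parallelism                */
    uint16_t nesting_depth;   /* expected depth of loop-nesting                */
};

/* Shared state standing in for the global-alert hardware. */
static struct pa_descriptor alert_mailbox;
static int alert_pending[NUM_PES];

/* Executed by PE0 when it reaches the compiler-inserted PA instruction at the
 * start of the serial initiating-phase: publish the section information and
 * "broadcast a global alert" to the other PEs.                                */
static void parallel_operation_alert(const struct pa_descriptor *d)
{
    alert_mailbox = *d;
    for (int pe = 1; pe < NUM_PES; pe++)
        alert_pending[pe] = 1;
}

int main(void)
{
    struct pa_descriptor d = { 0x4000, 256, NUM_PES - 1, 2 };
    parallel_operation_alert(&d);
    for (int pe = 1; pe < NUM_PES; pe++)
        if (alert_pending[pe])
            printf("PE%d alerted: section at 0x%x, length %u\n", pe,
                   (unsigned)alert_mailbox.section_start,
                   (unsigned)alert_mailbox.section_length);
    return 0;
}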
On receiving the alert, all idling PEs can use the PB instruction to start taking part in the execution of the parallel section. Each PE obtains a loop-index (I) for the parallel section and then, using the received broadcast information, fetches the code of the parallel program-section from the shared memory into its local store. (The program fetching can be done earlier for all participating PEs, to avoid unnecessary contention in accessing the shared memory.) After fetching the program-section, each participating PE scans through the section repeatedly, "updating" the operands of the instructions in it (by replacing all parameterized or indexed data-references first with absolute-addressed ones and then with the fetched data-items themselves) and then executing the sequence of instructions "as far as possible". This updating process in effect resolves all instructions with indirect addressing, so that they become instructions involving only data-items in the PE's own local store. The execution of these "fast" instructions will take less time, since...
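The updating step described above can likewise be sketched. The C fragment below is a minimal model under assumed data layouts: operand_kind, update_section and the word-addressed shared_memory array are illustrative inventions, not the disclosure's encoding. It shows a PE rewriting indexed references into absolute addresses using its loop-index I, then into the fetched data-items themselves, marking the resulting instructions as "fast".

/* Hypothetical sketch of one updating scan over a PE's local copy of the
 * parallel section: indexed operands become absolute addresses, then fetched
 * values, leaving instructions that touch only the PE's local store.          */
#include <stdint.h>
#include <stdio.h>

enum operand_kind { OP_INDEXED, OP_ABSOLUTE, OP_IMMEDIATE };

struct operand {
    enum operand_kind kind;
    uint32_t base, stride;  /* indexed reference: base address and element size */
    uint32_t addr;          /* absolute address, once resolved                  */
    int32_t  value;         /* fetched datum, once the operand is updated       */
};

struct instruction {
    struct operand src;     /* single source operand, for simplicity */
    int ready;              /* 1 once the instruction is "fast"      */
};

/* One updating scan for loop-index I; shared_memory stands in for the shared
 * store, indexed here by word address for simplicity.                         */
static void update_section(struct instruction *code, int n,
                           uint32_t I, const int32_t *shared_memory)
{
    for (int k = 0; k < n; k++) {
        struct operand *op = &code[k].src;
        if (op->kind == OP_INDEXED) {          /* parameterized -> absolute  */
            op->addr = op->base + I * op->stride;
            op->kind = OP_ABSOLUTE;
        }
        if (op->kind == OP_ABSOLUTE) {         /* absolute -> fetched datum  */
            op->value = shared_memory[op->addr];
            op->kind  = OP_IMMEDIATE;
            code[k].ready = 1;                 /* now executable locally     */
        }
    }
}

int main(void)
{
    int32_t shared_memory[16] = { [4] = 7, [5] = 11, [6] = 13 };
    struct instruction code[1] = {
        { { OP_INDEXED, 4, 1, 0, 0 }, 0 },     /* reference to element [4 + I] */
    };
    update_section(code, 1, 1, shared_memory); /* this PE holds loop-index I=1 */
    printf("operand value after update: %d (ready=%d)\n",
           code[0].src.value, code[0].ready);
    return 0;
}

In a fuller model the PE would then execute the instructions marked ready and defer any whose operands it could not yet resolve, in line with executing the sequence "as far as possible".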