Recovery Mechanisms for Fetch&op Instruction Execution Errors

IP.com Disclosure Number: IPCOM000038891D
Original Publication Date: 1987-Mar-01
Included in the Prior Art Database: 2005-Feb-01
Document File: 6 page(s) / 37K

Publishing Venue

IBM

Related People

Malek, M: AUTHOR [+2]

This is the abbreviated version, containing approximately 22% of the total text.

The Fetch&Add (x, A) instruction was introduced by the NYU Ultracomputer [3]. The RP3 project [1] generalized this instruction to Fetch&Operation (x, A), called Fetch&OP herein. Fetch&OP first reads the contents of memory location "A". It then applies the operation "OP" to this data and the instruction's operand "x", stores (if needed) the intermediate or final result of the operation, and returns the result. The instruction executes indivisibly. It is executed at the memory and is supported by the combining network [2] of the RP3 computer. Shared memory parallel computers (e.g., RP3) gain two main advantages by supporting this instruction:

1. When executed only at the memory, it provides a more powerful indivisible operation than the traditionally used Test&Set instruction.

2. When used in both the combining network [2] and the memory, it can reduce the execution time of operations on shared variables. That is, an operation that would otherwise execute in O(N) time can be executed in O(log N) time [4].

Because of these advantages, this instruction can be used effectively to improve the execution time of parallel algorithms for shared memory parallel computers, where it is used to access the shared variables. For the same reason, the RP3 computer supports it. The instruction is expected to be used by both user applications and the operating system. The operating system can use it to access the kernel data structures and variables that are shared by the parallel processors of the system. It is therefore an important instruction for such computers. If an error occurs while this instruction is executing, it may not be possible to recover from it by simple instruction retry.
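The semantics described above, and the retry hazard, can be made concrete with a small software model. This is only a sketch under stated assumptions, not the RP3 implementation: the class `FetchOpMemory`, the exception `FetchOpError`, the `fail_after_store` fault-injection flag, and the helper `naive_retry` are all hypothetical names introduced for illustration. The model also assumes that the instruction returns the value fetched before the update, as Fetch&Add is usually defined, and that the error strikes after the store has reached memory.

```python
import operator
import threading


class FetchOpError(Exception):
    """Hypothetical error reported to the processor that issued the instruction."""


class FetchOpMemory:
    """Toy model of a memory module that executes Fetch&OP indivisibly."""

    def __init__(self, size):
        self.cells = [0] * size
        self._lock = threading.Lock()  # stands in for hardware indivisibility

    def fetch_and_op(self, op, x, addr, fail_after_store=False):
        """Read cells[addr], store op(old, x), and return the old value.

        fail_after_store is an assumed fault model: the update reaches
        memory, but the reply is lost, so the issuer only sees an error.
        """
        with self._lock:
            old = self.cells[addr]
            self.cells[addr] = op(old, x)
        if fail_after_store:
            raise FetchOpError("reply lost after memory was updated")
        return old


def naive_retry(mem, op, x, addr, first_attempt_fails):
    """Reissue the instruction once if the first attempt reports an error."""
    try:
        return mem.fetch_and_op(op, x, addr, fail_after_store=first_attempt_fails)
    except FetchOpError:
        return mem.fetch_and_op(op, x, addr)


# Error-free Fetch&Add(5, A): returns the old value 0 and leaves 5 in memory.
mem = FetchOpMemory(size=1)
assert mem.fetch_and_op(operator.add, 5, 0) == 0
assert mem.cells[0] == 5

# Same instruction, but the first attempt fails after the store and is retried.
mem2 = FetchOpMemory(size=1)
returned = naive_retry(mem2, operator.add, 5, 0, first_attempt_fails=True)
print(returned, mem2.cells[0])  # prints "5 10": the increment was applied twice
```

In this model the retried Fetch&Add returns 5 and leaves 10 in the location, whereas an error-free execution returns 0 and leaves 5, so a blind retry does not restore the intended state.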

Simple retry can fail because the memory location may already have been modified. Two mechanisms for recovering from Fetch&OP execution errors are described herein. Without such mechanisms, the system would have to use checkpointing or re-IPL the system, either of which degrades system performance.

Work in the parallel processor area has shown that it is not always possible to use instruction retry to recover from errors in pipelined parallel computers. That is, processors cannot always reissue an instruction if an error is detected during its execution. This has been observed for instructions sent across the network, because instruction retry can sometimes destroy the sequential consistency of a program. In some systems, however, instruction retry is possible. For example, it is possible when the processors "fence" every instruction sent across the network, that is, when they do not pipeline instructions to the memory. Inst...
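One way in which retry can break program order in a pipelined system, and why fencing avoids it, can be sketched with a toy model. Everything here is an assumption made for illustration: the function `run`, its parameters, and the fault model, in which one store is lost on its first attempt and the error is detected only after all later pipelined stores have reached memory. It is not taken from the RP3 design.

```python
def run(stores, failing_index, fenced):
    """Apply (addr, value) stores issued in program order and return memory.

    The store at failing_index is lost on its first attempt and reissued
    once the error is detected (assumed fault model).
    """
    memory = {}
    pending_retry = None
    for i, (addr, value) in enumerate(stores):
        if i == failing_index:
            if fenced:
                # Fenced: the processor waits for the reply, sees the error,
                # and retries before issuing any later instruction.
                memory[addr] = value
            else:
                # Pipelined: later stores are already in flight, so the
                # retry can only be issued after them.
                pending_retry = (addr, value)
        else:
            memory[addr] = value
    if pending_retry is not None:
        addr, value = pending_retry
        memory[addr] = value  # the retried store lands last
    return memory


program = [("A", 1), ("A", 2)]  # program order: A = 1, then A = 2
print(run(program, failing_index=0, fenced=False))  # prints {'A': 1}
print(run(program, failing_index=0, fenced=True))   # prints {'A': 2}
```

The pipelined run ends with A = 1, a result that no execution of the stores in program order could produce, while the fenced run ends with A = 2 as the program intends.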