DESIGN AND EVALUATION OF A FAULT-TOLERANT MULTIPROCESSOR USING HARDWARE RECOVERY BLOCKS

IP.com Disclosure Number: IPCOM000128445D
Original Publication Date: 1982-Aug-01
Included in the Prior Art Database: 2005-Sep-16

Publishing Venue

Software Patent Institute

Related People

Lee, Yann-Hang: AUTHOR [+4]

Abstract

In this paper we consider the design and the evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 6% of the total text.

Yann-Hang Lee and Kang G. Shin

CRL-TR-6-82

AUGUST 1982

THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY¹
Room 1079, East Engineering Building
Ann Arbor, Michigan 48109
USA
Tel: (313) 763-8000

ABSTRACT

In this paper we consider the design and the evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism.

The rollback mechanism is based on the hardware recovery block, which is the hardware equivalent of the software recovery block. The hardware recovery block is constructed from consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component, and the process originally assigned to the faulty component then retreats to one of the previously saved states in order to resume fault-free execution. Because of random interactions among cooperating processes and asynchrony in the state saving, the rollback of a process may propagate to others, and multiple-step rollbacks may thus become necessary. In the worst case, when all the available saved states are exhausted, the processes have to restart from the beginning, as if they were executed in a system without any rollback recovery mechanism. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of the mean and variance of the execution time of a given task is also presented.
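
The rollback behaviour summarized above can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the authors' hardware design: the class name RecoverableProcess, its methods, and the use of an integer as the "process state" are all hypothetical. It shows a bounded number of state-save units, a rollback of one or more steps, and the fallback to a restart when the requested depth exceeds the saved states still available.

```python
from collections import deque


class RecoverableProcess:
    """Toy software model of a process with a bounded set of state-save units.

    Hypothetical illustration only; the paper describes hardware units.
    """

    def __init__(self, num_save_units):
        # Only the most recent `num_save_units` saved states are retained;
        # a deque with maxlen discards the oldest entry when it is full.
        self.saved_states = deque(maxlen=num_save_units)
        self.state = 0  # stand-in for the process state (work done so far)

    def run(self, steps):
        """Advance the computation by `steps` units of work."""
        self.state += steps

    def save_state(self):
        """Record the current state in the next state-save unit."""
        self.saved_states.append(self.state)

    def rollback(self, depth=1):
        """Retreat `depth` saved states, or restart if they are exhausted.

        Returns the string "rollback" or "restart".
        """
        if depth > len(self.saved_states):
            # Worst case: no saved state is old enough, so start over.
            self.saved_states.clear()
            self.state = 0
            return "restart"
        # Discard the saved states newer than the one being restored.
        for _ in range(depth - 1):
            self.saved_states.pop()
        self.state = self.saved_states[-1]
        return "rollback"


if __name__ == "__main__":
    p = RecoverableProcess(num_save_units=2)
    for _ in range(3):
        p.run(5)
        p.save_state()
    p.run(3)
    print(p.rollback(1), p.state)  # single-step rollback
    print(p.rollback(2), p.state)  # two-step rollback, e.g. after propagation
    print(p.rollback(2), p.state)  # saved states exhausted
```

In this sketch a depth greater than one stands in for a multiple-step rollback forced by rollback propagation; how often that happens, and how often a restart results, is what the paper's mathematical model is proposed to quantify.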

Index Terms

Fault-tolerant multiprocessor, rollback recovery, hardware/software recovery block, rollback propagation, coverage of recovery.

INTRODUCTION

There are numerous benefits to be gained from a multiprocessor. In addition to decreasing hardware cost and the inherent reliability of LSI components, the capacity for reconfiguration makes the multiprocessor attractive when system reliability is important. It is particularly essential to critical real-time applications that the system tolerate failures with minimum time overhead and that the task be completed prior to the imposed deadline. Hence, one of the major issues of reliable multiprocessor design is error recovery without having to restart the whole task when an error occurs.

¹ This work was supported in part by NASA grant No. NAG 1-296. All correspondence should be sent to Professor Kang G. Shin. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

In general, the tolerance of failure during system operation is realized by three steps: detecti...