
SYNCHRONIZATION AND FAULTING IN REDUNDANT REAL-TIME SYSTEMS

IP.com Disclosure Number: IPCOM000128460D
Original Publication Date: 1983-Nov-01
Included in the Prior Art Database: 2005-Sep-16

Publishing Venue

Software Patent Institute

Related People

C.M. Krishna: AUTHOR



SYNCHRONIZATION AND FAULTING IN REDUNDANT REAL-TIME SYSTEMS

C. M. Krishna, Kang G. Shin and Ricky W. Butler¹

CRL-TR-32-83

NOVEMBER 1983

THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY²
Room 1079, East Engineering Building
Ann Arbor, Michigan 48109
USA
Tel: (313) 763-8000

ABSTRACT

A real-time computer may fail because of (i) massive component failures or (ii) not responding quickly enough to satisfy real-time requirements. An increase in redundancy -- a conventional means of improving reliability -- can reduce failures of the first kind but can, in some cases, make failures of the second kind considerably more likely because of the overhead associated with redundancy management, namely the time delay resulting from synchronization and voting/interactive consistency techniques.

In this report, we consider the implications for reliability of synchronization and voting/interactive consistency algorithms in N-modular clusters. All of these studies were carried out in the context of real-time applications. As a demonstrative example, we analyze results from experiments conducted at the NASA Airlab on the Software Implemented Fault-Tolerance (SIFT) computer. The analysis indicates that in most real-time applications it is better to employ hardware synchronization rather than software synchronization, and not to allow reconfiguration.

Index Terms: Real-time computers, probability of dynamic failure, hard deadlines, fault-masking, malicious failure, voting, interactive consistency, synchronization.
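As an illustration of the fault-masking by voting referred to in the abstract, the following is a minimal sketch of a strict-majority voter over the outputs of an N-modular cluster. It is not the SIFT voter or any algorithm taken from this report; the function name majority_vote and the convention of returning None when no strict majority exists are assumptions made for the example. It also does not address interactive consistency, which deals with malicious failures and requires additional rounds of message exchange.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules.

    `outputs` holds one result per module in the cluster (hypothetical
    interface, assumed for this sketch). If no value is reported by more
    than half of the modules, None is returned to signal that the fault
    could not be masked.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the N outputs to agree.
    return value if count > len(outputs) // 2 else None


# Triplex cluster with one faulty module: the fault is masked.
print(majority_vote([7, 7, 9]))
# No two modules agree: there is no majority to rely on.
print(majority_vote([1, 2, 3]))
```

A voter of this kind can run only once every module's output is available, so the modules must be synchronized first. That waiting time is part of the redundancy-management overhead the abstract describes.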

¹ Of NASA Langley Research Center, Hampton, VA 23665.

² The work reported here was supported in part by NASA Grant No. 1-296. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of NASA. All correspondence regarding this report should be addressed to Professor Kang G. Shin.

1. INTRODUCTION

The use of digital computers, particularly multiprocessors, has become commonplace in such real-time applications as aircraft control, nuclear reactor control, power distribution and monitoring, and automated manufacturing. Such computers are typically required to have very high reliability. (For example, the benchmark figure used at NASA is a 10^-9 probability of failure over a 10-hour flight period for an aircraft control computer.) Unlike that of their conventional counterparts, the failure probability of real-time systems is not completely characterized by the probability of massive hardware failure alone: failure can also occur due to excessively long response times. In other words, there are hard deadlines for code execution that, if missed, can lead to catastrophic consequences. The probability of dynamic failure, p_dyn, introduced in [3] and further refined in [4] int...