MEASUREMENT OF FAULT LATENCY: METHODOLOGY AND EXPERIMENTAL RESULTS

IP.com Disclosure Number: IPCOM000128488D
Original Publication Date: 1984-Nov-01
Included in the Prior Art Database: 2005-Sep-16
Document File: 9 page(s) / 104K

Publishing Venue

Software Patent Institute

Related People

Y.H. Lee: AUTHOR

This is the abbreviated version, containing approximately 11% of the total text.

K.G. Shin and Y.-H. Lee CRL-TR-45-84

November 1984 Room 1079, East Engineering Building Ann Arbor, Michigan 48109 USA Tel: (313) 763

MEASUREMENT OF FAULT LATENCY: METHODOLOGY AND EXPERIMENTAL RESULTS¹

Kang G. Shin and Yann-Hang Lee

Division of Computer Science and Engineering, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109.

ABSTRACT

The time interval between the occurrence of a fault and the detection of the resulting error consists of two parts: fault latency and error latency. Fault latency is related to the physical properties of a fault, whereas error latency reflects the efficiency of function-level detection mechanisms. Because the moment of error generation is not directly observable, it is extremely difficult to separate these two latencies experimentally. Most related work has therefore assumed zero or negligible fault latency and carried out approximate analyses. To remedy this deficiency, we (i) present a new methodology for indirectly measuring fault latency, and (ii) derive the distribution of fault latency from the methodology.

The proposed methodology has been applied successfully to the measurement of fault latency for the Fault Tolerant Multiprocessor (FTMP) at the NASA Airlab. The experimental results show wide variations in the mean fault latencies of different function circuits within the FTMP. More importantly, the measured distributions of fault latency are shown to have a monotone hazard rate. Consequently, Gamma and Weibull distributions are selected for the least-squares fit as the distribution of fault latency. Based on experience from these experiments, we also offer several remarks.

¹ This work was supported in part by NASA under both Grant No. NAG 1-296 and Grant No. NAG 1-492. Any opinions, findings, and conclusions or recommendations expressed in this report are those of the authors and do not necessarily reflect the views of NASA.

University of Michigan Computing Research Laboratory, Nov 01, 1984

Subject Index: Measurement within "Evaluation of Reliability and Performance"
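
The abstract's link between a monotone hazard rate and the choice of a Weibull distribution can be illustrated with a small sketch. The fitting procedure below is an assumption, not the authors' method: this excerpt does not describe how their least-squares fit was carried out. The sketch uses the common linearization ln(-ln(1 - F(t))) = k ln(t) - k ln(s) of the Weibull distribution function, with F estimated by median ranks. The function names, the parameter values, and the synthetic latencies are hypothetical and are not FTMP measurements.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def is_monotone(values):
    """True if the sequence is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def median_rank(i, n):
    """Bernard's approximation to the median rank of the i-th of n ordered samples."""
    return (i - 0.3) / (n + 0.4)


def fit_weibull_least_squares(samples):
    """Fit (shape, scale) by ordinary least squares on the linearized Weibull CDF.

    Assumed procedure (not taken from the report): regress
    y = ln(-ln(1 - F)) on x = ln(t), where F is the median-rank estimate.
    The slope is the shape; the intercept equals -shape * ln(scale).
    """
    ordered = sorted(samples)
    n = len(ordered)
    x = [math.log(t) for t in ordered]
    y = [math.log(-math.log(1.0 - median_rank(i, n))) for i in range(1, n + 1)]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    shape = sxy / sxx
    intercept = mean_y - shape * mean_x
    scale = math.exp(-intercept / shape)
    return shape, scale


def synthetic_latencies(shape, scale, n):
    """Hypothetical latencies placed exactly at the median-rank quantiles."""
    return [
        scale * (-math.log(1.0 - median_rank(i, n))) ** (1.0 / shape)
        for i in range(1, n + 1)
    ]


if __name__ == "__main__":
    data = synthetic_latencies(shape=0.7, scale=50.0, n=20)  # made-up values
    k, s = fit_weibull_least_squares(data)
    hazards = [weibull_hazard(t, k, s) for t in (1.0, 10.0, 100.0)]
    print("fitted shape:", k, "fitted scale:", s)
    print("hazard monotone:", is_monotone(hazards))
```

A Weibull hazard rate is monotone for every shape value: decreasing when the shape is below 1, constant at 1, and increasing above 1. That property is why a monotone empirical hazard rate makes the family a natural candidate. The Gamma distribution shares it, but fitting a Gamma distribution has no comparably simple linearization and is omitted here.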

1. INTRODUCTION

A hardware fault is defined as an incorrect state caused by a physical change in a component, whereas an error is defined as the erroneous information or data resulting from the manifestation of a fault. Even after a hardware fault occurs in a computer system, the system remains error-free until the fault manifests itself. Before its manifestation, the fault is latent and does not harm any system operation. Thus, there are two time intervals of interest between fault occurrence and error detection: fault latency and error latency (see [2] for a detailed description of these). Obviously, error latency depends on the det...