OPTIMAL DESIGN AND USE OF RETRY IN FAULT TOLERANT REALTIME COMPUTER SYSTEMS

IP.com Disclosure Number: IPCOM000128483D
Original Publication Date: 1984-May-01
Included in the Prior Art Database: 2005-Sep-16

Publishing Venue

Software Patent Institute

Related People

Lee, Yann-Hang: AUTHOR

Abstract

In this report, we present a new method for (i) determining an optimal retry policy and (ii) using retry for fault characterization.

First, we derive an optimal retry policy for a given fault characteristic; the policy determines the maximum allowable retry durations so as to minimize the total task completion time. Then, we carry out the combined fault characterization and retry decision, in which the characteristics of the fault are estimated simultaneously with the determination of the optimal retry policy. We have developed two solution approaches: one is based on point estimation and the other on the Bayes sequential decision. Maximum likelihood estimators are used in the first approach, and backward induction for testing hypotheses in the second.

We also present numerical examples in which all the durations associated with faults (i.e., active, benign, and inter-failure durations) have monotone hazard rate functions, e.g., exponential, Weibull, and gamma distributions. These are standard distributions commonly used for modeling and analysis of faults.

Categories and Subject Descriptors: B.2.3 [Arithmetic and Logic Structures]: Reliability, Testing, and Fault-Tolerance -- hazard rate function, recovery overhead, optimal retry policy, fault characteristic; G.3 [Probability and Statistics] -- estimation, censored sampling, likelihood ratio, sequential or Bayes decision problem, hypotheses testing.
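To make the idea of a maximum allowable retry duration concrete, here is a minimal Python sketch under a deliberately simple model that is an assumption of this example, not the model derived in the report. With probability `p` the fault never disappears; otherwise its active duration is exponential with rate `lam`. If the fault is still active after retrying for `r` time units, a fixed recovery `overhead` is paid. The names `expected_cost`, `optimal_retry_duration`, `p`, `lam`, and `overhead` are all hypothetical.

```python
import math

def expected_cost(r, p, lam, overhead):
    """Expected time lost to one fault when retry is abandoned after r time units.

    Assumed model (not taken from the report): with probability p the fault
    never disappears; otherwise its active duration is exponential with rate
    lam. A fault that outlasts r costs r plus a fixed recovery overhead.
    """
    decay = math.exp(-lam * r)
    still_active = p + (1.0 - p) * decay                    # P(fault active at time r)
    time_retrying = p * r + (1.0 - p) * (1.0 - decay) / lam  # E[min(active duration, r)]
    return time_retrying + overhead * still_active

def optimal_retry_duration(p, lam, overhead):
    """Retry duration minimizing expected_cost; 0.0 means do not retry at all."""
    if p <= 0.0:
        # Constant hazard rate: either keep retrying or never retry.
        return math.inf if lam * overhead > 1.0 else 0.0
    # Setting d(expected_cost)/dr = 0 gives exp(-lam * r) = 1 / ratio.
    ratio = (1.0 - p) * (lam * overhead - 1.0) / p
    return math.log(ratio) / lam if ratio > 1.0 else 0.0

if __name__ == "__main__":
    r_star = optimal_retry_duration(p=0.1, lam=2.0, overhead=5.0)
    print(r_star, expected_cost(r_star, 0.1, 2.0, 5.0))
```

In this toy model the optimum falls where the hazard rate of the active duration drops to `1 / overhead`: retrying pays off only while the chance that the fault disappears in the next instant outweighs the time spent waiting. The mixture has a decreasing hazard rate, which is why a finite cutoff exists.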

This is the abbreviated version, containing approximately 5% of the total text.

OPTIMAL DESIGN AND USE OF RETRY IN FAULT TOLERANT REALTIME COMPUTER SYSTEMS

Yann-Hang Lee and Kang G. Shin

THE UNIVERSITY OF MICHIGAN COMPUTING RESEARCH LABORATORY

CRL-TR-28-84

May 1984

Room 1079, East Engineering Building

Ann Arbor, Michigan 48109
USA
Tel: (313) 763-8000

OPTIMAL DESIGN AND USE OF RETRY IN FAULT TOLERANT REALTIME COMPUTER SYSTEMS [1]

Yann-Hang Lee and Kang G. Shin

[1] This work was supported in part by NASA under Grant NAG 1-296. Any opinions, findings, and conclusions or recommendations expressed in this report are those of the authors and do not necessarily reflect the views of NASA. Authors' address: Division of Computer Science and Engineering, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI 48109.

All correspondence should be addressed to Prof. Kang G. Shin at the above address.

1. INTRODUCTION

There are three types of fault in computer systems: transient, intermittent, and permanent [2].

Transient faults die within a certain time of their generation, intermittent faults cycle between being active and inactive, and permanent faults are (as the term indicates) permanent. It has been found that permanent faults form but a small fraction of the faults in computer systems [3, 4].
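The three fault classes can be sketched as activity functions of time. The Python below is an illustrative toy only: it assumes fixed, deterministic phase lengths, whereas the report treats the active, benign, and inter-failure durations as random variables. The names `FaultKind`, `is_active`, `first_successful_retry`, `active_len`, and `benign_len` are hypothetical.

```python
from enum import Enum

class FaultKind(Enum):
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"

def is_active(kind, t, active_len=1.0, benign_len=3.0):
    """Whether a fault that appeared at time 0 is active at time t >= 0.

    Assumption for illustration: phase lengths are fixed constants.
    """
    if kind is FaultKind.PERMANENT:
        return True                      # never goes away
    if kind is FaultKind.TRANSIENT:
        return t < active_len            # dies once and stays gone
    # Intermittent: an active phase followed by a benign phase, repeated.
    return (t % (active_len + benign_len)) < active_len

def first_successful_retry(kind, retry_period, max_retries, **durations):
    """1-based index of the first retry that finds the fault inactive, else None."""
    for k in range(1, max_retries + 1):
        if not is_active(kind, k * retry_period, **durations):
            return k
    return None

if __name__ == "__main__":
    for kind in FaultKind:
        print(kind.value, first_successful_retry(kind, retry_period=0.5, max_retries=10))
```

Retry can succeed against a transient or intermittent fault but never against a permanent one, so a bounded retry budget doubles as a crude test of which class a fault belongs to.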

This makes the purging of any faulty components as s...