Technique for Defect Management in Computer Main Memory

IP.com Disclosure Number: IPCOM000108591D
Original Publication Date: 1992-Jun-01
Included in the Prior Art Database: 2005-Mar-22
Document File: 2 page(s) / 92K

Publishing Venue

IBM

Related People

Mosley, JM: AUTHOR [+2]

Abstract

This article describes a method for a computer operating system (OS) to automatically deallocate pages containing grown defects from its allocated storage, thereby providing transparent handling of most errors.

This is the abbreviated version, containing approximately 52% of the total text.

      Typically, the conventional approach to storage capacity
requires that the storage devices be 100% good at the device level.
By way of illustration, if a single bit is defective in a computer
main memory which contains many millions of bytes of storage, the
computer's power-on self-test (POST) will detect this error and
prevent the computer from continuing its normal operation.  To
continue operating, the memory containing the defective bit must be
replaced.

      The method disclosed herein provides a technique for operating
system management of main storage capacity that offers improved user
support and improved reliability, availability, and serviceability
(RAS) characteristics.
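
      As a rough illustration of the idea of an operating system
deallocating defective pages from its allocated storage, the
following minimal sketch models a pool of page frames from which a
frame can be retired once a defect is reported against it.  The
details of the disclosed method are not given in this abbreviated
text, so the class name FramePool, its methods, and the policy of
handing the owner a replacement frame are all hypothetical
assumptions.  Copying or recovering the page contents is not modeled.

```python
class FramePool:
    """Toy model of an OS page-frame pool that can retire bad frames.

    Hypothetical illustration only; not the method of the disclosure.
    """

    def __init__(self, total_frames):
        self.total_frames = total_frames
        self.free = set(range(total_frames))  # frames available for use
        self.allocated = {}                   # frame number -> owner
        self.retired = set()                  # frames removed from service

    def allocate(self, owner):
        """Give the lowest-numbered free frame to `owner`."""
        if not self.free:
            raise MemoryError("no usable page frames left")
        frame = min(self.free)
        self.free.remove(frame)
        self.allocated[frame] = owner
        return frame

    def report_defect(self, frame):
        """Retire `frame` permanently.

        Returns the replacement frame number if the frame was in use,
        otherwise None.
        """
        if frame in self.retired:
            return None                       # already out of service
        self.retired.add(frame)
        self.free.discard(frame)              # never hand it out again
        if frame not in self.allocated:
            return None
        owner = self.allocated.pop(frame)
        return self.allocate(owner)           # owner continues on a good frame

    def usable_frames(self):
        """Number of frames still in service (free or allocated)."""
        return self.total_frames - len(self.retired)
```

      In this sketch a defect costs the system one page frame of
capacity rather than halting it; the retired frame simply never
returns to the free set.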

      The management of manufacturing defects is a time-honored
method of improving yield and controlling production costs.
Semiconductors, for example, are produced in large batches in the
expectation that some will turn out good.  This approach has been
refined over the years such that redundancy and sparing are now
typically implemented at the device level, i.e., each chip carries
extra portions of a function, which may be used to replace similar
but defective areas.  The handling of grown defects, i.e., defects
that develop after manufacture, is a different situation.

      The POST is designed to detect defects and prevent them from
becoming a threat to data.  The typical means of doing so is to stop
the user from operating the system.  Some errors, such as a key held
down on the keyboard, may be ignored so that operation continues.
Serious errors, however, such as a defect in main memory, cannot be
bypassed.
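
      The conventional POST policy described above can be sketched as
a simple classification of detected errors.  The error names, the
Severity enumeration, and the choice to treat unknown errors as fatal
are assumptions made for illustration; they do not come from any
particular POST implementation.

```python
from enum import Enum


class Severity(Enum):
    IGNORABLE = "ignorable"  # operation may continue
    FATAL = "fatal"          # operation is blocked


# Hypothetical error names mapped to the severities the text describes.
POST_POLICY = {
    "keyboard_stuck_key": Severity.IGNORABLE,
    "main_memory_defect": Severity.FATAL,
}


def post_may_continue(detected_errors):
    """Return True only if every detected error is ignorable.

    Errors missing from POST_POLICY are treated as fatal (an assumption).
    """
    return all(
        POST_POLICY.get(error, Severity.FATAL) is Severity.IGNORABLE
        for error in detected_errors
    )
```

      Under this policy a single main-memory defect is enough to
block operation, regardless of how much good memory remains.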

      Consequently, a defect, even as limited as a single bit in main
memory, renders the system unusable.  This is for the user's
protection, and at first glance is a good approach.

      Error Correcting Code (ECC) memory is being employed with
greater frequency in computer systems.  This has the effect of
masking small memory defects...
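
      To illustrate how an error-correcting code can mask a
single-bit defect, the following sketch uses a textbook Hamming(7,4)
code.  This is an assumption chosen for brevity: the text does not
say which code is used, and memory ECC in practice generally operates
on much wider words.  The function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return (data_bits, error_position).

    error_position is 1..7 for a corrected bit, or 0 if no error was seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome names the bad position
    if position:
        c[position - 1] ^= 1         # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


# Usage: a stuck bit at position 5 is corrected on every read.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                       # simulate a single defective cell
recovered, where = hamming74_decode(stored)
```

      The data read back is correct even though a cell is defective,
which is the sense in which such a code masks the defect from the
rest of the system.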