Memory and Communication Chip Server with Incorporated Fault Tolerance

IP.com Disclosure Number: IPCOM000104944D
Original Publication Date: 1993-Jun-01
Included in the Prior Art Database: 2005-Mar-19
Document File: 4 page(s) / 98K

Publishing Venue

IBM

Related People

Gravano, L: AUTHOR [+2]

Abstract

This disclosure presents an apparatus that provides a degree of fault tolerance to a chip hosting memory and communication facilities for massively parallel machines.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      The requirements of RAM and communication in massively parallel
machines are complementary, and this has been exploited by
integrating communication facilities and RAM on a single chip.  This
chip is called the Memory and Communication Server (MACS).

      Disclosed is a scheme that allows the MACS chips to continue
operating when faults occur.  Because the MACS chip hosts a large
amount of memory and communication logic, some built-in fault
tolerance is desirable.  New architectural features give each MACS
chip the ability to tolerate one faulty RAM module or one
communication node failure.

      The fault model is static, i.e., after a fault is detected, a
machine hosting a large number of processing elements (PEs) can be
reconfigured.  This reconfiguration affects only those MACS chips
where faults occurred and, possibly, some neighboring MACS chips.  In
addition, the different MACS chips are allowed to fail independently.

      Each MACS chip consists of 17 communication nodes and memory
modules, because up to one faulty communication node or memory module
is tolerated per MACS chip.

      Each node of the MACS chip is assigned a number.  The n-th node
is coupled to the (n+1)th, (n-1)th, (n+2)th, (n-2)th, (n+3)th, and
(n-3)th nodes.  The last three nodes (15, 16, and 17) are special
cases, since the above coupling scheme requires tha...