Method to Increase Accessibility of Low-Level Computer State Information

IP.com Disclosure Number: IPCOM000123554D
Original Publication Date: 1999-Jan-01
Included in the Prior Art Database: 2005-Apr-05
Document File: 3 page(s) / 138K

Publishing Venue

IBM

Related People

Gamble, ES: AUTHOR [+4]

Abstract

Disclosed is a method to increase the accessibility of low-level computer state information. For the purposes of this discussion, the term "Central Processing Complex" refers to the primary system processor and all surrounding processing components. These components are connected logically via a JTAG (Joint Test Action Group) scan chain or other industry-standard scannable bus. The term "Service Processor" refers to a separate processor, resident on the same bus, whose purpose is to monitor the well-being of the Central Processing Complex. The Service Processor may be a lower-function unit dedicated solely to monitoring, or it may be a "sister" processor complex that can monitor the health of its sibling even as it performs other primary computational tasks.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 42% of the total text.

   Disclosed is a method to increase the accessibility of
low-level computer state information.  For the purposes of this
discussion, the term "Central Processing Complex" refers to the
primary system processor and all surrounding processing components.
These components are connected logically via a JTAG (Joint Test
Action Group) scan chain or other industry-standard scannable bus.
The term "Service Processor" refers to a separate processor, resident
on the same bus, whose purpose is to monitor the well-being of the
Central Processing Complex.  The Service Processor may be a
lower-function unit dedicated solely to monitoring, or it may be a
"sister" processor complex, which has the capability to monitor the
health of its sibling even as it performs other primary computational
tasks.  The term "low-level state information" refers to the
thousands of physical switch and register values present on the
logical components within the Central Processing Complex.
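
   The terms above can be illustrated with a minimal sketch.  It
assumes a greatly simplified model in which every register on the
scan chain is shifted out serially, in chain order, as one bit
stream.  The register names, widths, and values are hypothetical, and
the sketch does not model the JTAG test access port or the method
disclosed here; it only shows what "low-level state information"
read over a scannable bus might look like to a Service Processor.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScanRegister:
    """One scannable register on a component (hypothetical name and width)."""
    name: str
    width: int   # number of bits in the register
    value: int   # current contents


def shift_out(chain: List[ScanRegister]) -> List[int]:
    """Serialize the whole chain into one bit stream.

    Registers are emitted in chain order, most-significant bit first.
    This ordering is an assumption made for the illustration.
    """
    bits: List[int] = []
    for reg in chain:
        for i in reversed(range(reg.width)):
            bits.append((reg.value >> i) & 1)
    return bits


def decode(bits: List[int], layout: List[ScanRegister]) -> Dict[str, int]:
    """Split a captured bit stream back into named register values."""
    state: Dict[str, int] = {}
    pos = 0
    for reg in layout:
        value = 0
        for bit in bits[pos:pos + reg.width]:
            value = (value << 1) | bit
        state[reg.name] = value
        pos += reg.width
    return state


# Hypothetical three-register chain standing in for the thousands of
# values present in a real Central Processing Complex.
chain = [
    ScanRegister("cpu0.status", 8, 0xA5),
    ScanRegister("memctl.error", 4, 0x3),
    ScanRegister("iobridge.mode", 2, 0x1),
]
snapshot = decode(shift_out(chain), chain)
```

   In this model the Service Processor needs only the chain layout
(names and widths) to turn a raw bit stream back into named values.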

   In today's increasingly complex computing environment,
customer needs dictate ever-higher system availability.  Machines
are expected to be operational around the clock, and when a failure
does occur, market forces dictate that service personnel diagnose the
problem quickly and accurately to ensure that the failing component
and failure mechanism are identified.  One of the major obstacles to
this goal is the physical isolation of machines in the field; when a
system crash occurs, the only information available to service
personnel consists of the standard operating system error logs and
the system dump.  The former give high-level information about the
failing component (e.g., a "System Planar" failure), and the latter
gives information about the state of memory and the operating system
at the time of the crash.  However, it has been impossible to get
more specific information about the low-level state of the Central
Processing Complex, because this state information is lost when the
machine is shut down and rebooted.

   In the development lab, diagnostic tools exist to analyze
such low-level state information, but they are only of value when
they can be physically taken to the failing machine.  Since
distributing and transporting diagnostic equipment involves both
material and labor expense, and since the equipment is only
effective before power is removed from the failing system, it has
been impractical to use such tools in the field to gather low-level
system state information.  The problem, then, is that information
which would aid in diagnosing failure mechanisms and improving future
product designs exists within the system at the time of the crash,
but this detailed information is inaccessible to designers and
service personnel unless an identical failing condition can be
reproduced in the lab.

   The solution to the above problem, inaccessibility of
low-level hardware information,...