
System Monitor Program

IP.com Disclosure Number: IPCOM000115095D
Original Publication Date: 1995-Mar-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 4 page(s) / 97K

Publishing Venue

IBM

Related People

Semple, BP: AUTHOR

Abstract

A program/algorithm is disclosed which monitors the operation of other mission critical programs on a server or host machine and restarts said programs if needed. The solution provides an inexpensive "fault tolerant" layer which automatically restarts mission critical applications that have been terminated and removed from memory by the operating system due to unanticipated conditions. A basic software fault tolerance is essential for standalone, unattended, and off shift operations of critical server applications supporting numerous clients.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

  *
  * This file contains the parameters and default values necessary for
  * the System Monitor Program
  *
  Interval 30
  Program DB PIPE SERVER 1           strtdbs1
  Program DB PIPE SERVER 2           strtdbs2
  Program DB PIPE SERVER 3           strtdbs3
  Program Telephone Interface Master starttim
  Program System Purge Manager       startspm
  Program Remote Site Support        startrss
  Program Host Status Interface      starthsi
  Program Report Uploader            startbul
  Fig. 1.   Parameter File

      The operating environment must provide two key elements for a
system monitor program to operate.  The first is a way for the monitor
program to determine which programs/processes are currently running on
the monitored machine.  For the OS/2* environment, this involves
programmatically reading the task list.  The second is a way to restart
each of the applications being monitored.  For the OS/2 environment, a
collection of REXX command files is provided for this purpose.  The
monitor has no other dependency on the operating system environment, so
it can run wherever these two requirements are met.
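
      The two requirements can be read as a small interface between the
monitor and its host.  The Python sketch below is illustrative only: the
names Platform, list_running, restart, and the in-memory FakePlatform
are assumptions made for this example and do not appear in the
disclosure, which fills these two roles with the OS/2 task list and REXX
command files.

```python
from typing import Dict, Protocol, Set


class Platform(Protocol):
    """The two capabilities the monitor needs from its environment."""

    def list_running(self) -> Set[str]:
        """Return the names of the programs currently running."""
        ...

    def restart(self, command: str) -> None:
        """Run the restart command for one monitored program."""
        ...


class FakePlatform:
    """Hypothetical in-memory stand-in for exercising a monitor without an OS."""

    def __init__(self, commands: Dict[str, str]) -> None:
        # Maps a restart command to the program name it brings up.
        self._commands = commands
        self._running: Set[str] = set()

    def list_running(self) -> Set[str]:
        return set(self._running)

    def restart(self, command: str) -> None:
        self._running.add(self._commands[command])

    def kill(self, name: str) -> None:
        # Simulates the operating system terminating a program.
        self._running.discard(name)
```

      Any environment that can supply these two operations could, in
principle, host the monitor; a real implementation would replace
FakePlatform with calls into the operating system.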

      The selection of programs to be monitored as well as their
restart sequence is externally configured using an ASCII flat file.
In addition, various timeouts and thresholds can be configured using
this file.  The name of this file is the sole parameter passed to the
system monitor program.  Fig. 1 shows a basic parameter file.

      Note that two keywords are currently defined, Interval and
Program.  The Interval keyword sets the cycle period
(in seconds) for the monitor.  The Program keyword establishes a
program to be monitored as well as its associated restart sequence.
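
      A parser for this file format might look like the following Python
sketch.  It assumes, based only on Fig. 1, that lines beginning with "*"
are comments, that a program's display name may contain spaces, and that
the last whitespace-separated token on a Program line is the restart
command.  The names MonitorConfig and parse_parameters are hypothetical,
and the handling of unknown keywords is a choice made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MonitorConfig:
    # Cycle period in seconds; the default of 30 is an assumption
    # borrowed from Fig. 1, not a documented default.
    interval: int = 30
    # (display name, restart command) pairs, in file order.
    programs: List[Tuple[str, str]] = field(default_factory=list)


def parse_parameters(text: str) -> MonitorConfig:
    """Parse the contents of a parameter file into a MonitorConfig."""
    config = MonitorConfig()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment
        keyword, _, rest = line.partition(" ")
        if keyword == "Interval":
            config.interval = int(rest)
        elif keyword == "Program":
            # Split off the final token as the restart command; the
            # remainder is the display name.
            name, command = rest.rsplit(None, 1)
            config.programs.append((name.strip(), command))
        else:
            raise ValueError(f"unknown keyword: {keyword!r}")
    return config
```

      Applied to the file in Fig. 1, this sketch would yield an interval
of 30 seconds and eight (name, command) pairs, the first being
("DB PIPE SERVER 1", "strtdbs1").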
  Read and process parameter file (builds program status table)
  Do Forever
    ask the operating system what programs are running
    For (each item in the program status table)
      If (the program is running) Then
        If (ok_checks > OK_TO_RESET threshold) Then
          reset the number of restarts counter
        Else
          increment the ok_checks counter
      Else // program is not running
        increment the number of restarts counter
        ok_checks = 0
       ...
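
      The counter handling in the visible part of the listing can be
sketched in Python as follows.  Only the steps shown above are
implemented; whatever follows the "..." is not available in this
abbreviated text and is not reconstructed here.  The names ProgramStatus
and update_status are hypothetical, and passing the OK_TO_RESET
threshold as a function argument is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass
class ProgramStatus:
    name: str
    restarts: int = 0   # the "number of restarts" counter
    ok_checks: int = 0  # successful checks counted since the last failure


def update_status(status: ProgramStatus, is_running: bool, ok_to_reset: int) -> None:
    """Apply one monitoring cycle's counter updates to a single table entry."""
    if is_running:
        if status.ok_checks > ok_to_reset:
            # Program has been healthy long enough: forget past restarts.
            status.restarts = 0
        else:
            status.ok_checks += 1
    else:
        status.restarts += 1
        status.ok_checks = 0
        # The remainder of this branch is elided in the listing and is
        # deliberately left out.
```

      As written in the listing, ok_checks stops increasing once it
exceeds the threshold, and the restart counter is cleared on every
subsequent cycle in which the program is still running.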