Method for transitioning workload on servers with predicted failures

IP.com Disclosure Number: IPCOM000032072D
Original Publication Date: 2004-Oct-22
Included in the Prior Art Database: 2004-Oct-22
Document File: 2 page(s) / 35K

Publishing Venue

IBM

Abstract

This article describes a method by which a predictive failure alert (PFA) initiates a workflow in provisioning software. The workflow automatically deploys another server with the same function, adds the new server to the load balancer, quiesces the failing server, and then removes the failing server, all without degrading the performance or capacity of the application using the server.

This text was extracted from a PDF file.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 53% of the total text.

     Disclosed is a method by which a predictive failure alert (PFA) initiates a workflow in provisioning software. The workflow automatically deploys another server with the same function, adds the new server to the load balancer, quiesces the failing server, and then removes the failing server, all without degrading the performance or capacity of the application using the server. This is accomplished without dedicating the stand-by server resources that traditional clustering methods require.

     Typical provisioning systems base provisioning decisions on the available capacity and overall performance of an application or application tier. Load balancers and provisioning systems can also respond to a server failure by redirecting workload, provisioning an additional server to replace the failed one, or both. These technologies allow servers to be apportioned so that they are utilized optimally, and they enable rapid recovery when failures occur. However, current provisioning strategies are reactive: after a server fails, the application runs in a degraded state until a new server can be provisioned and brought online (typically a 30-60 minute operation).
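     The cost of reacting only after a failure can be illustrated with simple arithmetic. The sketch below is not part of the disclosure; the function name and the idea of a known warning interval between the prediction and the actual failure are assumptions made for illustration. It compares how long an application tier runs one server short under reactive replacement and under replacement triggered by a prediction.

```python
def degraded_minutes(provision_minutes: float,
                     warning_minutes: float,
                     proactive: bool) -> float:
    """Minutes the tier runs one server short (illustrative model only).

    provision_minutes: time to provision a replacement and bring it online.
    warning_minutes:   assumed interval between the failure prediction and
                       the actual failure (hypothetical parameter).
    proactive:         True if provisioning starts at the prediction,
                       False if it starts only after the failure.
    """
    if not proactive:
        # Provisioning starts at the failure, so the whole provisioning
        # time is spent at reduced capacity.
        return provision_minutes
    # Provisioning starts at the prediction; only the part that runs past
    # the actual failure is spent at reduced capacity.
    return max(0.0, provision_minutes - warning_minutes)


if __name__ == "__main__":
    # 45-minute provisioning, within the 30-60 minute range cited above.
    print(degraded_minutes(45, 0, proactive=False))
    print(degraded_minutes(45, 120, proactive=True))
```

     Under this model, any warning interval at least as long as the provisioning time removes the degraded period entirely.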

     Most server systems include specialized hardware and firmware that can predict the failure of a component well before the failure actually occurs. Management software tools have also been developed that track operating system (OS) resource exhaustion, predicting OS crashes so that orderly, scheduled software rejuvenation can be performed. The methodology described in this article leverages these failure-prediction capabilities and uses the prediction event as the trigger for a workflow that provisions a replacement server before the ailing server is taken offline.
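     The ordering of the workflow steps can be sketched as follows. This is a minimal in-memory model, not the disclosed implementation: the Server and LoadBalancer classes, the handle_predicted_failure function, and the instantaneous session drain are all hypothetical stand-ins for whatever provisioning software and load balancer are actually used. The sketch records the number of serving servers after each step to show that capacity does not fall below its starting level.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    role: str
    active_sessions: int = 0


@dataclass
class LoadBalancer:
    """Hypothetical load balancer model: a pool plus a set of draining names."""
    pool: list = field(default_factory=list)
    draining: set = field(default_factory=set)

    def add(self, server: Server) -> None:
        self.pool.append(server)

    def quiesce(self, server: Server) -> None:
        # Stop sending new work; existing sessions are allowed to finish.
        self.draining.add(server.name)

    def remove(self, server: Server) -> None:
        self.pool.remove(server)
        self.draining.discard(server.name)

    def serving(self) -> list:
        return [s for s in self.pool if s.name not in self.draining]


def handle_predicted_failure(lb: LoadBalancer, failing: Server,
                             new_name: str) -> list:
    """Run the four steps in order; return (step, serving_count) pairs."""
    steps = []

    # 1. Deploy another server with the same function (assumed instantaneous).
    replacement = Server(name=new_name, role=failing.role)
    steps.append(("deploy", len(lb.serving())))

    # 2. Add the new server to the load balancer before touching the old one.
    lb.add(replacement)
    steps.append(("add", len(lb.serving())))

    # 3. Quiesce the failing server and let its sessions drain.
    lb.quiesce(failing)
    failing.active_sessions = 0  # stands in for waiting on a real drain
    steps.append(("quiesce", len(lb.serving())))

    # 4. Remove the failing server from the pool.
    lb.remove(failing)
    steps.append(("remove", len(lb.serving())))
    return steps


if __name__ == "__main__":
    lb = LoadBalancer()
    web1, web2 = Server("web1", "web", 12), Server("web2", "web", 9)
    lb.add(web1)
    lb.add(web2)
    for step, count in handle_predicted_failure(lb, web1, "web3"):
        print(step, count)
```

     Adding the replacement before quiescing the failing server is the design choice that matters here: reversing those two steps would leave the pool one server short for the duration of the provisioning operation.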

     The method can be implemented using a hardware management server to receive hardware PFAs and/or to monitor OS resource exhaustion. In the...