A Method To Improve The Reliability Of Long Running Operations
Disclosure Number: IPCOM000015549D
Original Publication Date: 2002-Jun-22
Included in the Prior Art Database: 2003-Jun-20

Disclosed is a method to improve the reliability of long running operations on a storage controller. This design uses a Long Operations Data (LOD) manager to provide a flexible and powerful method to checkpoint critical operational data on the attached storage device. The LOD manager provides a set of basic interfaces that allow a long running task to reserve, write, read, or clear critical checkpoint data. The LOD manager handles the actual storage and retrieval of this data on the storage device.

The format of the data is controlled by the task, and each type of task can have its own unique format. Some types of tasks may require large amounts of checkpoint data, while others may need only a small amount. This flexibility allows new task types to be added in the future: each new task type simply needs to use the defined LOD interfaces.

The LOD manager allows each long running task to control the frequency of its checkpoint updates. This gives the customer more control over how the storage controller operates. The customer can reduce the frequency of checkpoint updates to lessen their performance impact on other operations, or choose more frequent updates to minimize the work lost to a power cycle or a controller failure. Since the LOD manager supports multiple concurrent long running tasks, each task can have its own assigned checkpoint update rate.

The LOD manager also provides improved reliability over other storage controllers that use an onboard design for checkpoint updates. With an onboard design, critical data is stored in nonvolatile memory on the storage controller. An onboard design allows long running operations to be resumed after power cycling the storage controller. However, onboard designs do not provide any protection against an actual storage controller failure.
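The reserve, write, read, and clear interfaces and the per-task checkpoint rate described above might be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the names StorageDevice, LODManager, and run_task, the JSON checkpoint format, and the 64-byte reservation are all hypothetical, and an in-memory dictionary stands in for the attached storage device.

```python
import json


class StorageDevice:
    """Hypothetical stand-in for the attached storage device (named regions)."""

    def __init__(self):
        self.regions = {}


class LODManager:
    """Hypothetical LOD manager: keeps opaque checkpoint bytes per task on the device."""

    def __init__(self, device):
        self.device = device

    def reserve(self, task_id, size):
        # Keep any existing region so a restarted task can find its checkpoint.
        self.device.regions.setdefault(task_id, {"size": size, "data": b""})

    def write(self, task_id, data):
        region = self.device.regions[task_id]
        if len(data) > region["size"]:
            raise ValueError("checkpoint exceeds reserved size")
        region["data"] = data

    def read(self, task_id):
        return self.device.regions[task_id]["data"]

    def clear(self, task_id):
        self.device.regions.pop(task_id, None)


def run_task(lod, task_id, total_units, checkpoint_every, fail_at=None):
    """Do `total_units` of work, checkpointing every `checkpoint_every` units.

    The checkpoint format (JSON here) is chosen by the task, not the manager.
    Returns the number of units processed during this run.
    """
    lod.reserve(task_id, size=64)
    saved = lod.read(task_id)
    done = json.loads(saved)["done"] if saved else 0  # resume point, if any
    start = done
    while done < total_units:
        if fail_at is not None and done == fail_at:
            raise RuntimeError("simulated controller failure")
        done += 1  # one unit of the long running operation
        if done % checkpoint_every == 0:
            lod.write(task_id, json.dumps({"done": done}).encode())
    lod.clear(task_id)  # operation complete; release the checkpoint data
    return done - start


device = StorageDevice()
try:
    # First controller fails after 47 units; the last checkpoint is at 40.
    run_task(LODManager(device), "rebuild", 100, checkpoint_every=10, fail_at=47)
except RuntimeError:
    pass

# A replacement controller attaches to the same device and resumes from unit 40.
print(run_task(LODManager(device), "rebuild", 100, checkpoint_every=10))  # prints 60
```

Because the checkpoint lives on the device rather than in the controller, the second LODManager instance can pick up the task without repeating the first 40 units. A smaller checkpoint_every value would reduce the repeated work at the cost of more frequent writes.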
If a storage controller with an onboard design fails while a long running task is still in progress, that operation will have to be restarted from the beginning. Since long running operations can take hours or even days to