Mitigating Human Errors Which Otherwise Can Lead To "Storage Data Loss Of Access (LOA)" Incidents

IP.com Disclosure Number: IPCOM000224108D
Publication Date: 2012-Dec-07
Document File: 2 page(s) / 21K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method to address cases of technician error and so avoid loss of access (LOA) incidents during a battery replacement procedure. The approach is to be prepared for such a human error and, before a battery exchange begins, "switch" the system into a new mode of operation where the cache is flushed to the Solid State Drive (SSD). The SSD is then used as an alternative cache for the duration of the battery exchange procedure.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 51% of the total text.

Storage devices based on (rather slow) spinning disks use a fast cache to speed up input/output (I/O) requests. Because the cache is volatile memory, its data is at risk if the power supply is interrupted. To mitigate the risk of cache data loss, many storage devices are equipped with an Uninterruptible Power Supply (UPS). The basic design is to recognize a power loss incident and to have enough power in the UPS batteries to flush the entire cache to disk and then power down the storage. This sort of solution gives priority to avoiding data loss over avoiding loss of access to the data. The UPS batteries have a limited life expectancy and need to be replaced periodically. A redundant system allows one UPS battery to be replaced at a time, concurrently, while the [storage] system keeps running in full production, because the other [two] UPS units are designed to sustain the whole system load on their own.

Occasionally, however, the technician who changes the batteries makes a mistake and replaces the wrong battery (i.e., replaces a good battery instead of the bad one), leaving the system with less power than it needs to operate. The result in that case is a graceful shutdown, in which all data is flushed to disk, and therefore a Loss of Access (LOA) incident. A design is needed to mitigate such incidents.
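
The failure scenario described above can be illustrated with a minimal sketch. It assumes a hypothetical system with three UPS units, any two of which can carry the full load; the names Ups, usable_units, baseline_reaction and REQUIRED_UNITS are invented for illustration and do not come from the disclosure.

```python
from dataclasses import dataclass

# Assumption for illustration: three UPS units, any two of which
# can sustain the whole system load.
REQUIRED_UNITS = 2


@dataclass
class Ups:
    name: str
    battery_ok: bool        # False for the worn-out battery awaiting replacement
    installed: bool = True  # False while the technician has the battery pulled


def usable_units(units):
    """Count the UPS units that can currently back the volatile cache."""
    return sum(1 for u in units if u.battery_ok and u.installed)


def baseline_reaction(units):
    """Baseline design: too little backup power triggers a graceful shutdown."""
    if usable_units(units) >= REQUIRED_UNITS:
        return "keep running"
    # The cache is flushed to disk and the storage powers down:
    # no data is lost, but access to the data is.
    return "graceful shutdown (LOA)"


units = [Ups("A", True), Ups("B", True), Ups("C", False)]  # C is the bad battery

# Correct procedure: the bad battery C is pulled; A and B still carry the load.
units[2].installed = False
print(baseline_reaction(units))

# Human error: C is left in place and the good battery B is pulled instead.
units[2].installed = True
units[1].installed = False
print(baseline_reaction(units))
```

In this sketch the correct procedure leaves two usable units and the system keeps running, while the mistaken one leaves a single usable unit and ends in a shutdown.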

The Solid State Drive (SSD) is not a viable medium for unrestricted writes. The restriction on the number of writes it can sustain forced developers to throttle writes so that the SSD can be used for about three years. This limitation is required for normal activity. Currently, when administrators suspect that the batteries are not charged, or there is an indication that the power supply is lost, the system immediately goes into a graceful shutdown, as is the case when a good battery is replaced instead of a bad one.
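
The relationship between a write limit and a target service life can be made concrete with a small calculation. This is only a sketch: the endurance figure is a hypothetical example, not a value from the disclosure, and the function names are invented for illustration. Only the three-year target comes from the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def sustained_write_budget(endurance_bytes, lifetime_years):
    """Average bytes per second that uses up the rated endurance
    exactly at the end of the target lifetime."""
    return endurance_bytes / (lifetime_years * SECONDS_PER_YEAR)


def throttled(bytes_requested, elapsed_seconds, budget_bytes_per_s):
    """Return how many of the requested bytes fit into the budget
    accumulated over the elapsed interval."""
    return min(bytes_requested, int(budget_bytes_per_s * elapsed_seconds))


# Hypothetical drive: 1 PB of rated write endurance, three-year target life.
budget = sustained_write_budget(endurance_bytes=10**15, lifetime_years=3)
print(f"{budget / 1e6:.1f} MB/s average write budget")
```

The sketch models only the limit that applies to normal activity; it does not model the battery maintenance window.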

The solution is a new system and method that avoids loss of access incidents even when a technician commits such an error. The approach is to be prepared for such a human error and, before a battery exchange begins, "switch" the system into a new mode of operation where the cache is flushed to the SSD and the SSD is used as an alternative cache for the duration of the battery maintenance procedure. If the wrong battery is replaced, then no action is taken as long as the wall power is still available. If all power sources are down, then having the SSD as a cache protects the system from any data loss. Once the mista...
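
The mode switch and the power-event rules stated so far can be sketched as a small state machine. This is a hedged illustration, not the disclosed implementation: the class, method and mode names are hypothetical, the caches are modeled as plain lists, and the case where wall power is lost while battery power remains is not covered by the text, so the sketch marks it as unspecified.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "volatile cache, UPS-backed"
    MAINTENANCE = "SSD used as alternative cache"


class StorageSystem:
    def __init__(self):
        self.mode = Mode.NORMAL
        self.volatile_cache = []  # entries held in volatile memory
        self.ssd_cache = []       # entries held on the SSD

    def begin_battery_exchange(self):
        """Before the exchange starts: flush the cache to the SSD and switch modes."""
        self.ssd_cache.extend(self.volatile_cache)
        self.volatile_cache.clear()
        self.mode = Mode.MAINTENANCE

    def write(self, entry):
        """Route a write to whichever cache the current mode designates."""
        if self.mode is Mode.MAINTENANCE:
            self.ssd_cache.append(entry)
        else:
            self.volatile_cache.append(entry)

    def on_power_event(self, wall_power_ok, battery_power_ok):
        """Pick a reaction to a power event, following the rules in the text."""
        if self.mode is Mode.MAINTENANCE:
            if wall_power_ok:
                return "no action"         # wrong battery pulled, wall power still up
            if not battery_power_ok:
                return "data safe on SSD"  # all power sources down
            return "unspecified"           # case not described in the source
        if not wall_power_ok or not battery_power_ok:
            return "graceful shutdown (LOA)"  # baseline behavior
        return "no action"


system = StorageSystem()
system.write("block-1")
system.begin_battery_exchange()
system.write("block-2")
print(system.ssd_cache)
print(system.on_power_event(wall_power_ok=True, battery_power_ok=False))
```

The same power event (battery power missing, wall power present) that forces a shutdown in normal mode requires no action in maintenance mode, which is the point of switching modes before the exchange begins.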