Automatic mapping of worldwide names to physical locations using fault indicators

IP.com Disclosure Number: IPCOM000019896D
Original Publication Date: 2003-Oct-08
Included in the Prior Art Database: 2003-Oct-08
Document File: 3 page(s) / 51K

Publishing Venue

IBM

Abstract

In a computer system whose devices are each identified by a unique WorldWide Name (WWN), it is often useful to be able to identify those devices by a secondary mechanism. This is important in fault-tolerant systems, in circumstances where the primary data path may have failed. In this article we describe how this can be achieved by pulsing the LED indicator of a disk-drive, or similar device, to determine its physical location. No human intervention is required because the state of the LED is read back electronically.

This is the abbreviated version, containing approximately 52% of the total text.

The invention is intended for use in computer subsystems that contain a number of devices, each of which is identified by a WorldWide Name (WWN). A typical subsystem might contain 50 disk-drives connected to a Host-Bus Adapter (HBA) via a serial communications loop using the Fibre-Channel Arbitrated Loop (FCAL) protocol. Messages sent between the HBA and the disk-drives travel around the loop. When the loop is first established, each device on the loop is assigned an Arbitrated Loop Physical Address (ALPA). These ALPAs remain constant until something in the loop topology changes, for example when a disk-drive is removed.

    Under normal operating conditions, FCAL is a reliable high-speed link. However, there are circumstances in which a fault condition can cause serious performance problems, or even complete failure of a loop. A number of fault scenarios can cause the HBA to reset the FCAL loop using the Loop Initialisation Primitive (LIP) sequence. In practice, some faulty disk-drives and cables have produced intermittent errors which cause the HBA to run frequent LIP sequences. A LIP sequence can take one or more seconds to complete, so if it occurs too often the host system will see a significant performance impact. In the worst scenarios, the LIP sequences can use up 100% of the FCAL bandwidth, causing the loop to hang.
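
    As a rough illustration of why frequent LIP sequences hurt, the sketch below estimates the fraction of loop time lost to re-initialisation. It assumes a deliberately simple model in which the loop carries no traffic while a LIP sequence runs. The function name, the default one-second duration and the example rates are hypothetical choices for illustration, not measurements from this disclosure.

```python
def lip_overhead(lips_per_minute: float, lip_seconds: float = 1.0) -> float:
    """Fraction of loop time spent in LIP sequences (simple model, capped at 1.0)."""
    # Assumption: no useful traffic flows while a LIP sequence is in progress.
    busy_seconds = lips_per_minute * lip_seconds
    return min(1.0, busy_seconds / 60.0)


# Hypothetical LIP rates, each LIP assumed to last one second.
for rate in (1, 6, 30, 60):
    print(f"{rate:>2} LIPs/min -> {lip_overhead(rate):.0%} of loop time lost")
```

    Under this model, one LIP per second of one second each leaves no time for data at all, which corresponds to the hung loop described above.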

    The enclosure which houses the disk-drives usually has a mechanism for "fencing out" defective disk-drives. The enclosure contains a microprocessor which can respond to commands sent via the SCSI Enclosure Services (SES) protocol. If the HBA decides to fence out a suspect disk-drive, it can send a command to the SES microprocessor, telling it to fence out a specified physical slot number within that enclosure.

    The difficulty with this scheme arises when many enclosures are connected together in one loop. Large configurations are quite common, a typical example being five enclosures connected in two loops to two HBAs. One of the two HBAs is elected "master" and has control over the ALPAs. The master HBA is the one that decides to fence out a suspect disk-drive. If the fault is intermittent, the master HBA will be running frequent LIP sequences, and after each LIP sequence the list of ALPAs may be slightly different. In practice it has been seen that the HBA can then incorrectly calculate the enclosure number or slot number of a suspect disk-drive, so that when it sends the command to the SES microprocessor, the wrong disk-drive is fenced out. Two disk-drives are now inaccessible. This is disastrous in a RAID environment, because the data can no longer be accessed.
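
    The failure mode described above can be illustrated with a small sketch. It assumes, purely for illustration, that the HBA derives a slot number from a device's position in the current ALPA list. The disclosure does not state how the HBA actually performs this calculation, and the WWNs, ALPA values and helper name below are hypothetical.

```python
# Hypothetical ALPA assignments (WWN -> ALPA) before a LIP sequence.
before_lip = {"wwn-A": 0x01, "wwn-B": 0x02, "wwn-C": 0x04, "wwn-D": 0x08}
# After the LIP, wwn-B did not rejoin the loop, so the ALPA list is shorter.
after_lip = {"wwn-A": 0x01, "wwn-C": 0x04, "wwn-D": 0x08}


def slot_by_position(alpas: dict, wwn: str) -> int:
    """Guess a slot number from the device's rank in the sorted ALPA list.

    This positional guess is an assumed, simplified stand-in for whatever
    calculation a real HBA performs.
    """
    ordered = sorted(alpas, key=alpas.get)
    return ordered.index(wwn)


suspect = "wwn-C"
print("rank before LIP:", slot_by_position(before_lip, suspect))
print("rank after LIP: ", slot_by_position(after_lip, suspect))
```

    With these made-up values the same disk-drive is ranked at position 2 before the LIP and position 1 after it, so a fence-out command based on a position-derived slot number could land on a neighbouring slot.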

    If the HBA had an accurate table which mapped the identity of each disk-drive to an identifiable enclosure number and slot number then the fencing out process would be much more reliable. Our...
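
    A minimal sketch of the kind of table described here, built along the lines of the abstract: light the LED of one device at a time, addressed by its WWN, and read back electronically which physical location shows a lit LED. The hardware is simulated, and `SimulatedLoop`, `set_led`, `lit_locations` and `build_location_table` are hypothetical names rather than an SES or HBA API. The full procedure in the original disclosure may differ.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (enclosure number, slot number)


class SimulatedLoop:
    """Simulated hardware: the true layout is hidden from the mapping code."""

    def __init__(self, layout: Dict[Location, str]):
        self._layout = layout      # ground truth: location -> WWN
        self._lit_wwns = set()     # WWNs whose LED is currently on

    def set_led(self, wwn: str, on: bool) -> None:
        """Stand-in for commanding the device with this WWN to drive its LED."""
        if on:
            self._lit_wwns.add(wwn)
        else:
            self._lit_wwns.discard(wwn)

    def lit_locations(self) -> List[Location]:
        """Stand-in for the enclosures reporting which slots show a lit LED."""
        return sorted(loc for loc, wwn in self._layout.items()
                      if wwn in self._lit_wwns)


def build_location_table(loop: SimulatedLoop, wwns: List[str]) -> Dict[str, Location]:
    """Map each WWN to (enclosure, slot) by lighting one LED at a time."""
    table: Dict[str, Location] = {}
    for wwn in wwns:
        loop.set_led(wwn, True)
        lit = loop.lit_locations()
        loop.set_led(wwn, False)
        if len(lit) == 1:          # accept only an unambiguous reading
            table[wwn] = lit[0]
    return table


# Hypothetical layout: two enclosures, three disk-drives.
loop = SimulatedLoop({(0, 0): "wwn-A", (0, 1): "wwn-B", (1, 0): "wwn-C"})
table = build_location_table(loop, ["wwn-C", "wwn-A", "wwn-B"])
print(table["wwn-C"])  # location to name in a fence-out command for wwn-C
```

    Because the table is keyed by WWN rather than by ALPA, a lookup in this sketch does not depend on how addresses were assigned in the most recent LIP sequence.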