Imprecise Error Reporting in an InfiniBand (IB) Send Queue

IP.com Disclosure Number: IPCOM000011027D
Original Publication Date: 2003-Feb-10
Included in the Prior Art Database: 2003-Feb-10
Document File: 3 page(s) / 40K

Publishing Venue

IBM

Abstract

A simplified alternative to the defined behavior of the InfiniBand (IB) send queue for a reliable connected queue pair (QP) is described. The IB architecture requires that requests posted to the send queue of a queue pair complete in the order in which they were issued, and that no request complete in error before all earlier requests have completed successfully. The alternative solution reports the error on the earliest pending request, which may not be the request that actually encountered the error if the faulting request was not at the head of the queue when the error occurred. This alternative is much simpler to implement, reducing development expense and overall product cost.

This is the abbreviated version, containing approximately 34% of the total text.

Disclosed is a simplified alternative to the defined behavior of the InfiniBand (IB) send queue for a reliable connected queue pair (QP). The IB architecture requires that requests posted to the send queue of a queue pair complete in the order in which they were issued, and that no request complete in error before all earlier requests have completed successfully. This is difficult to implement because many requests are typically in various stages of execution at once: a later request might encounter an error while earlier requests are still pending, so notification of the error must be deferred until the earlier requests complete. The alternative solution reports the error on the earliest pending request, which may not be the request that actually encountered the error if the faulting request was not at the head of the queue when the error occurred. This alternative is much simpler to implement, reducing development expense and overall product cost.
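The difference between the two reporting rules can be sketched as a function of the pending queue and the request that actually faulted. This minimal Python sketch is hypothetical: the names `Status`, `precise_completions` and `imprecise_completions` are illustrative, not IB verbs or any adapter interface. It assumes that, per IB semantics, every request after the one reported in error is completed with Flush status; the precise version further assumes that every request ahead of the faulting one eventually completes successfully.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    ERROR = "error"
    FLUSH = "flush"  # not executed because an earlier request completed in error


def precise_completions(pending, faulting_wr):
    """Architected IB reporting, ASSUMING every request ahead of the
    faulting one eventually completes successfully."""
    i = pending.index(faulting_wr)  # raises ValueError if not pending
    return ([(wr, Status.SUCCESS) for wr in pending[:i]]
            + [(faulting_wr, Status.ERROR)]
            + [(wr, Status.FLUSH) for wr in pending[i + 1:]])


def imprecise_completions(pending, faulting_wr):
    """Simplified reporting: the error is charged to the head of the
    queue, and every other pending request (including the one that
    actually faulted) is flushed."""
    if faulting_wr not in pending:
        raise ValueError("faulting request is not pending")
    head, *rest = pending
    return [(head, Status.ERROR)] + [(wr, Status.FLUSH) for wr in rest]


if __name__ == "__main__":
    queue = ["A", "B", "C", "D"]  # oldest first; C is the request that faults
    for name, fn in (("precise", precise_completions),
                     ("imprecise", imprecise_completions)):
        print(name, [(wr, s.value) for wr, s in fn(queue, "C")])
```

With C as the faulting request, the precise rule reports A and B as successful, C in error and D flushed, while the imprecise rule reports A in error and flushes B, C and D. The imprecise rule needs no knowledge of which request faulted beyond confirming that it is still pending.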

The IB-defined behavior is as follows. The IB architecture has sequential semantics; that is, it requires the IB requester to report the status of requests posted to the send queue in the order in which they were received. This is difficult if the IB requester is working on multiple requests concurrently. For example, the IB requester could have received three requests (A, B, C) from the software client. At a given point in time it might be prefetching data to build packets for request C, sending packets for request B, and waiting for acknowledgment packets for the packets already transmitted for request A. If an error is encountered while fetching data for request C, it cannot be reported to the software client immediately, because the client would then assume that requests A and B had finished successfully. Instead, the requester must wait for requests A and B to complete before reporting the error.

Requests A and B can also encounter errors, resulting in complex nested error recovery scenarios. Additionally, the hardware engine that encountered the error (e.g., a packet builder) might need to be used again to complete the earlier requests for this QP (e.g., for packet retransmission). Requests issued after the request that encountered the error (e.g., a request "D") must be completed with a "Flush" status, indicating that an earlier request encountered an error. There may be a number of requests like "D", because the software client continues to post requests until it processes the error response. These requests might also already be in process and could themselves encounter errors, further complicating recovery.
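The in-order rule described above can be modelled as a queue whose completions are released only from the head. The following Python sketch is a hypothetical illustration: the class and method names (`StrictSendQueue`, `post`, `finish`) are assumptions, not IB verbs or any adapter's actual interface, and it does not model retransmission or reuse of the faulting hardware engine. It shows C's error being held back until A and B complete, and a later request D being flushed.

```python
class StrictSendQueue:
    """Hypothetical model of the IB in-order completion rule.

    Internally, requests may finish in any order, but completions are
    posted to the client strictly in posting order. After an error
    completion is posted, every remaining or newly posted request is
    completed with Flush status.
    """

    def __init__(self):
        self.outcomes = {}  # wr_id -> None (still executing), "success" or "error"
        self.posted = []    # completions visible to the client, oldest first
        self.in_error = False

    def post(self, wr_id):
        self.outcomes[wr_id] = None
        self._drain()  # flushes immediately if the queue is already in error

    def finish(self, wr_id, ok):
        """Record an internal result; it is reported only at the head."""
        if wr_id not in self.outcomes:
            return  # already flushed; a late internal result is ignored
        self.outcomes[wr_id] = "success" if ok else "error"
        self._drain()

    def _drain(self):
        # dicts preserve insertion order, so the first key is the oldest request
        while self.outcomes:
            head = next(iter(self.outcomes))
            outcome = "flush" if self.in_error else self.outcomes[head]
            if outcome is None:
                return  # head still executing: later results must wait
            del self.outcomes[head]
            self.posted.append((head, outcome))
            if outcome == "error":
                self.in_error = True


if __name__ == "__main__":
    q = StrictSendQueue()
    for wr in ("A", "B", "C"):
        q.post(wr)
    q.finish("C", ok=False)  # error detected while prefetching data for C
    q.finish("B", ok=True)
    print(q.posted)          # prints []: A is still awaiting acknowledgment
    q.finish("A", ok=True)
    q.post("D")              # posted before the client has seen C's error
    print(q.posted)
```

Even this toy model must hold C's error while A is outstanding; a hardware requester must do the same while still driving retransmission for the earlier requests, which is the complexity the imprecise alternative avoids.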

The simplified alternative solution still fulfills the required semantics of the IB send queue. When an error occurs, firmware is interrupted. Firmware determines that this error will require an error status to be posted to the completion queue for the reque...