Method for Determining the DMA Descriptor Containing Page Fault Address Caused by Bit Flip Error

IP.com Disclosure Number: IPCOM000239354D
Publication Date: 2014-Nov-01
Document File: 3 page(s) / 48K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for determining the Direct Memory Access (DMA) descriptor containing a page fault address caused by a bit flip error.

An input/output (I/O) adapter, such as a Fibre Channel adapter in a DS8K storage controller, is typically attached to the server in the storage controller through a Peripheral Component Interconnect Express (PCIe) bus, with PCIe switches in the data path. These PCIe switches are susceptible to errors that can randomly flip bits in their address translation registers. A flipped bit causes a switch to route data accesses to an incorrect address range, and the server whose memory is accessed with the incorrect addresses then detects errors. When the server detects an error caused by a bad address being used to access its memory, it signals the error as a "page fault".
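To make the failure mode concrete, the following Python sketch models a switch translation window under a simplifying assumption that this disclosure does not state: the switch passes the low-order offset bits of an address through unchanged and replaces the high-order bits with the contents of a translation register. Under that assumption, a single flipped bit in the register (bit 33 here, chosen arbitrarily) flips the same bit in every translated address, which moves the access outside the server's valid memory range. The window size, register values, and function names are hypothetical.

```python
# Hypothetical, simplified model of a PCIe switch address-translation window.
# ASSUMPTION (not from the disclosure): the low-order offset bits pass through
# unchanged and the high-order bits come from a translation register.

WINDOW_SIZE = 1 << 20            # assumed 1 MiB power-of-two window
OFFSET_MASK = WINDOW_SIZE - 1    # low-order bits kept from the incoming address


def translate(addr: int, translation_reg: int) -> int:
    """Map an adapter-side address to a server-side address."""
    return (translation_reg & ~OFFSET_MASK) | (addr & OFFSET_MASK)


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (models a single-bit upset)."""
    return value ^ (1 << bit)


def in_valid_range(addr: int, ranges: list[tuple[int, int]]) -> bool:
    """True if addr lies inside any (start, size) range the server accepts."""
    return any(start <= addr < start + size for start, size in ranges)


# Example values, chosen for illustration only.
server_ranges = [(0x4000_0000, WINDOW_SIZE)]
good_reg = 0x4000_0000
bad_reg = flip_bit(good_reg, 33)          # bit 33 flips in the register

addr = 0x0001_2340
print(hex(translate(addr, good_reg)))     # 0x40012340  (valid)
print(hex(translate(addr, bad_reg)))      # 0x240012340 (outside server_ranges)
print(in_valid_range(translate(addr, bad_reg), server_ranges))  # False
```

Because only high-order bits come from the register in this model, the faulting address differs from the intended one in exactly one bit position.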

An I/O adapter processes host I/O by transferring data between the memory within the adapter and the memory in the server in which it is installed (and to which it is connected via a PCIe bus). The data is transferred by a Direct Memory Access (DMA) engine that is operated using DMA Descriptors (DDs): software structures that reside in adapter memory and are constructed by the Central Processing Unit (CPU) in the I/O adapter. Each DD describes the source address from which the data is to be read, the target address to which the data is to be written, and the size of the data to be transferred. DDs are linked together to create "DD chains", which allow the I/O adapter CPU to submit several DDs to the DMA engine at once: the CPU points the DMA engine to the start of the chain, and the DMA engine processes all of the linked DDs. To access the server memory, the data requests go across the PCIe bus that attaches the I/O adapter to the server. An I/O adapter may contain multiple DMA engines, allowing multiple data accesses to be performed in parallel.
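A DD chain as described above behaves like a singly linked list. The minimal Python sketch below assumes a DD holds only the three fields named in the text plus a link to the next DD; real descriptor layouts are hardware-specific, and the class, field, and function names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class DmaDescriptor:
    """Illustrative DD; real descriptor layouts are hardware-specific."""
    source: int                                   # address to read from
    target: int                                   # address to write to
    size: int                                     # bytes to transfer
    next_dd: Optional["DmaDescriptor"] = None     # link to the next DD


def build_chain(entries: list[tuple[int, int, int]]) -> Optional[DmaDescriptor]:
    """Link (source, target, size) tuples into a chain and return its head."""
    head: Optional[DmaDescriptor] = None
    for source, target, size in reversed(entries):
        head = DmaDescriptor(source, target, size, head)
    return head


def walk_chain(head: Optional[DmaDescriptor]) -> Iterator[DmaDescriptor]:
    """Visit each DD in order, as a DMA engine does once given the chain head."""
    dd = head
    while dd is not None:
        yield dd
        dd = dd.next_dd


# The adapter CPU builds the chain and hands only its head to the engine.
chain = build_chain([
    (0x0000_1000, 0x4000_0000, 0x800),   # adapter memory -> server memory
    (0x0000_1800, 0x4000_2000, 0x400),
])
total = sum(dd.size for dd in walk_chain(chain))
print(hex(total))   # 0xc00
```

Handing the engine a single head pointer is what lets the CPU submit several transfers with one operation.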

In the normal case, the data transfers that the DMA engine performs, as described by the DDs it is processing, do not cause any errors, because all accesses fall within the acceptable address ranges with which the PCIe bus hardware is configured. However, when a bit flip occurs in a PCIe switch, the addresses presented to the storage controller server on the other end of the PCIe bus cause page faults. In these situations, it is very difficult to determine the source of the bad address, because the DDs in the I/O adapter are all constructed correctly. Additionally, the address that the server hardware captured when it detected the page fault does not match any address in the DD chains on which the DMA engines are working at the time, because the PCIe switch has flipped a single bit somewhere in the 32-bit or 64-bit address.
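The excerpt breaks off before describing the disclosed method, so the following is only one plausible way to use the single-bit property, not the author's technique. Because the captured fault address differs from a legitimate address in exactly one bit, each of its 32 or 64 one-bit "neighbors" can be tested against the source and target range of every DD currently in flight on all DMA engines. The DD fields, the engine/index bookkeeping, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class DD:
    """Minimal stand-in for an in-flight DMA Descriptor (hypothetical fields)."""
    engine: int    # DMA engine that owns the chain
    index: int     # position of the DD within that engine's chain
    source: int
    target: int
    size: int


def one_bit_neighbors(addr: int, addr_bits: int = 64) -> Iterator[tuple[int, int]]:
    """Yield (bit, candidate) for every address one bit flip away from addr."""
    for bit in range(addr_bits):
        yield bit, addr ^ (1 << bit)


def find_suspect_dds(fault_addr: int, dds: list[DD],
                     addr_bits: int = 64) -> list[tuple[DD, str, int]]:
    """Return (dd, field, bit) for each DD whose source or target range
    contains an address exactly one bit away from fault_addr."""
    suspects = []
    for dd in dds:
        for field in ("source", "target"):
            start = getattr(dd, field)
            for bit, candidate in one_bit_neighbors(fault_addr, addr_bits):
                if start <= candidate < start + dd.size:
                    suspects.append((dd, field, bit))
    return suspects


# Illustrative snapshot of DDs in flight on two DMA engines.
in_flight = [
    DD(engine=0, index=0, source=0x0000_1000, target=0x4000_0000, size=0x800),
    DD(engine=1, index=3, source=0x0000_8000, target=0x4100_0000, size=0x1000),
]
fault = 0x4100_0400 ^ (1 << 33)    # address captured by the server
for dd, field, bit in find_suspect_dds(fault, in_flight):
    print(dd.engine, dd.index, field, bit)   # 1 3 target 33
```

More than one DD can match in general, so the result is best treated as a list of suspects; passing `addr_bits=32` restricts the search to 32-bit addresses.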

There are no known solutions to this problem.

The disclosed solution addresses the problem of determining the data access that caused the page fault. This is usef...