
Method for an in-line computation engine in the I/O-to-memory data path

IP.com Disclosure Number: IPCOM000125748D
Publication Date: 2005-Jun-15

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for an in-line computation engine in the I/O-to-memory data path. Benefits include improved functionality and improved performance.

Background

      Conventionally, poor locality of reference is a detriment to the performance of packet-processing-oriented input/output (I/O) applications on general-purpose microprocessors. As a result, general-purpose microprocessors are only moderately fast packet processors compared to the dedicated hardware, application-specific integrated circuits, and network processors used for high-speed packet and I/O applications.

      Accessing a word of data from system memory takes the central processing unit (CPU) longer than performing a simple computation on it, such as an addition, a subtraction, or a cyclic redundancy check (CRC) step. The overhead of moving data between system memory and the CPU is particularly large in packet processing applications. This low computation-to-communication ratio severely degrades application performance and limits the overall throughput of the computer system.
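
      As a purely illustrative calculation (these figures are not from the disclosure), assume a cache-miss load latency of roughly 100 ns and a single-cycle integer add of roughly 1 ns. Summing a 1500-byte packet as 16-bit words requires about 750 adds, or roughly 0.75 microseconds of computation, but if every word misses the cache it incurs on the order of 75 microseconds of memory stall time, so data movement dominates by about two orders of magnitude.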

      One application with minimal computation is checksum computation and verification for a packet received for forwarding to the network or to a local storage device. After the packet is buffered in memory, the CPU accesses each word of data and computes the checksum in one of its arithmetic logic units (ALUs). This “touch and use only once” checksum computation conflicts with the use of high-speed caches, which compensate for memory access latency only for frequently accessed data. Another example is expression pattern matching on data or packet headers received over a network interface.
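
      As one concrete illustration (not taken from the disclosure), the following C sketch shows an Internet-style ones'-complement checksum in which every 16-bit word of the buffer is loaded from memory exactly once, added in the ALU, and never reused:

      #include <stddef.h>
      #include <stdint.h>

      /*
       * Ones'-complement (Internet-style) checksum over a packet buffer.
       * Each 16-bit word is loaded from memory exactly once, added in the
       * ALU, and never reused -- the "touch and use only once" pattern.
       */
      uint16_t checksum16(const uint8_t *buf, size_t len)
      {
          uint32_t sum = 0;

          /* Accumulate the buffer as big-endian 16-bit words. */
          while (len > 1) {
              sum += ((uint32_t)buf[0] << 8) | buf[1];
              buf += 2;
              len -= 2;
          }

          /* A trailing odd byte is padded with zero. */
          if (len == 1)
              sum += (uint32_t)buf[0] << 8;

          /* Fold any carries back into the low 16 bits. */
          while (sum >> 16)
              sum = (sum & 0xFFFFu) + (sum >> 16);

          return (uint16_t)~sum;
      }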

      A conventional computer architecture stores data and instructions in memory, transfers the information into high-speed internal caches for processing by the CPU, and stores the results back in memory. For I/O data, I/O device controllers fetch the data and write it to memory asynchronously. The CPU is notified about the availability of new data either through interrupts or through polling, and it schedules the device-specific software driver to process the data. For transmit (Tx) processing by an I/O device controller, the CPU/software driver informs the device about the availability of new data. The device asynchronously fetches the data from memory and transmits it over the external interface. The device then informs the CPU, through an interrupt or polling, that the Tx operation is complete so that the CPU/software driver can perform the required completion tasks (see Figure 1).
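
      To make the receive-side flow concrete, the sketch below is illustrative only; the descriptor layout and all names are assumptions, not part of the disclosure. It shows a driver polling a receive descriptor that the device controller has filled by DMA and then touching the buffered data on the CPU, reusing the checksum16 routine from the earlier sketch:

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      uint16_t checksum16(const uint8_t *buf, size_t len);  /* from the sketch above */

      /* Hypothetical receive descriptor that the device controller fills in
       * after writing a packet into system memory by DMA; all names here are
       * illustrative, not from the disclosure. */
      struct rx_descriptor {
          volatile bool  ready;   /* set by the device when the buffer is valid */
          const uint8_t *buffer;  /* packet data the device wrote to memory     */
          size_t         length;  /* number of valid bytes in the buffer        */
      };

      /* One polling step: the CPU discovers new data by reading the descriptor
       * (an interrupt handler would perform the equivalent check), and the
       * driver then touches every word of the packet in the CPU's ALU. */
      void rx_poll_once(struct rx_descriptor *desc)
      {
          if (!desc->ready)
              return;               /* no new data from the device yet */

          uint16_t csum = checksum16(desc->buffer, desc->length);
          (void)csum;               /* verify the packet or hand it to forwarding */

          desc->ready = false;      /* return the buffer to the device */
      }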

      The critical interconnects that facilitate data transfer among the CPU, memory, and I/O are the I/O bus, the CPU bus (system/front-side bus), and the memory bus. Additionally, glue logic translates data among the protocols of these buses; examples include the memory controller, the memory controller hub (MCH), and the...