
Method for Handling Page Boundary Crossings Encountered by a Hardware Accelerator

IP.com Disclosure Number: IPCOM000200051D
Publication Date: 2010-Sep-24
Document File: 5 page(s) / 36K

Publishing Venue

The IP.com Prior Art Database

Abstract

In this article we propose mechanisms for communication between the general purpose processor and performance- and power-efficient coprocessors. These mechanisms describe initiating work on the coprocessor, sending pointers to source and destination data, and handling page boundary crossings.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 21% of the total text.


In order to increase system performance beyond what can be achieved using conventional processor architectures, some new designs are migrating toward the use of heterogeneous computing systems, sometimes referred to as hybrid computing systems. These systems consist of accelerators or coprocessors in addition to more conventional general purpose processors. Beyond the performance benefit of freeing up the general purpose processor by offloading work to these auxiliary units, system performance may also benefit from reduced latency for completion of these accelerated tasks.

However, there are several issues which must be efficiently resolved to get the most benefit from offloading work to coprocessors. First, the coprocessor must be informed of the location of the source data, input parameters, and the destination data. Second, the correct coprocessor must be initiated to start the work. Third, if the coprocessor hits a page boundary, it must be able to continue to the next effective page by determining the correct physical page address to read or write.

In this article we propose mechanisms which use existing instructions in an Instruction Set Architecture (ISA) to accomplish the three communication tasks identified above. Details of how the software drives these coprocessors are hidden from application code, which can issue calls to standard functions like "block copy", "block move", "block init", standard string ops, and many other functions that may be offloaded. The standard library of functions is replaced by a library of equivalent functions that use the concepts described in this article to interface to these coprocessors.
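The library-swap idea can be sketched as below. This is a minimal illustration, not the disclosed implementation: the coproc_try_copy driver hook is a hypothetical name standing in for whatever interface the replacement library uses, and here it always declines so that the conventional fallback path is exercised.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical driver hook standing in for the coprocessor interface.
 * Returns 0 on success. In this sketch the coprocessor is always
 * "unavailable" so the code is runnable anywhere. */
int coproc_try_copy(void *dst, const void *src, size_t n) {
    (void)dst; (void)src; (void)n;
    return -1;
}

/* Drop-in replacement for a standard "block copy" routine: application
 * code calls it as usual, unaware of whether the work was offloaded. */
void *block_copy(void *dst, const void *src, size_t n) {
    if (coproc_try_copy(dst, src, n) == 0)
        return dst;             /* offloaded to the coprocessor */
    return memcpy(dst, src, n); /* conventional CPU fallback */
}
```

Because the function signature matches the standard routine, linking the replacement library requires no changes to application code.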

We organize the article into three sections, each of which addresses one of the issues.

Sending Pointers to Coprocessor

A "data cache block touch" instruction (dcbt in PowerPC) is used to point to source data. This use of dcbt is distinguished from other uses of the same instruction by the way the touched address is mapped in the page table. Effective address space is duplicated, with one half of the address space being used to support this function. One most-significant (MS) bit of the address indicates whether this is a normal dcbt or a special dcbt. Both spaces map into the same page in the page table to avoid excessive requirements for TLB entries. If this bit is set, the touch is sent as usual to the memory subsystem, but it is tagged to not be returned to the cache. Instead, it is held in the coprocessor, waiting for a subsequent initiation of a compatible coprocessor by the same thread. The coprocessor will use a programmed block size to determine how much data to consume, starting at the address pointed to by the dcbt. An alternative to using a MS address bit to distinguish a special dcbt from a normal dcbt would be to use an alternate instruction encoding.
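The address-aliasing scheme can be sketched as follows. This assumes, for illustration only, a 64-bit effective address whose single most-significant bit selects the special dcbt behavior; the helper names are invented for this sketch and do not appear in the disclosure.

```c
#include <stdint.h>

/* Assumed convention for this sketch: the MS bit of the effective
 * address distinguishes a special dcbt from a normal dcbt. */
#define SPECIAL_DCBT_BIT (1ULL << 63)

/* Tag a source-data address so the touch is routed to the coprocessor
 * rather than returned to the cache. */
uint64_t to_special_alias(uint64_t ea) {
    return ea | SPECIAL_DCBT_BIT;
}

/* The memory subsystem would test this bit to decide whether to hold
 * the line for the coprocessor instead of filling the cache. */
int is_special_dcbt(uint64_t ea) {
    return (ea & SPECIAL_DCBT_BIT) != 0;
}
```

Since only the MS bit differs between the two aliases, all lower bits, including the page index and page offset, are identical, which is why both halves of the duplicated effective address space can share a single page table entry rather than doubling the TLB footprint.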

Likewise, a special "data cache block zero" (dcbz in Po...