NDRange mechanism for improving memory object I/O throughput
Disclosure Number: IPCOM000225904D
Publication Date: 2013-Mar-12
Document File: 2 page(s) / 21K

Publishing Venue

The Prior Art Database


An NDRange mechanism for improving memory object I/O throughput is disclosed.

This is the abbreviated version, containing approximately 51% of the total text.

Disclosed is an NDRange mechanism for improving memory object I/O throughput.

Current OpenCL® implementations perform memory operations using a single-threaded system memory copy. These operations include reading from a memory object to host memory, writing from host memory to a memory object, and copying from one memory object to another. Performing these memory operations serially leaves compute units idle, especially when a later command depends on the memory operation.

OpenCL provides the ability to parallelize operations through the NDRangeKernel operation, which allows work to be distributed across all compute units within the system. The NDRangeKernel operation is considered a peer to the aforementioned OpenCL memory operations. The same or a similar underlying mechanism that executes OpenCL NDRange kernel operations may therefore be used to perform OpenCL memory read, write, or copy operations. That is, this mechanism uses multiple compute units to perform each requested memory operation. Doing so improves both the throughput of the memory operation and the utilization of compute unit resources.
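The idea of spreading one memory operation over many work-items can be sketched in plain C. This is only an illustration under stated assumptions, not the disclosed implementation: the names copy_work_item and ndrange_copy are hypothetical, the contiguous-slice partitioning is one possible scheme, and the loop runs the work-items sequentially where a real runtime would schedule them concurrently across compute units.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one work-item of a transfer kernel.
 * global_id and global_size mirror the values that get_global_id(0)
 * and get_global_size(0) would return inside an OpenCL kernel. */
static void copy_work_item(unsigned char *dst, const unsigned char *src,
                           size_t nbytes, size_t global_id,
                           size_t global_size)
{
    assert(global_size > 0 && global_id < global_size);

    /* Ceiling division so that the slices together cover every byte. */
    size_t chunk = (nbytes + global_size - 1) / global_size;
    size_t begin = global_id * chunk;
    if (begin >= nbytes)
        return; /* trailing work-items may have nothing to copy */

    size_t end = begin + chunk;
    if (end > nbytes)
        end = nbytes;

    memcpy(dst + begin, src + begin, end - begin);
}

/* Sequential emulation of the NDRange (assumption: slices are disjoint,
 * so the work-items could run in any order or in parallel). */
static void ndrange_copy(unsigned char *dst, const unsigned char *src,
                         size_t nbytes, size_t global_size)
{
    for (size_t id = 0; id < global_size; ++id)
        copy_work_item(dst, src, nbytes, id, global_size);
}
```

Because each work-item touches a disjoint slice, no synchronization between work-items is needed in this sketch; the result is the same as a single memcpy of nbytes bytes.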

The mechanism works by performing OpenCL memory operations through a dedicated kernel that is solely devoted to memory transfers. The dedicated kernel may be pre-built into the executing runtime or treated as a stand-alone kernel that is loaded on demand. Ideally, this kernel is precompiled to avoid build-time overhead, so that it is ready for execution as soon as a memory operation is requested.
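One way a runtime might keep the dedicated kernel ready is to load it once and reuse the cached handle for every later memory operation. The following C sketch illustrates that pattern only; the transfer_kernel type, load_transfer_kernel, and get_transfer_kernel are hypothetical names, and the load step is a placeholder for whatever the runtime actually does (for example, loading a precompiled binary).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle for the dedicated memory transfer kernel. */
typedef struct {
    int ready; /* nonzero once the kernel can be enqueued */
} transfer_kernel;

static transfer_kernel g_kernel; /* single cached instance */
static int g_load_count;         /* number of times the load step ran */

/* Placeholder for the expensive step; a real runtime would load or
 * build the kernel here. */
static void load_transfer_kernel(transfer_kernel *k)
{
    ++g_load_count;
    k->ready = 1;
}

/* Returns the cached kernel, loading it on first use only.
 * Assumption: called from one thread; a real runtime would need
 * synchronization around the first load. */
static transfer_kernel *get_transfer_kernel(void)
{
    if (!g_kernel.ready)
        load_transfer_kernel(&g_kernel);
    assert(g_kernel.ready);
    return &g_kernel;
}
```

With this arrangement, only the first memory operation pays the load cost; a runtime that pre-builds the kernel could instead perform the load during initialization so that no request pays it.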

When a user requests one of the various memory operations, the OpenCL runtime prepares the dedicated memory transfer kernel's arguments accordingly. These transfer arguments consist of the source memory address, the destination memory address and th...