Method for a nonblocking copy engine using SMT and SOEMT threads

IP.com Disclosure Number: IPCOM000028868D
Publication Date: 2004-Jun-04
Document File: 6 page(s) / 21K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for a nonblocking copy engine using simultaneous multithreading (SMT) and switch-on-event multithreading (SOEMT) threads. Benefits include improved functionality and improved performance.

Background

              Conventionally, memory copying is one of the major contributors to processor stalls and to the overall processing time of many desktop and server workloads. During a memory copy, the processor waits on the memory reads and writes.

              Copying buffers from one memory location to another is one of the most time-consuming events in server and desktop workloads. A memory copy (source, destination) is usually converted into a repeat move (REP MOV) instruction in conventional implementations. This processing typically blocks subsequent computation by the central processing unit (CPU) until all the source (src) contents have been copied into the destination (dst).
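
              For illustration, the fragment below is a hand-written C equivalent of the REP MOV sequence such a copy is typically lowered to on x86; it is a sketch of the conventional blocking behavior, not code from the disclosure.

#include <stddef.h>

/* GCC-style inline assembly equivalent of the REP MOV copy that memcpy()
 * is commonly compiled into on x86. The instruction completes only after
 * all n bytes are moved, so later independent work is stalled behind it. */
static void rep_mov_copy(void *dst, const void *src, size_t n)
{
    __asm__ volatile("rep movsb"          /* copy RCX bytes from [RSI] to [RDI] */
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
}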

              In receive-side transmission control protocol/internet protocol (TCP/IP) processing, the network interface card (NIC) writes packets into its buffers in system memory using direct memory access (DMA). When the packet is in memory, receive-side processing is initiated by interrupting the CPU. Part of the network processing is to copy the NIC buffers into the application buffer; the TCP processing occurs in the kernel. After the copy completes (t_copy), a signal is sent to the application to inform it that the buffer is available. After a time delay (t_AppDelay), the application reads the buffer, and the NIC buffer is recycled after the copy (see Figure 1).
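
              The following C sketch models that conventional receive path; the function and buffer names are illustrative stand-ins, not kernel interfaces.

#include <string.h>
#include <stddef.h>

enum { APP_BUF_SIZE = 2048 };
static char app_buffer[APP_BUF_SIZE];               /* application buffer */

static void signal_application(size_t len) { (void)len; /* notify the app */ }
static void recycle_nic_buffer(char *buf)  { (void)buf; /* return to NIC  */ }

/* Called after the NIC has DMAed a packet into nic_buf and interrupted
 * the CPU; models the kernel-side receive processing described above.  */
void rx_process(char *nic_buf, size_t len)
{
    if (len > APP_BUF_SIZE)
        len = APP_BUF_SIZE;
    memcpy(app_buffer, nic_buf, len);    /* blocking copy: this is t_copy */
    signal_application(len);             /* app reads after t_AppDelay    */
    recycle_nic_buffer(nic_buf);         /* NIC buffer reused             */
}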

General description

              The disclosed method performs memory copying without blocking. The memory-access latency is overlapped with useful computation.

              The key elements of the disclosed method include (a minimal sketch of these elements follows the list):

•             Use of an asynchronous facility, such as a data mover engine (DME), which may be implemented by a second hardware thread or by a separate core

•             Nonblocking (asynchronous) memory copy operation

•             Fencing mechanisms that prevent the application from reading the destination or writing the source until the copy has finished using both buffers
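
              A minimal user-level sketch of these elements is shown below, assuming a POSIX thread stands in for the DME; the API shape and all names are assumptions made for illustration, not definitions from the disclosure.

#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <stddef.h>

struct async_copy {
    void       *dst;
    const void *src;
    size_t      len;
    atomic_int  done;                    /* 0 while the copy is in flight */
    pthread_t   mover;                   /* thread acting as the DME      */
};

static void *dme_thread(void *arg)
{
    struct async_copy *c = arg;
    memcpy(c->dst, c->src, c->len);      /* the mover absorbs the latency */
    atomic_store(&c->done, 1);
    return NULL;
}

/* Start the copy and return immediately (nonblocking). */
void copy_start(struct async_copy *c, void *dst, const void *src, size_t len)
{
    c->dst = dst;
    c->src = src;
    c->len = len;
    atomic_init(&c->done, 0);
    pthread_create(&c->mover, NULL, dme_thread, c);
}

/* Nonblocking progress check: lets the CPU keep computing meanwhile. */
int copy_is_done(struct async_copy *c)
{
    return atomic_load(&c->done);
}

/* Fence: must be called before reading the destination or writing the
 * source; blocks only if the copy is still in flight. */
void copy_fence(struct async_copy *c)
{
    pthread_join(c->mover, NULL);
}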

Advantages

              The disclosed method provides advantages, including:

•             Improved functionality due to enabling nonblocking memory copying

•             Improved performance due to overlapping the latency with useful computation

Detailed description

              The disclosed method provides a nonblocking memory copy engine so that CPU cycles can be spent on useful computation rather than on waiting for the memory copy to complete.

              The fencing mechanisms require the following support (fence placement is illustrated in the sketch after the list):

•             Subsequent reads from the destination are blocked until the copy is complete.

•             Subsequent writes to the source are blocked until the copy is complete.
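
              Continuing the illustrative API sketched earlier, these two rules determine where the fence must sit relative to the application's accesses; do_other_useful_work() and consume_packet() are hypothetical placeholders.

void do_other_useful_work(void);                  /* hypothetical helper */
void consume_packet(const char *buf, size_t len); /* hypothetical helper */

void receive_with_async_copy(char *dst, const char *src, size_t len)
{
    struct async_copy c;

    copy_start(&c, dst, src, len);       /* returns immediately             */

    while (!copy_is_done(&c))
        do_other_useful_work();          /* overlap copy latency with work
                                            that touches neither src nor dst */

    copy_fence(&c);                      /* fence before reading dst or
                                            writing src                      */
    consume_packet(dst, len);            /* destination is now fully valid  */
}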

              The benefits of the nonblocking mechanism include:

•             CPU is used to signal and schedule applications, w...