Chaining co-processors for greater efficiency

IP.com Disclosure Number: IPCOM000203893D
Publication Date: 2011-Feb-08

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a new mechanism for hardware acceleration that allows accelerators to communicate directly with each other, freeing the processor to do more compute-intensive work. The flexibility of the approach allows the processing steps to be adjusted to the needs of each workload.


It is common for systems to use hardware acceleration technology to improve performance. These hardware accelerators can be built on separate Peripheral Component Interconnect (PCI) cards or integrated into processor and/or hub chips. While hardware acceleration significantly improves system performance, one drawback in today's systems is that a processor must be involved to start each hardware accelerator, regardless of whether that processor actually acts on the data.

For example, assume a system has both an extensible markup language (XML) hardware acceleration engine and a cryptography acceleration engine. The processor sends work to the XML hardware accelerator. While processing the data, the XML accelerator locates data that needs to be decrypted. In today's systems, the XML accelerator has to send a message (and the data) to the processor; the processor then sends the data to the cryptography engine, requesting that it be decrypted. When the decryption is complete, the processor receives the decrypted data and sends it back to the XML engine. The processor essentially adds no value to this exchange.

The invention provides a mechanism that allows accelerators to communicate directly with each other, freeing the processor to do more compute-intensive work. This not only allows more efficient utilization of the processors, but also significantly reduces system latency because the processor no longer acts as an intermediary. It may also be possible for the two accelerators to exchange data without writing it back to system memory.

A key piece of this invention is to allow flexibility in how the accelerators are "chained" together. This is important because not all workloads require the same processing steps. In addition, a more complex accelerator, such as an XML engine, may discover mid-stream that additional work needs to be done.

The components and steps for implementing this mechanism in a preferred embodiment are:
1. Create a data structure that identifies the chain of accelerators through which the data should be processed before being returned to the processor. This data structure is initially populated by the processor before it kicks off the first accelerator.

One example of the data structure might be a linked list of messages, where each message carries the information the corresponding accelerator needs to do its processing. For example, if the data coming in needs to be decompressed, decrypted, sent to the XML engine for processing, and then back to the processor, the data structure...
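
The disclosure leaves the exact layout of this structure to the implementation. The following is a minimal C sketch, not the disclosed design: the type and field names (chain_msg, target, params) and the doorbell/interrupt stubs are illustrative assumptions. It builds the decompress-then-decrypt-then-XML chain from the example above, shows how a finishing accelerator could hand off directly to the next engine, and includes a helper that splices in a newly discovered step, reflecting the flexibility described earlier.

```c
#include <stddef.h>

/* Illustrative accelerator identifiers (assumed names). */
typedef enum {
    ACCEL_DECOMPRESS,
    ACCEL_DECRYPT,
    ACCEL_XML
} accel_id_t;

/* One message in the linked list: which accelerator runs next and an
 * opaque, engine-specific parameter block (e.g., a key handle). */
typedef struct chain_msg {
    accel_id_t        target;  /* accelerator that consumes this message   */
    void             *params;  /* engine-specific arguments                */
    struct chain_msg *next;    /* following step; NULL means return to CPU */
} chain_msg_t;

/* Build the example chain: decompress -> decrypt -> XML -> processor. */
chain_msg_t *build_chain(chain_msg_t steps[3]) {
    steps[0] = (chain_msg_t){ ACCEL_DECOMPRESS, NULL, &steps[1] };
    steps[1] = (chain_msg_t){ ACCEL_DECRYPT,    NULL, &steps[2] };
    steps[2] = (chain_msg_t){ ACCEL_XML,        NULL, NULL      };
    return &steps[0];
}

/* Stubs standing in for the hardware signaling path, which the
 * disclosure does not define. */
void ring_doorbell(accel_id_t target, chain_msg_t *msg) { (void)target; (void)msg; }
void interrupt_processor(void) {}

/* Completion path: the finishing accelerator hands the chain directly
 * to the next engine, or interrupts the CPU when the chain is done. */
void on_step_complete(chain_msg_t *done) {
    if (done->next != NULL)
        ring_doorbell(done->next->target, done->next);
    else
        interrupt_processor();
}

/* An accelerator that discovers extra work (e.g., the XML engine finding
 * an encrypted element) can splice a new step in after the current one. */
void insert_after(chain_msg_t *cur, chain_msg_t *extra) {
    extra->next = cur->next;
    cur->next   = extra;
}
```

In a real system the messages would need to live in memory visible to every engine in the chain, and the params field would point at accelerator-specific argument blocks; both details are outside what this abbreviated text specifies.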