
A novel prediction mechanism for accelerator execution time

IP.com Disclosure Number: IPCOM000203251D
Publication Date: 2011-Jan-21
Document File: 4 page(s) / 58K

Publishing Venue

The IP.com Prior Art Database


Integrating on-chip accelerators is one of the important trends in high-performance CPU design, and to improve concurrency and utilization, accelerators are usually shared by multiple tasks. In practice, a task that issues a request to an accelerator does not know the execution time for the job, since the accelerator may be serving requests from multiple other tasks. It is therefore very important to know how long each task will wait for the accelerator, so that the application (or the OS scheduler) can switch to other jobs rather than busy-waiting. A method and apparatus are proposed that provide the software application with an interface reporting the estimated wait time and execution time for a request to a hardware accelerator.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 53% of the total text.


Hardware accelerator

In a typical hardware accelerator implementation, the accelerator unit is designed to execute a fixed function. To support multiple tasks, a queue is provided to store the requests from different tasks, as shown in Figure 1. There may be one or more output buffers associated with the accelerator; in most cases, the output buffer is specified in the request.


Figure 1. Typical hardware accelerator implementation with multi-task support
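The shared request queue described above can be sketched as follows. This is a minimal illustration only; the class and field names (AcceleratorRequest, AcceleratorQueue, task_id, input_size, output_buffer) are assumptions for the sketch, not names from the disclosure.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class AcceleratorRequest:
    task_id: int          # task that issued the request
    input_size: int       # total input data size, in bytes
    output_buffer: int    # handle of the caller-specified output buffer

class AcceleratorQueue:
    """FIFO queue shared by multiple tasks, as in Figure 1."""
    def __init__(self):
        self._queue = deque()

    def submit(self, req: AcceleratorRequest):
        # Requests from different tasks are enqueued in arrival order.
        self._queue.append(req)

    def pending(self):
        # Snapshot of all requests still waiting for the accelerator.
        return list(self._queue)
```

A new request simply joins the tail of the queue; the estimation scheme described next walks this queue to predict how long the request will wait.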

Time Estimation

1. Estimate execution time for each request

Generally, it is quite accurate to abstract the hardware accelerator as a black box that runs at a fixed speed. The variable portion of the time is determined by the location of the input/output data: for example, the input data might reside in on-chip cache or in off-chip memory, and the two locations have different access times. On the other hand, the hardware can easily determine the total data size. Thus, the hardware can estimate the execution time via

cache_rate * total_size * …

2. Estimate the waiting time



For each hardware accelerator request in the queue, compute the estimated execution time using the method in the previous section, then sum all the estimated execution times. The flow chart is shown in Figure 2.


Figure 2. The flowgraph of time estimation
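The summation step can be sketched as follows. The per-request estimator used here (time proportional to data size, at an assumed 2 time units per byte) is illustrative only, not a value from the disclosure.

```python
def estimate_wait_time(queued_requests, estimate_fn):
    """Sum the estimated execution times of every request already
    queued ahead of the new one (the flow of Figure 2)."""
    return sum(estimate_fn(req) for req in queued_requests)

# Hypothetical per-request estimator: a fixed-speed black box whose
# time is proportional to the request's data size (2 units/byte, assumed).
def per_request_time(size):
    return size * 2

# Three requests (given by input size) are ahead of ours in the queue.
wait = estimate_wait_time([100, 250, 50], per_request_time)  # 800 time units
```

The application (or the OS scheduler) can compare this wait estimate against a context-switch threshold to decide whether to block, switch to another job, or busy-wait.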




There are many possible interface designs through which the hardware can feed the estimated time back to the application. We give one example of a possible interface (though the design is not limited to it).

When a task submits a request to a hardware accelerator, it may provide the function id of specified ac...