Browse Prior Art Database

Two-level circular queue to facilitate dynamic task migration in a stream computation model

IP.com Disclosure Number: IPCOM000212351D
Publication Date: 2011-Nov-08
Document File: 4 page(s) / 95K

Publishing Venue

The IP.com Prior Art Database

Abstract

A stream programming model is powerful for developing applications on multi-core architectures because of its explicit parallelism, which makes stream programming highly valuable in this multi-core era. A traditional stream programming model leverages static compilation, workload partition and mapping techniques, in which two tasks in a stream graph are statically assigned to one same thread or two different threads, and connected with either an intra-thread buffer or an inter-thread buffer. Such a mechnaism prevents a stream program binary from being dynamically scheduled onto a multi-core platform. To solve this problem, this disclosure proposes a new two-level circular queue structure, which can work either in inter-thread mode or in intra-thread mode while suffering from no expensive mode-switch cost. With the support of this queue structure, a stream program binary can be dynamically partitioned and mapped onto a given number of system cores.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 41% of the total text.

Page 01 of 4

Two-level circular queue to facilitate dynamic task migration in a stream computation model

A stream programming model is powerful for developing applications on multi-core architectures because of its explicit parallelism, which makes stream

programming highly valuable in this multi-core era.

   A synchronous data flow programming model, e.g. StreamIt, usually depends on static task scheduling algorithms to mapping all tasks in a stream graph onto many worker threads to achieve good workload balance and scalability. However, to use static scheduling, a stream compiler assumes the computation complexity of each task can be statically resolved. This assumption does not hold in many real cases, e.g. in many applications, the iteration number of a loop body is decided by run-time input data, which can hardly be predicted during compilation. In such cases, an optimal static scheduling plan can never be found. Therefore, static scheduling restricts the wide application of synchronous data flow programming models.

   As an alternative, dynamic scheduling can help to handle the un-predictable task scheduling problem. By using dynamic scheduling, a stream compiler does not do any task complexity estimation, and generates no static scheduling plan. On the contrary, a dynamic scheduler collects run-time profiling information, generates scheduling plans and schedules tasks during run-time. Therefore, even complexity of tasks vary according to input data, the dynamic scheduler can adaptively map workloads onto worker threads to achieve workload balance.

Here comes the problem. Since task assignment to worker threads occurs at runtime, we do not statically know if two connected tasks will communicate

inter-thread or intra-thread. Therefore the design of a connection between two connected tasks becomes challenging. Fundamental requirements for such a connection are as following:


1) The connection should support low-overhead data communication in both inter-thread and intra-thread cases,

   2) The connection should not introduce significant overhead when switching from its inter-thread communication mode to the intra-thread communication mode, or vice versa, and


3) The connection mechanism should not introduce high run-time cost.

A straight-forward solution is to create two physical connections during compilation, one for inter-thread and the other for intra-thread connections. This

can lead to redundant memory allocations which require cleanup at runtime.

   Another approach is to let the run-time analyze the task assignment, resolve the communication mode of each connection, and then create all connections, which will make the run-time complex and heavy-weight.

To provide a low-cost, inter/intra-thread adaptive connection while keeping the run-time simple and light-weight, we invent a two-level circular queue (TCQ) data structure. By using such a data structure, no redundant connection needs to be created separately for the two communication mode. More o...