Fast/ Efficient Data Chaining With Multi-thread Picocode

IP.com Disclosure Number: IPCOM000121886D
Original Publication Date: 1991-Oct-01
Included in the Prior Art Database: 2005-Apr-03
Document File: 4 page(s) / 137K

Publishing Venue

IBM

Related People

Purrington, CL: AUTHOR

Abstract

This article describes a technique that bridges the gap between microcode and hardware. The technique can be used to implement a flexible, high-speed buffer chaining function. The function moves variable-length chained data blocks between system memory space and device memory space. It resides on a single chip and requires no external RAM.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 52% of the total text.

      A picoprocessor is used to control multiple data move
controllers (DMA controllers) to implement a store and forward
ping/pong scheme.  Multi-thread picocode minimizes code blocking
points, improves performance, and enables the easy integration of
other enhancements.

Required Function

      The basic function is to copy one frame from one set of
variable-length chained buffers (system side) into another set of
variable-length chained buffers (device side).  Frame latency must be
kept to a minimum and the design must be flexible.
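
      The required function can be sketched in software as follows.
This is only an illustration under stated assumptions, not the
picocode of the article: the name copy_chain, the use of Python byte
strings for buffers, and the list of destination capacities are all
hypothetical.

```python
def copy_chain(src_chain, dst_sizes):
    """Copy one frame from variable-length source buffers into
    destination buffers whose capacities are listed in dst_sizes.

    Hypothetical helper for illustration only.
    """
    dst_chain = []
    current = bytearray()          # destination buffer being filled
    sizes = iter(dst_sizes)
    capacity = next(sizes, None)   # None means no destination buffer left
    for src in src_chain:
        offset = 0
        while offset < len(src):
            if capacity is None:
                raise ValueError("destination chain too small for frame")
            room = capacity - len(current)
            chunk = src[offset:offset + room]
            current += chunk
            offset += len(chunk)
            if len(current) == capacity:
                # Destination buffer is full: chain it and take the next.
                dst_chain.append(bytes(current))
                current = bytearray()
                capacity = next(sizes, None)
    if current:
        # Last destination buffer is only partly filled.
        dst_chain.append(bytes(current))
    return dst_chain


frame = [b"abc", b"defgh", b"i"]          # system-side chain
print(copy_chain(frame, [4, 2, 8]))       # [b'abcd', b'ef', b'ghi']
```

The source and destination buffer boundaries are independent, so a
single source buffer may be split across several destination buffers
and several source buffers may be packed into one.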

Solution: Multi-Thread Code on a Picoprocessor

      The use of a picoprocessor enables the entire function to be
contained in a single chip, provides processor flexibility, and
enables closer coupling between code and hardware.

      Fig. 1 is a high-level hardware diagram.  When the ping buffer
is being unloaded by one data move controller (DMC), the pong is
being loaded by the other DMC.  When both DMCs are complete, their
roles are toggled.  The "express path" is used by the system to gain
direct read and write access to the device shared RAM.
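
      The ping/pong toggling described above can be modeled in a few
lines.  This is a sequential simulation under stated assumptions, not
a description of the hardware: the function name ping_pong_transfer,
the step log, and the use of None to mark an empty buffer are
hypothetical, and the two DMCs that run concurrently in hardware are
modeled as two actions within one step.

```python
def ping_pong_transfer(blocks):
    """Move blocks from source to destination through two buffers.

    In each step one buffer is loaded while the other is unloaded;
    the roles are toggled once both simulated DMCs have finished.
    Blocks must not be None, which marks an empty buffer here.
    """
    buffers = [None, None]     # index 0 = ping, index 1 = pong
    load_idx = 0               # buffer currently being loaded
    delivered = []
    log = []                   # (loaded buffer, block in, block out)
    source = iter(blocks)
    while True:
        unload_idx = 1 - load_idx
        # Loading DMC: fill one buffer from the source side.
        incoming = next(source, None)
        buffers[load_idx] = incoming
        # Unloading DMC: drain the other buffer to the destination side.
        outgoing = buffers[unload_idx]
        if outgoing is not None:
            delivered.append(outgoing)
            buffers[unload_idx] = None
        if incoming is None and outgoing is None:
            break              # nothing left to load or unload
        log.append((load_idx, incoming, outgoing))
        load_idx = unload_idx  # both DMCs complete: toggle roles
    return delivered, log


delivered, log = ping_pong_transfer(["A", "B", "C"])
print(delivered)   # ['A', 'B', 'C']
print(len(log))    # 4
```

Because loading and unloading overlap, three blocks take four steps
in this model rather than the six a strictly serial load-then-unload
sequence would need.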

      A task switching sequence is illustrated in Fig. 2. The code
can perform a test, switch task if blocked, and start executing the
first instruction of the next task.  The first instruction of the
other task, "Task B", would likely be a test for its blocking point.
If "Task B" is blocked, then its task switching code would be similar
to that of "Task A".  Thus, if there were three tasks and they were
all blocked, the picocode would be in a poll loop waiting for work.
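
      The test-and-switch behavior can be sketched as a cooperative
poll loop.  This is a minimal model under stated assumptions, not the
picocode shown in Fig. 2: the Task class, its work queue, and the
max_polls limit (added only so that the example terminates) are
hypothetical.

```python
from collections import deque


class Task:
    """A task whose blocking point is an empty work queue (assumed)."""

    def __init__(self, name, work):
        self.name = name
        self.work = deque(work)

    def blocked(self):
        return not self.work

    def run(self, trace):
        # Perform one unit of work and record it.
        trace.append((self.name, self.work.popleft()))


def poll_loop(tasks, max_polls):
    """Test the current task; switch to the next task if it is blocked."""
    trace = []
    current = 0
    for _ in range(max_polls):
        task = tasks[current]
        if task.blocked():
            # Blocked: switch task.  The next iteration starts with the
            # next task's own test for its blocking point.
            current = (current + 1) % len(tasks)
        else:
            task.run(trace)
    return trace


tasks = [Task("A", ["a1", "a2"]), Task("B", ["b1"]), Task("C", [])]
print(poll_loop(tasks, max_polls=8))
# [('A', 'a1'), ('A', 'a2'), ('B', 'b1')]
```

Once every task is blocked, the loop does nothing but cycle through
the tests, which corresponds to the poll loop waiting for work.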

      With a best-case design, the performance bottleneck would be
the slowest interface (system or device).  Thus, the picocode should
be designed such that it is not the bottleneck and that it will
adjust to the interface demands.  If the system is lightly loaded,
which results in a fast system interface, then the system interface
should be the...