Method for a systolic cache microarchitecture
Disclosure Number: IPCOM000007095D
Publication Date: 2002-Feb-26
Document File: 4 page(s) / 138K

Publishing Venue: The Prior Art Database


Disclosed is a method for a systolic cache microarchitecture. Benefits include improved performance and improved functionality.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 48% of the total text.

General description

              The disclosed method creates an efficient systolic cache architecture and microarchitecture (SCMA). The method includes an apparatus that shares the microarchitecture of conventional microprocessor designs and significantly speeds up systolic algorithms with no negative impact on traditional overall design tradeoffs. The added functionality improves performance on computing applications that can be accelerated by systolic computation. Graphics, encryption/decryption, and audio applications are among those that benefit most from systolic algorithms and their implementations.

              The systolic cache apparatus may be built from standard instruction/data caches, from a separate cache or set of caches, or from some combination of standard and separate caches. The disclosed method couples these caches with one or more state machines to implement systolic processing in standard microprocessor designs without adding major new hardware elements. The cache elements serve both to buffer state and to act as crossbar switches.
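
The dual role of the cache, buffering state and acting as a crossbar, can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class `CacheCrossbar`, its one-line-per-PE layout, and the routing table are hypothetical names chosen for illustration and are not taken from the disclosure.

```python
class CacheCrossbar:
    """Toy model (hypothetical): one cache line per execution unit output,
    plus a routing table that selects which line feeds each unit."""

    def __init__(self, num_units):
        self.lines = [0] * num_units          # buffered output of each unit
        self.route = list(range(num_units))   # route[dst] = src (identity by default)

    def connect(self, src, dst):
        """Feed the buffered output of unit `src` into the input of unit `dst`."""
        self.route[dst] = src

    def write_output(self, unit, value):
        """Buffer the latest output of `unit` in its cache line."""
        self.lines[unit] = value

    def read_input(self, unit):
        """Return the buffered value currently routed to `unit`."""
        return self.lines[self.route[unit]]


xbar = CacheCrossbar(num_units=3)
xbar.connect(src=2, dst=0)   # unit 2 feeds unit 0
xbar.connect(src=0, dst=1)   # unit 0 feeds unit 1
xbar.write_output(2, 42)
xbar.write_output(0, 7)
inputs = [xbar.read_input(unit) for unit in range(3)]
print(inputs)
```

Because any unit can read any line, changing the routing table is enough to rewire the array; no dedicated switch fabric is modeled.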

              The essential elements of the disclosed method include:

•            The use of microprocessor cache arrays to perform the following:
              -             Store and buffer output from one execution unit
              -             Provide input to the next execution unit(s) in the systolic array
              -             Efficiently connect the input of any one execution unit to the output of any other execution unit, as a crossbar switch does
•            The systolic update state machine (SUSM) control hardware required to implement the systolic data forwarding rules and to step the systolic updates of the array of systolic processor elements (PEs)
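
The stepping role described for the SUSM can be sketched as a synchronous update over a linear chain of PEs. This is a minimal illustration, assuming a simple chain in which each PE reads the value its predecessor buffered on the previous step; the function `systolic_step` and the three example PEs are hypothetical and do not come from the disclosure.

```python
def systolic_step(buffer, pes, external_input):
    """One synchronous update of a linear chain (hypothetical model).

    Every PE reads the value buffered by its predecessor on the previous
    step (the first PE reads the external input), and all results replace
    the buffer contents at once."""
    next_buffer = []
    for i, pe in enumerate(pes):
        operand = external_input if i == 0 else buffer[i - 1]
        next_buffer.append(pe(operand))
    return next_buffer


# Three illustrative PEs; together they compute (x + 1) * 2 - 3.
pes = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
buffer = [0, 0, 0]
outputs = []
for value in [10, 20, 30, 0, 0]:   # trailing zeros flush the pipeline
    buffer = systolic_step(buffer, pes, value)
    outputs.append(buffer[-1])

# The first two outputs are pipeline fill; valid results follow.
print(outputs[2:])
```

Building the new buffer before replacing the old one is what keeps the update systolic: no PE sees a value produced during the current step.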


              SCMA improves the performance of computing applications that contain parallel algorithms, such as simulations of natural systems. Examples include hydrodynamics, airflow over wings, plasmas, and weather. Local graphics computations, such as edge enhancement, also belong to this class. Modern encryption algorithms can also be implemented effectively by systolic arrays of processors.
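
To show why a local computation such as edge enhancement suits an array of PEs, the following sketch applies a three-tap sharpening kernel to one row of samples. The kernel values and the function name `edge_enhance_1d` are assumptions made for illustration; the disclosure does not specify a particular filter.

```python
def edge_enhance_1d(samples, kernel=(-1, 3, -1)):
    """Sharpen a row of samples with an illustrative 3-tap kernel.

    Each output depends only on a sample and its two neighbors, so every
    output could in principle be produced by its own PE."""
    out = []
    for i in range(1, len(samples) - 1):
        window = samples[i - 1 : i + 2]
        out.append(sum(k * s for k, s in zip(kernel, window)))
    return out


row = [10, 10, 10, 50, 50, 50]   # a step edge between the 3rd and 4th samples
enhanced = edge_enhance_1d(row)
print(enhanced)
```

Flat regions pass through unchanged, while the values on either side of the edge are pushed apart, which is the locality property the paragraph above refers to.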

              Including special hardware for systolic computation within a microprocessor improves performance. Capitalizing on prefetch knowledge and deep pipelining improves microprocessor performance on many applications with very little hardware overhead.

Detailed description

              SCMA functional requirements include:

•             Ability to extract cache data into a data stream flowing through one or more PEs

•             Ability to write a data stream output by one or more PEs back into the cache data

•             Ability of the PEs and control to extract data and write a data stream in parallel

•             New instructions, both architectural and microarchitectural, to reset the PEs, to alter the systolic algorithm, and to step the systolic computation
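
The requirements above can be sketched together in a small software model: data is extracted from one cache region, streamed through a chain of PEs, and written back to another region, with reads and writes happening in the same step. Everything named here (`SystolicCacheModel`, `reset`, `alter`, `step`, the region layout) is a hypothetical stand-in for the instructions listed, not an interface defined by the disclosure.

```python
class SystolicCacheModel:
    """Toy model (hypothetical): stream a cache region through a PE chain."""

    def __init__(self, cache, src_base, dst_base, length, stages):
        self.cache = cache
        self.src_base, self.dst_base, self.length = src_base, dst_base, length
        self.stages = list(stages)
        self.reset()

    def reset(self):
        """Stand-in for a 'reset PEs' instruction: empty the pipeline."""
        self.pipe = [None] * len(self.stages)   # None marks an empty PE slot
        self.read_idx = 0
        self.write_idx = 0

    def alter(self, stages):
        """Stand-in for an 'alter algorithm' instruction: swap the PE functions."""
        self.stages = list(stages)
        self.reset()

    def step(self):
        """Stand-in for a 'step' instruction: one systolic update."""
        # Extract the next word from the source region, if any remain.
        if self.read_idx < self.length:
            incoming = self.cache[self.src_base + self.read_idx]
            self.read_idx += 1
        else:
            incoming = None
        # Each PE consumes what its predecessor held before this step.
        operands = [incoming] + self.pipe[:-1]
        self.pipe = [None if x is None else f(x)
                     for f, x in zip(self.stages, operands)]
        # In the same step, write the value leaving the last PE back to the cache.
        if self.pipe[-1] is not None:
            self.cache[self.dst_base + self.write_idx] = self.pipe[-1]
            self.write_idx += 1

    def done(self):
        return self.write_idx == self.length


cache = [1, 2, 3, 4, 0, 0, 0, 0]
model = SystolicCacheModel(cache, src_base=0, dst_base=4, length=4,
                           stages=[lambda x: x + 1, lambda x: x * 10])
steps = 0
while not model.done():
    model.step()
    steps += 1
print(cache, steps)
```

In this model, four words pass through two PEs in five steps: one step of pipeline fill, then one result written per step while later words are still being read.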

              Nearly all the hardware needed to implem...