Dual Load/Store Unit with a Single Port Cache

IP.com Disclosure Number: IPCOM000116075D
Original Publication Date: 1995-Aug-01
Included in the Prior Art Database: 2005-Mar-30
Document File: 6 page(s) / 176K

Publishing Venue

IBM

Related People

Elliott, TA: AUTHOR [+4]

Abstract

Although load instructions are one of the most prevalent instruction groups within a RISC microprocessor, very little has been done to improve their execution rate. In addition, providing the load data to the execution units as early as possible may eliminate downstream data dependencies which slow down the processor.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 36% of the total text.

      Background Part 1 - The PowerPC* 604 microprocessor ("604")
represents the most recent example of an unbalanced processor.  Three
separate fixed-point units and one floating-point unit are fed
through a single load/store unit.  With each fixed-point unit using
up to two operands and the floating-point unit using up to three
operands, a single load/store unit becomes a machine bottleneck.  The
basic dataflow for a load/store unit can be found in Fig. 1.
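
      The imbalance can be put in rough numbers.  The sketch below
uses only the unit and operand counts quoted above.  The rate of one
load per cycle for a single load/store unit is an assumption, and the
worst case in which every operand must come from memory is
hypothetical, since operands normally also come from registers.

```python
# Unit and operand counts as quoted in the text for the 604.
FIXED_POINT_UNITS = 3
OPERANDS_PER_FIXED_POINT_UNIT = 2
FLOATING_POINT_UNITS = 1
OPERANDS_PER_FLOATING_POINT_UNIT = 3

# Assumption (not stated in the text): one load/store unit accepts
# one load per cycle.
LOADS_PER_CYCLE_PER_UNIT = 1


def peak_operand_demand():
    """Upper bound on operands the execution units can consume per cycle."""
    return (FIXED_POINT_UNITS * OPERANDS_PER_FIXED_POINT_UNIT
            + FLOATING_POINT_UNITS * OPERANDS_PER_FLOATING_POINT_UNIT)


def cycles_to_supply(operands, load_store_units):
    """Cycles needed to load that many operands (ceiling division)."""
    loads_per_cycle = load_store_units * LOADS_PER_CYCLE_PER_UNIT
    return -(-operands // loads_per_cycle)


demand = peak_operand_demand()          # 3*2 + 1*3 = 9 operands
single = cycles_to_supply(demand, 1)    # 9 cycles with one unit
dual = cycles_to_supply(demand, 2)      # 5 cycles with two units
print(demand, single, dual)
```

Under these assumptions, one cycle of peak operand demand takes a
single load/store unit nine cycles to satisfy from memory.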

      Background Part 2 (dual independent load/store units with a
two-ported cache) - The timing charts in Table 1 and Table 2 show part
of the importance of a dual load/store design.  Table 1 shows 8 loads
(L0 - L7) being sent through a single load/store unit.  In Table 2,
the same 8 loads are sent through a dual load/store unit.
                 0    1    2    3    4    5    6    7    8    9
  Dispatch      L0   L1   L2   L3   L4   L5   L6   L7
  EA / cache         L0   L1   L2   L3   L4   L5   L6   L7
  Cache / DIU             L0   L1   L2   L3   L4   L5   L6   L7

Table 1: Single Load/Store Unit Instruction Timing
                  0    1    2    3    4    5    6    7    8    9
  Dispatch0      L0   L2   L4   L6
  Dispatch1      L1   L3   L5   L7
  EA0 / cache0        L0   L2   L4   L6
  EA1 / cache1        L1   L3   L5   L7
  Cache0 / DIU0            L0   L2   L4   L6
  Cache1 / DIU1            L1   L3   L5   L7

Table 2: Dual Load/Store Unit Instruction Timing
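
      The two timing charts can be reproduced with a short model.
This is a minimal sketch, assuming an ideal three-stage pipeline with
no stalls or cache misses and loads that alternate between the units.
The function names (schedule, last_cycle, render) are illustrative
and do not come from the disclosure.

```python
STAGES = ("Dispatch", "EA / cache", "Cache / DIU")


def schedule(num_loads, num_units):
    """Return {(stage, unit): {cycle: load}} for an ideal, stall-free pipeline."""
    chart = {}
    for i in range(num_loads):
        unit = i % num_units           # loads alternate between the units
        issue_cycle = i // num_units   # cycle in which load i is dispatched
        for depth, stage in enumerate(STAGES):
            chart.setdefault((stage, unit), {})[issue_cycle + depth] = f"L{i}"
    return chart


def last_cycle(chart):
    """Cycle in which the final load leaves the last stage."""
    return max(cycle for row in chart.values() for cycle in row)


def render(chart):
    """Format the chart as fixed-width text, one row per stage and unit."""
    width = last_cycle(chart) + 1
    order = sorted(chart, key=lambda k: (STAGES.index(k[0]), k[1]))
    lines = []
    for stage, unit in order:
        row = chart[(stage, unit)]
        cells = "".join(f"{row.get(c, ''):>5}" for c in range(width))
        lines.append(f"{stage + ' ' + str(unit):<15}{cells}")
    return "\n".join(lines)


single = schedule(8, 1)
dual = schedule(8, 2)
print(render(single))
print(render(dual))
print(last_cycle(single), last_cycle(dual))  # 9 5
```

In this model the last load clears the Cache / DIU stage in cycle 9
with one unit and in cycle 5 with two, which matches the tables.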

      The other important factor in gauging the performance
improvement of a dual load/store design is the elimination of
downstream load dependencies in the execution units.  The earlier
load data is available, the less likely it is that execution units
using that data will be stalled.
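
      As a rough illustration, the cycle in which each load's data
reaches the Cache / DIU stage can be read off Tables 1 and 2.  The
sketch below assumes the same stall-free timing as the tables; the
helper name data_ready_cycle is hypothetical.

```python
# Cycles from dispatch to the Cache / DIU stage, as in Tables 1 and 2.
PIPELINE_DEPTH = 2


def data_ready_cycle(load_index, num_units):
    """Cycle in which load L<load_index> reaches the Cache / DIU stage."""
    return load_index // num_units + PIPELINE_DEPTH


single = [data_ready_cycle(i, 1) for i in range(8)]
dual = [data_ready_cycle(i, 2) for i in range(8)]

# Cycles by which a consumer of each load could start earlier.
saved = [s - d for s, d in zip(single, dual)]

print(single)  # [2, 3, 4, 5, 6, 7, 8, 9]
print(dual)    # [2, 2, 3, 3, 4, 4, 5, 5]
print(saved)   # [0, 1, 1, 2, 2, 3, 3, 4]
```

An instruction that depends on L7, for example, could receive its
data four cycles earlier in the dual design under these assumptions.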

      Two major obstacles have prevented most desktop and server
microprocessors from using a dual load/store design:
  1.  Expense of dual porting a large data cache - Going to a
       two-port cache not only substantially increases the physical
       cache size, but also increases the read and write access
       times.  In many of the more recent designs, the delay through
       the large caches has been the factor limiting the frequency
       of the processor.
  2.  How to maintain coherency with two independe...