Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Optimal Register Permutations on a Stack Machine

IP.com Disclosure Number: IPCOM000114051D
Original Publication Date: 1994-Nov-01
Included in the Prior Art Database: 2005-Mar-27
Document File: 4 page(s) / 115K

Publishing Venue

IBM

Related People

Kravitz, SA: AUTHOR [+3]

Abstract

In a stack machine with an "exchange with top of stack" (XCHNG) instruction, it is desirable to permute the elements of the stack with the minimum number of exchange instructions. This problem is of importance when managing the floating-point register stack of the i80x86 family floating point processors.

This text was extracted from an ASCII text file.
This is the abbreviated version, containing approximately 50% of the total text.

Optimal Register Permutations on a Stack Machine

      In a stack machine with an "exchange with top of stack" (XCHNG)
instruction, it is desirable to permute the elements of the stack
with the minimum number of exchange instructions.  This problem is of
importance when managing the floating-point register stack of the
i80x86 family floating point processors.

      A exemplary stack machine possesses n registers r&sub1., ...,
r[n] The top-of-stack register is r[1].  The only
register-to-register data movement operation provided by the machine
is exchange-with-top-of-stack (XCHNG).  XCHNG r[i]  replaces r[1]
with the value of r[i], and vice versa.

      A permutation of the registers is defined by a permutation
vector P[1], ...  P[n].  P[i] is the intended destination of the
value in r[i].  A resister dependency graph of a permutation draws an
edge from r[i] to r[j] when j = P[i], i.e., when the value
currently in r[i] should end up in r[j].  The graph and the
permutation vector have identical information content.

      The straightforward permutation algorithm (Fig. 1) successively
moves each register r[i] = / r[1] whose location is not equal to
P[i], starting with r[n], to the top of the stack with an XCHNG and
then moves it to P[i].  Notice that an XCHNG inserts an exchange
instruction into the instruction stream and makes the corresponding
changes in the dependence graph.  This algorithm is linear, but
requires at most 2n - 3 exchanges for n registers.  This bound is due
to bringing each register to the top of the stack once, except the
last two registers which need at most one exchange.  The worst case
performance occurs in practice for some permutations that are
rotations.

      Note that every node of a register dependency graph has in- and
out-degree exactly 1, since each register holds precisely one value
and every value must sit somewhere.  If i = P[i], this defines a
single node cycle, i.e., the value in ri should end ip in r[i].  In
the general case, the dependency graph consists of 1 to n cycles and
the cycle which contains R1 is referred to as the 1-cycle.  At the
end, the dependency graph consists of n single node cycles
(self-cycles).

XCHNG and the Register Dependency Graph

      After each exchange the P vector and the register dependency
graph are updated.  There are four important properties:
  o  By definition, an exchange always involves a register that is
      mapped to r&sub1.  (a member of the 1-cycle) and another
      register.
  o  Exchanging with another register from the 1-cycle divides this
      cycle into two cycles, one of which is the new 1-cycle.  If the
      register is a successor to r[1]  in the evolving dependency
graph
      (the register is P[1]) then one of the cycles created is a
      self-cycle.  The inefficiency of the straightforward algorithm
      (Fig. 1) is due to the fact that it can split the 1-cycle i...