Dismiss
InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

A High-Speed, Sum Addressed Content-Addressable Translation Cache.

IP.com Disclosure Number: IPCOM000016766D
Original Publication Date: 2003-Jul-14
Included in the Prior Art Database: 2003-Jul-14
Document File: 5 page(s) / 57K

Publishing Venue

IBM

Abstract

Recent microprocessor address memory using a base, offset scheme is associated with a "load" or "store" (or any similar instruction) are usually two fields (called operands) which represent a pair of registers -- a base register, and an offset register. These registers are added together in order to create what is commonly known as an effective address(EA). The EA is then further translated through the use of an SLB, TLB, hash look-up, etc. to generate a real address(RA). The RA is then used to address memory (L2,L3,system memory, etc.). The process of generating an RA from an EA is know as translation. Since translation is a costly operation (> 10 cycles), microprocessors usually include a translation cache (or table), know as Effective to Real Address Translation Cache (ERAT) . Once the EA is known, it can be quickly looked up in the ERAT. In practice, though, computing an EA from the two operands is costly as well, since a high-speed adder is required to compute the EA. There is also the need to use the ERAT in special ways. For instance there are some instructions which require bits, other than EA, to be compared within the ERAT as well. They might be valid bits, cache protection bits, thread information bits, etc. They would be treated as special miscellaneous bits. In some cases the ERAT may be used to partially clear entries which are no longer required to be valid in the table. Since computing the sum of the two operands that generate an EA is difficult, an ERAT which implements a method for quickly adding-comparing is needed. The ERAT must also have a way of partially comparing bits, as well as "classically" (no special sum method; just simply the bits) comparing, and a way to make entries in the table invalid.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 41% of the total text.

Page 1 of 5

A High-Speed, Sum Addressed Content-Addressable Translation Cache.

The current ERAT structure incorporates the following behaviors:

a way to compute whether the sum of two operands (RA and RB) match an entry in

the table without actually having to compute the sum of the two operands. This is called structure a Sum-Address Contents Addressable Memory (SACAM), the ability to "classically" compare status, thread, and other miscellaneous bits within


1.


2.


3.

The following sections describe the how the solutions are implemented.

Section 1: SACAM

A fast method for computing if the sum Ra+Rb is equal to a value Km, where Ra, and Rb are operands to the CAM structure and Kn are entries in the ERAT, is needed.

For a given Ra, Rb, and Km, the following equations are equivalent:

Ra + Rb = Km (1) Ra + Rb - Km = 0 (2) Ra + Rb + /Km + 1 =0 (3) Ra + Rb + /Km= -1 (4)

where -1 = the all ones vector, and /Kn = "K bar", the "ones" complement of K.

It is commonly known that for binary numbers, A, B, and K the sum A+B+K can be converted to a set of vectors S, and C , such that the sum S+C is equal to the sum A+B+K, through a process known as 3 to 2 compression. The compression process can be done without having to compute the propagation of carry. When can therefore convert (4) above to:

Ra+Rb+/Km=-1; S+C=-1 (5)

without propagating carry.

Also, in order to compute the quality "S+C=-1", it must be detected that for every bit Sn, the bit Cn does not match. In other words:

S + C = -1 ===> S=/C (6)

The structure for a CAM can be envisioned for every entry M:

the table, and a method for making entries invalid.

1

Page 2 of 5

      A0 B0 /K0,m A1 B1 /K1,m A2 B2 /K2,m An Bn /Kn,m
Carry,n+1

  ||| ||| ||| ||| |
*-+---+---+-* *-+---+---+-* *-+---+---+-* *-+---+---+-* |

| 3to2 | | 3to2 | | 3to2 |...| 3to2 | |

|compressor | |compressor | |compressor | |compressor | |
| | | | | | | | |
*--+-----+--* *--+-----+--* *--+-----+--* *--+-----+--* |

|| || || || |
C-1 S0 C0 S1 C1 S2 ... Cn-1 Sn Cn

\/ \/ \/ \ /

 \/ \/ \/ \ /
(XOR) (XOR) (XOR) (XOR)
| | | |
*----------------------------------------------------*

| AND |
*-----------------------+----------------------------*

|

HIT,M

where Carryn+1 can be chosen at an intermediate point of what would have been the computation of the sum S+C. For instance, if only bits (0:51) are needed for a match, one only needs to calculate Carry52 (but not any of the intermediate sum or carry of any other bit.) Carryn+1 can be chosen so that its computation can be completed before the other compression-xor-and portion of the CAM is completed. Notice that if all the bits of vectors Ra, Rb, and K are used, Carryn+1 will be set to 0.

The Hit signal of structure (7) is commonly precharged to be on and the discharged when none of the intermediate matches. Therefore, the XOR- AND section of (7) can be equivalently shown as:


(7)

(XNOR) (XNOR) (XNOR) (XNOR)
| | | |
*----------------------------------------------------*

| NOR |


(8)

*-----------------------+--------------...