
Method for extending processor L2 cache in CMP systems

IP.com Disclosure Number: IPCOM000008010D
Publication Date: 2002-May-10
Document File: 3 page(s) / 332K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for extending processor level-2 (L2) cache in chip multiprocessor (CMP) systems. Benefits include improved performance and lower power.


Background

CMPs are an efficient way to use the millions of transistors that current process technology makes available on a chip. Their appeal lies in simplicity: the same processor core is instantiated multiple times on the die. This configuration yields faster, more efficient symmetric multiprocessor (SMP) systems. The CMP architecture works well with thread-level parallelism (TLP), which is abundant in commercial server workloads.

In the conventional L2 cache scheme, an L2 miss requires the cache to allocate a replacement line to accommodate the data returned from the chipset.

In conventional shared-memory applications, dirty cache lines may simply ping-pong from processor to processor, forcing the demanding processor's L2 to replace a line and invalidating the copies held by the other processors. The result is writeback of data to memory and increased bus traffic. Even sharing of clean cache lines can force replacements in the demanding processor's L2, possibly causing a writeback to memory and additional traffic to bring the line back if it is needed again.
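To make the cost of this ping-pong behavior concrete, the short C sketch below models two single-line L2 caches that alternately write the same address; every handoff invalidates the peer's dirty copy and forces a writeback to memory. The model (the l2_t structure and the l2_write function) is a deliberately simplified illustration, not part of the disclosed hardware.

/* Two processors alternately writing one shared line: each write
 * invalidates the peer's copy and, if it was dirty, writes it back. */
#include <stdio.h>
#include <stdbool.h>

typedef struct {
    long tag;       /* address tag of the cached line; -1 means empty   */
    bool dirty;     /* line has been modified since it was brought in   */
} l2_t;

static int writebacks = 0;  /* memory traffic caused purely by sharing  */

/* Processor owning 'mine' writes address 'addr'; 'peer' is the other L2. */
static void l2_write(l2_t *mine, l2_t *peer, long addr)
{
    if (peer->tag == addr) {        /* the peer holds the line           */
        if (peer->dirty)
            writebacks++;           /* its dirty copy is written back... */
        peer->tag = -1;             /* ...and invalidated                */
    }
    mine->tag = addr;               /* line is (re)allocated locally     */
    mine->dirty = true;
}

int main(void)
{
    l2_t cpu0 = { -1, false }, cpu1 = { -1, false };
    long shared = 0x1000;           /* one hotly contended cache line    */

    for (int i = 0; i < 8; i++)     /* the line ping-pongs eight times   */
        l2_write(i % 2 ? &cpu1 : &cpu0,
                 i % 2 ? &cpu0 : &cpu1, shared);

    printf("writebacks to memory caused by sharing: %d\n", writebacks);
    return 0;
}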

General description

The disclosed method includes a mechanism, the Global Cache-line Replacement Manager (GCRM), that effectively extends a processor's L2 cache by using the other available on-chip L2 caches in a CMP. This solution treats all available L2 caches as a single cache without the complexity of redesigning a unified L2 cache that services all the processors in the CMP.
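As a rough software analogy of this idea (the disclosure describes a hardware unit, so the structures and the gcrm_evict function below are purely illustrative assumptions), a victim line evicted from one processor's L2 is first offered to a peer's on-chip L2 and falls back to a memory writeback only when no peer can absorb it:

/* Illustrative eviction path: try to keep the victim line on chip in a
 * peer L2 before resorting to a conventional writeback to memory.      */
#include <stdio.h>
#include <stdbool.h>

#define NCPU 4

typedef struct {
    long tag;
    bool valid;
    bool dirty;
} line_t;

static line_t l2[NCPU];          /* one line per processor's L2, for brevity */
static int memory_writebacks = 0;

/* Evict 'victim' from processor 'cpu'.  Returns the peer that absorbed the
 * line, or -1 if it had to take the conventional path back to memory.      */
static int gcrm_evict(int cpu, line_t victim)
{
    for (int p = 0; p < NCPU; p++) {
        if (p == cpu)
            continue;
        if (!l2[p].valid) {      /* a peer has room: keep the line on chip   */
            l2[p] = victim;
            return p;
        }
    }
    if (victim.dirty)
        memory_writebacks++;     /* no room anywhere: write back as usual    */
    return -1;
}

int main(void)
{
    line_t victim = { 0x2000, true, true };
    int home = gcrm_evict(0, victim);

    if (home >= 0)
        printf("victim kept on chip in CPU %d's L2\n", home);
    else
        printf("victim written back to memory (%d writebacks)\n",
               memory_writebacks);
    return 0;
}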

Advantages

The disclosed method provides several advantages, including:

-  Improved performance due to:
   -  Retention of more useful data in the local or extended caches
   -  Reduced accesses to memory
   -  Reduced effective memory latency and system bus traffic

-  Reduced power consumption, because the L2 caches of the other processors remain usable even when their cores are powered down due to underutilization (provided the design allows the L2 cache and the core to be powered independently)

-  Cost effectiveness, because smaller distributed caches on a smaller die achieve the same performance as a single larger cache

Detailed description

The disclosed method adds a hardware unit to a base CMP system (see Figure 1). For example, four processors share a bus that provides communication through the chipset to memory and other I/O devices. The arbitration, request, error, and data phases of the bus remain conventional. However, because the four processors are now on-chip, the bus snoop mechanism can be replaced with fast cache-coherency logic that runs at processor core speed rather than at front-side bus (FSB) speed. This capability provides quick access to the contents of the other on-chip L2 caches.
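The following C sketch illustrates the kind of lookup such on-chip coherency logic could perform: on a local L2 miss, the peers' tag arrays are probed before any request is placed on the bus. The flat tag arrays and the on_chip_snoop function are assumptions made for illustration; the actual logic, protocol states, and timing are not specified here.

/* On a local miss, probe the other on-chip L2 tags before going to the
 * chipset and memory over the bus.                                      */
#include <stdio.h>

#define NCPU  4
#define NLINE 8                       /* lines per L2, kept tiny here        */

static long l2_tags[NCPU][NLINE];     /* cached tags; 0 means an empty slot  */

/* Return the CPU whose L2 holds 'addr', or -1 so the request is sent over
 * the bus to the chipset and memory, as in a conventional system.           */
static int on_chip_snoop(int requester, long addr)
{
    for (int p = 0; p < NCPU; p++) {
        if (p == requester)
            continue;
        for (int i = 0; i < NLINE; i++)
            if (l2_tags[p][i] == addr)
                return p;             /* hit in a peer L2: stay on chip      */
    }
    return -1;
}

int main(void)
{
    l2_tags[2][3] = 0x3000;           /* suppose CPU 2 holds the needed line */

    int owner = on_chip_snoop(0, 0x3000);
    if (owner >= 0)
        printf("line supplied by CPU %d's L2 at core speed\n", owner);
    else
        printf("miss in every on-chip L2: request goes to memory\n");
    return 0;
}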