Method for tolerating longer hit latency in the presence of larger mid-level caches with software cooperation and smaller single-cycle caches

IP.com Disclosure Number: IPCOM000033183D
Publication Date: 2004-Nov-30

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for tolerating longer hit latency in the presence of larger mid-level caches with software cooperation and smaller single-cycle caches. Benefits include improved functionality and improved performance.


Background

Platforms based on low-power microprocessor technology typically have 16 KB or 32 KB of mid-level cache. A load from this cache has a latency of 3 cycles before the value is available, so the processor stalls for 2 cycles if an operation tries to use the value while it is still being loaded. Analysis indicates that this mid-level load-use penalty dominates processor stall time, accounting for as much as 18-24% of total cycles.
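
To make the penalty concrete, the following minimal C sketch (an illustration under the 3-cycle hit-latency figure above, not code from the disclosure) contrasts a loop whose add consumes each loaded value immediately, stalling every iteration, with a scheduled version that issues independent loads early so useful work fills the load-to-use gap:

/* Naive form: the add consumes 'v' on the cycle after the load,
 * so the pipeline stalls waiting for the mid-level cache. */
int sum_naive(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int v = a[i];   /* load                     */
        sum += v;       /* immediate use -> ~2-cycle stall */
    }
    return sum;
}

/* Scheduled form: two independent loads issue back-to-back, so each
 * value is ready (or closer to ready) by the time it is consumed.
 * This is a sketch of what compile-time code scheduling does. */
int sum_scheduled(const int *a, int n) {
    int s0 = 0, s1 = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        int v0 = a[i];
        int v1 = a[i + 1];
        s0 += v0;
        s1 += v1;
    }
    if (i < n) s0 += a[i];  /* odd-length tail */
    return s0 + s1;
}

This kind of scheduling is exactly what statically compiled code relies on to hide the load-use gap, and what, as noted below, managed runtimes largely fail to do.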

Several techniques exist for reducing the cache hit time, including:

• Smaller and simpler caches

• Avoiding address translation during cache indexing

• Pipelined cache access

• Trace cache

These techniques are optimized for general-purpose, random access patterns. Larger cache sizes are implemented to alleviate growing memory bottlenecks. Managed runtime environments, which generate code on the fly, do very little code scheduling and are unable to tolerate these latencies. Software prefetch mechanisms compensate for some of the memory latency by bringing data into the cache ahead of use, but they leave the cache hit time itself unchanged.
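
As a minimal sketch of that limitation, the loop below uses the GCC/Clang __builtin_prefetch intrinsic (the prefetch distance of 16 elements is an illustrative assumption) to pull data into the cache ahead of use; the prefetch hides memory latency, but each subsequent access still pays the full mid-level hit latency:

#include <stddef.h>

/* Software prefetch sketch: the prefetch overlaps memory latency with
 * computation, but once a line sits in the mid-level cache, every hit
 * still costs the same multi-cycle hit latency. */
#define PF_DIST 16  /* illustrative prefetch distance, in elements */

long sum_with_prefetch(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&a[i + PF_DIST], /*rw=*/0, /*locality=*/3);
        sum += a[i];  /* hit latency unchanged by the prefetch */
    }
    return sum;
}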

To prevent instructions and data from polluting each other's cache space, systems have conventionally implemented distinct instruction and data caches. However, no special mechanism has been implemented for quickly caching designated access patterns, such as heap accesses, stack accesses, and accesses to special data structures.

General description

The disclosed method tolerates the longer hit latency of larger mid-level caches by using software cooperation and smaller single-cycle caches. The method is backward compatible with existing systems, but the performance benefits are greater if the software is modified to describe the types of accesses that are the biggest bottlenecks.

The method uses separate on-chip caches that are distinct from the mid-level data cache and are kept coherent with memory and higher-level caches. Accesses to these on-chip caches have single-cycle latency. The caches can be used in several different ways:

• As a faster cache in front of the mid-level cache

• As a cache for thread-stack data (stack cache)

• As a cache for managed-runtime heap data (frequently-used heap cache)

• For access types that the software application designates to be performed through the on-chip cache (see the sketch after this list)
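
The disclosure does not specify the software interface for the last usage, so the following C sketch is purely hypothetical: the fast_cache_kind enum, the fc_designate hint, and runtime_setup are invented names illustrating how a runtime or application might tag its bottleneck access types or address ranges for the hardware to service from a single-cycle on-chip cache.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch only: the names below are invented for
 * illustration.  The idea is that software designates the access
 * types that bottleneck it most, and the hardware then services
 * those accesses from a small single-cycle on-chip cache instead
 * of the multi-cycle mid-level cache. */

enum fast_cache_kind {
    FC_STACK,   /* thread-stack accesses             */
    FC_HEAP,    /* frequently used managed-heap data */
    FC_CUSTOM   /* application-designated structures */
};

/* Stand-in for the hardware hint; a real system might expose this as
 * an instruction, a control-register write, or an ABI-defined call.
 * This stub only records the designation. */
static void fc_designate(enum fast_cache_kind kind, void *base, size_t len) {
    printf("designate kind=%d base=%p len=%zu\n", (int)kind, base, len);
}

/* A managed runtime could issue the hints at thread creation and
 * after profiling identifies hot heap regions. */
void runtime_setup(void *stack_base, size_t stack_len,
                   void *hot_heap, size_t heap_len) {
    fc_designate(FC_STACK, stack_base, stack_len);
    fc_designate(FC_HEAP, hot_heap, heap_len);
}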

Advantages

The disclosed method provides advantages, including:

• Improved functionality due to enabling a co...