Browse Prior Art Database

Method for a low-latency data pipeline with a data merge buffer for high-level cache

IP.com Disclosure Number: IPCOM000018665D
Publication Date: 2003-Jul-30
Document File: 3 page(s) / 69K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for a low-latency data pipeline with a data merge buffer for high-level cache. Benefits include improved performance and improved design flexibility.


Background

              A high-level cache (level 3 and above) typically has a large capacity but low utilization, because most requests are fulfilled by the lower-level caches. Tag lookup requires multiple cycles, and data access requires even more. The lookup time must account for sending request information across a large area and returning the results: many tag lookups incur at least one cycle of flight time, and data lookups incur multiple cycles of flight time.

              Conventional design theory holds that all accesses should observe uniform latency.

General description

              The disclosed method is a cache design that reduces the overall latency of the tag pipeline of a high-level cache. The cache is organized in two or more sections to provide an early response for a subset of the requests, resulting in improved overall access latency.
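As a back-of-envelope illustration of the latency benefit, splitting the cache so that a share of requests is answered one cycle early lowers the average access latency compared with a uniform-latency design. The fractions and cycle counts below are illustrative assumptions, not figures from the disclosure:

```python
# Hypothetical latency model: in a uniform design, every tag lookup pays
# the worst-case latency; in the split design, requests that hit the early
# section finish one cycle sooner.

early_fraction = 0.5   # assumed share of requests served by the early section
early_latency = 3      # assumed tag-lookup latency of the early section (cycles)
late_latency = 4       # assumed latency of the late section; a uniform design
                       # would pay this for every request

average_split = early_fraction * early_latency + (1 - early_fraction) * late_latency
average_uniform = late_latency

print(average_split, average_uniform)  # 3.5 vs 4.0 under these assumptions
```

Under these assumed numbers, the split design averages 3.5 cycles against 4.0 for the uniform design; the benefit scales with the fraction of requests the early section can serve.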

Advantages

              The disclosed method provides advantages, including:

•             Improved performance due to the improved response (reduced latency) for the subset of requests served by the early section, because the cache is organized in two or more sections

•             Improved design flexibility, because the approach can be generalized to multiple sections

Detailed description

              The disclosed method includes a tag pipeline with lower overall latency. For example, a conventional tag lookup in a high-level cache requires at least one cycle to reach all the elements of the tag array, adding a cycle to the latency of the high-level cache. The elements of the cache can instead be reorganized into two sections: a section that provides tag-lookup results early, and a section that takes additional cycle(s) to complete the lookup. The result of a tag lookup arrives early or late, depending on which section provides the final result.
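The early/late behavior can be sketched as a simple model. This is a minimal Python sketch under stated assumptions: the set count, the base latency, and the choice that the lower half of the sets forms the near (early) section are all hypothetical, not taken from the disclosure:

```python
NUM_SETS = 1024       # assumed total number of cache sets
BASE_LATENCY = 3      # assumed tag-lookup latency of the near section (cycles)

def section(set_index: int) -> str:
    """Assumed mapping: the lower half of the sets forms the near (early) section."""
    return "near" if set_index < NUM_SETS // 2 else "far"

def tag_lookup_latency(set_index: int) -> int:
    """Tag-lookup latency depends on which section resolves the request."""
    if section(set_index) == "near":
        return BASE_LATENCY        # early result: no extra flight cycle
    return BASE_LATENCY + 1       # far section: one extra cycle of flight time
```

In this sketch, a request to set 10 sees a 3-cycle tag lookup while a request to set 900 sees 4 cycles; the consumer of the tag result must therefore tolerate both arrival times.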

              The division of the tag array can be made across sets or across ways. If the division is made across sets, all the ways of a certain range of sets are in one half and the remainder in the other half; the tag comparator and ECC-related logic must then be replicated in both halves. If the division is made across ways, all the sets of a certain range of ways are in one half and the remainder in the other half. In both cases, the ECC detection and correction logic can be placed after the point where the two reads merge.
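The two partitioning choices amount to different mappings from a (set, way) coordinate to a half of the array. A small sketch with illustrative dimensions follows; the disclosure gives no concrete sizes, so the constants and function names are assumptions:

```python
NUM_SETS, NUM_WAYS = 8, 4   # illustrative dimensions, not from the disclosure

def half_by_set(set_idx: int, way: int) -> int:
    # Division across sets: every way of sets 0..NUM_SETS/2-1 lands in half 0.
    # Each half then needs its own tag comparator and ECC-related logic.
    return 0 if set_idx < NUM_SETS // 2 else 1

def half_by_way(set_idx: int, way: int) -> int:
    # Division across ways: every set of ways 0..NUM_WAYS/2-1 lands in half 0.
    # ECC detection/correction can sit after the point where the two reads merge.
    return 0 if way < NUM_WAYS // 2 else 1
```

Under a set-based split, all ways of a given set live in the same half, so one lookup touches one half only; under a way-based split, every lookup touches both halves, one per way group.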

              When a data array is split into...