Method for store-load forwarding in a single stage with store prevalidation

IP.com Disclosure Number: IPCOM000033800D
Publication Date: 2004-Dec-28
Document File: 3 page(s) / 538K

Publishing Venue

The IP.com Prior Art Database

Abstract

Disclosed is a method for store-load forwarding in a single stage with store prevalidation. Benefits include improved functionality and improved performance.

This text was extracted from a Microsoft Word document.
At least one non-text object (such as an image or picture) has been suppressed.
This is the abbreviated version, containing approximately 54% of the total text.

Background

              Conventionally, speculatively executed stores in an out-of-order processor are tracked in a structure called the StoreQ until they retire and drain into caches or merge buffers. StoreQ entries are allocated in program order in the front end, but their physical addresses and data may be written in any order. In-order processors have similar structures, called store buffers, which are usually fewer in number because they contain buffers for post-retirement stores. When a load hits an older store in the StoreQ, correctness requires that data be forwarded from that StoreQ entry rather than from the stale copy in the cache. When a load hits multiple StoreQ entries, data must be forwarded from the most recent overlapping store that is older than the load (the youngest of all overlapping stores older than the load). This case is difficult to resolve in a single stage because it requires a prioritization network among the various hits.
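
The selection rule described above can be illustrated with a small software model. This is only a sketch of the rule, not of any hardware implementation: the `Store` record, its field names, and the use of an integer `age` as a program-order sequence number are assumptions made for illustration. The sketch also ignores the case where a store covers only part of the load.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Store:
    # Hypothetical StoreQ entry; field names are illustrative only.
    age: int      # program-order sequence number; smaller means older
    addr: int     # physical byte address of the first byte written
    size: int     # number of bytes written
    data: bytes   # value that would be forwarded


def overlaps(a_addr: int, a_size: int, b_addr: int, b_size: int) -> bool:
    """True if the byte ranges [a_addr, a_addr+a_size) and [b_addr, b_addr+b_size) intersect."""
    return a_addr < b_addr + b_size and b_addr < a_addr + a_size


def forwarding_source(store_q: List[Store], load_age: int,
                      load_addr: int, load_size: int) -> Optional[Store]:
    """Return the youngest overlapping store older than the load, or None.

    None means no older store overlaps, so the load reads the cache.
    """
    hits = [s for s in store_q
            if s.age < load_age
            and overlaps(s.addr, s.size, load_addr, load_size)]
    # Among all older overlapping stores, the largest age is the most recent.
    return max(hits, key=lambda s: s.age, default=None)


store_q = [
    Store(age=1, addr=0x100, size=4, data=b"AAAA"),
    Store(age=3, addr=0x100, size=4, data=b"BBBB"),
    Store(age=7, addr=0x100, size=4, data=b"CCCC"),  # younger than the load below
]
source = forwarding_source(store_q, load_age=5, load_addr=0x100, load_size=4)
```

Here the load with age 5 hits the stores with ages 1 and 3; the store with age 7 is younger than the load and must not forward. The `max` over all hits is the software analogue of the prioritization network that the text identifies as the difficulty.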

              Previous implementations reprocessed multiple hits or prioritized them only within subsections of the StoreQ, which resulted in imperfect store-load bypassing.

              The conventional approach to store-load forwarding does not resolve multiple hits; instead, it takes the full processing penalty of a flush and replay. This approach loses significant performance as cores get bigger and applications tend to have multiple outstanding updates to the same locations.
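
As a rough illustration of this conventional policy, the sketch below classifies a load by the number of older overlapping stores it hits. The `Store` tuple, the function names, and the string outcomes are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple


class Store(NamedTuple):
    # Hypothetical StoreQ entry; names are illustrative only.
    age: int    # program-order sequence number; smaller means older
    addr: int   # physical byte address
    size: int   # number of bytes written


def older_overlapping(store_q: List[Store], load_age: int,
                      load_addr: int, load_size: int) -> List[Store]:
    """All stores older than the load whose byte range intersects the load's."""
    return [s for s in store_q
            if s.age < load_age
            and s.addr < load_addr + load_size
            and load_addr < s.addr + s.size]


def conventional_lookup(store_q: List[Store], load_age: int,
                        load_addr: int, load_size: int) -> str:
    """Model of the conventional policy: forward on one hit, replay on several."""
    hits = older_overlapping(store_q, load_age, load_addr, load_size)
    if not hits:
        return "cache"             # no older store overlaps; read the cache
    if len(hits) == 1:
        return "forward"           # a single hit can be forwarded directly
    return "flush_and_replay"      # multiple hits are not prioritized


store_q = [Store(1, 0x100, 4), Store(3, 0x100, 4), Store(4, 0x200, 8)]
outcome_multi = conventional_lookup(store_q, load_age=5, load_addr=0x100, load_size=4)
outcome_single = conventional_lookup(store_q, load_age=5, load_addr=0x200, load_size=8)
```

In this model, two outstanding stores to the same location are enough to force the replay path, which is the situation the text says becomes more common as cores grow.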

              Alternative approaches divide the StoreQ into clusters and prioritize multiple hits between clusters. This solution is neither perfect nor cheap: prioritization is in the critical data path, and the solution scales only at a cost in performance, because additional clusters must be processed.
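
One possible reading of the clustered scheme is sketched below. The text does not give its details, so several points are assumptions: that clusters hold contiguous age ranges ordered oldest to youngest, that a hit in a younger cluster takes priority over hits in older clusters, and that multiple hits inside one cluster still fall back to a replay. All names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Store(NamedTuple):
    # Hypothetical StoreQ entry; names are illustrative only.
    age: int    # program-order sequence number; smaller means older
    addr: int   # physical byte address
    size: int   # number of bytes written


def hits_in(cluster: List[Store], load_age: int,
            load_addr: int, load_size: int) -> List[Store]:
    """Older overlapping stores within a single cluster."""
    return [s for s in cluster
            if s.age < load_age
            and s.addr < load_addr + load_size
            and load_addr < s.addr + s.size]


def clustered_lookup(clusters: List[List[Store]], load_age: int,
                     load_addr: int, load_size: int) -> Tuple[str, Optional[Store]]:
    """Prioritize between clusters only (assumed ordered oldest to youngest)."""
    for cluster in reversed(clusters):      # examine the youngest cluster first
        hits = hits_in(cluster, load_age, load_addr, load_size)
        if len(hits) == 1:
            return ("forward", hits[0])     # younger cluster wins over older ones
        if len(hits) > 1:
            return ("replay", None)         # not resolved inside a cluster
    return ("cache", None)


across = [[Store(1, 0x100, 4)], [Store(3, 0x100, 4)]]   # hits in two clusters
within = [[Store(1, 0x100, 4), Store(3, 0x100, 4)]]     # two hits in one cluster
result_across = clustered_lookup(across, load_age=5, load_addr=0x100, load_size=4)
result_within = clustered_lookup(within, load_age=5, load_addr=0x100, load_size=4)
```

Under these assumptions the scheme forwards correctly when the hits fall in different clusters but not when they share one, which matches the description of the solution as imperfect. The loop over clusters stands in for a priority chain that grows with each added cluster.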

General description

              The disclosed method is store-load forwarding in a single stage with store prevalidation. An age range is set for every store to identify all the overlapping loads that can lega...