
Prefetching considering Resource Constraint of the Architecture

IP.com Disclosure Number: IPCOM000015630D
Original Publication Date: 2002-Feb-21
Included in the Prior Art Database: 2003-Jun-20
Document File: 2 page(s) / 57K

Publishing Venue

IBM

Abstract

A program is disclosed that generates prefetch instructions considering the resource constraints of the target architecture. Prefetch instructions can hide the latency of memory load operations. However, since a prefetch instruction also uses a load/store unit, we have to consider the resource usage of the target architecture (the number of registers and the utilization of the functional units) to execute prefetch instructions efficiently.



The program generates prefetch instructions in the following procedure:

In the quadruple representation, prefetch instructions are inserted before every memory load.

In the quadruple representation, dataflow optimization eliminates partially redundant prefetch instructions.

A DAG (directed acyclic graph) representation is generated from the quadruple representation (a sketch of these first three steps follows below).
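As a rough illustration of the first three steps, the sketch below inserts a prefetch quadruple before every load and then drops prefetches to cache lines that are already covered within one straight-line block. The Quad class, the cache_line key, and the single-block forward pass are assumptions made for this example; the disclosed program works on its own quadruple representation and uses a dataflow optimization that also removes partially redundant prefetches across control flow.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Quad:
    # Hypothetical quadruple IR, not the disclosed compiler's representation.
    op: str                           # "LOAD", "PREFETCH", "ADD", ...
    dst: Optional[str] = None         # result operand, None for PREFETCH
    addr: Optional[str] = None        # address expression for memory operations
    cache_line: Optional[int] = None  # assumed line-granularity key for redundancy checks

def insert_prefetches(quads):
    # Step 1: insert a PREFETCH quadruple before every memory load.
    out = []
    for q in quads:
        if q.op == "LOAD":
            out.append(Quad("PREFETCH", addr=q.addr, cache_line=q.cache_line))
        out.append(q)
    return out

def eliminate_redundant_prefetches(quads):
    # Step 2, greatly simplified: drop a PREFETCH whose cache line is already
    # covered by an earlier PREFETCH or LOAD in the same straight-line block.
    covered, out = set(), []
    for q in quads:
        if q.op == "PREFETCH":
            if q.cache_line in covered:
                continue              # already prefetched or loaded: redundant
            covered.add(q.cache_line)
        elif q.op == "LOAD":
            covered.add(q.cache_line)
        out.append(q)
    return out

# Example: two loads from the same cache line end up with a single prefetch.
block = [Quad("LOAD", dst="r1", addr="a[i]",   cache_line=7),
         Quad("LOAD", dst="r2", addr="a[i+1]", cache_line=7)]
optimized = eliminate_redundant_prefetches(insert_prefetches(block))
# -> PREFETCH a[i]; LOAD r1, a[i]; LOAD r2, a[i+1]  (second prefetch dropped)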

Code scheduling considering resource constraints [1] is applied to the DAG in the following procedure:

The latency of the operation, the number of required registers, and the number of required functional units are assigned to each DAG node. A top DAG node transitively precedes all DAG nodes, and a bottom DAG node transitively succeeds all DAG nodes. The level from the top to each DAG node is the maximum of the summed latencies over all paths from the top to that node. The level from the bottom is computed similarly. The level from the top to the bottom represents the minimum latency required to execute all computations of the DAG and is called the critical path length.

The sequence of scheduled nodes is initialized to an empty list. The number of used registers and the number of used load/store units at each time are initialized to zero. Ready nodes are inserted into the ready queue. For each DAG node in the ready queue, the earliest start time, the increase in the number of used registers at each time, the increase in the number of used functional units at each time, and the increase in the critical path length are calculated. When the number of used registers exceeds the number of physical registers of the target architecture, a DAG node that reduces the number of used registers is prioritized: for each DAG node in the ready queue, the lengths of the live ranges that the node refers to are calculated, and the node that reduces register usage the most and has the longest live range is selected. When the selected node is a prefetch instruction, DAG transformation is performed in the following mann...
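A simplified sketch of this list-scheduling loop is given below. It is only an illustration under several assumptions, not the disclosed implementation: Node, regs_delta, and uses_ls_unit are hypothetical fields, only the load/store units (assumed to be at least one) are modeled as a functional-unit constraint, operation latencies feed only the critical-path priority, and the earliest-start-time and live-range computations of the original procedure are omitted.

from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality so nodes can be kept in sets
class Node:
    name: str                   # node names are assumed to be unique
    latency: int = 1
    regs_delta: int = 0         # net change in live registers when this node issues
    uses_ls_unit: bool = False  # loads, stores and prefetches occupy a load/store unit
    preds: list = field(default_factory=list)
    succs: list = field(default_factory=list)

def level_from_top(node, memo):
    # Longest accumulated latency from the top of the DAG down to this node.
    if node.name not in memo:
        memo[node.name] = node.latency + max(
            (level_from_top(p, memo) for p in node.preds), default=0)
    return memo[node.name]

def schedule(nodes, num_phys_regs, num_ls_units):
    levels = {n.name: level_from_top(n, {}) for n in nodes}
    scheduled = []                  # sequence of scheduled nodes, initially empty
    used_regs = 0                   # registers currently in use
    done, ready = set(), [n for n in nodes if not n.preds]
    cycle, ls_used_this_cycle = 0, 0
    while ready:
        # Under high register pressure prefer the candidate that frees the most
        # registers; otherwise prefer the candidate on the longest (critical) path.
        if used_regs > num_phys_regs:
            ready.sort(key=lambda n: n.regs_delta)
        else:
            ready.sort(key=lambda n: -levels[n.name])
        picked = next((c for c in ready
                       if not (c.uses_ls_unit and ls_used_this_cycle >= num_ls_units)),
                      None)
        if picked is None:          # every candidate needs a busy load/store unit
            cycle, ls_used_this_cycle = cycle + 1, 0
            continue
        ready.remove(picked)
        done.add(picked)
        scheduled.append((cycle, picked.name))
        used_regs += picked.regs_delta
        ls_used_this_cycle += 1 if picked.uses_ls_unit else 0
        for s in picked.succs:      # a successor is ready once all predecessors ran
            if s not in done and s not in ready and all(p in done for p in s.preds):
                ready.append(s)
    return scheduled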