Browse Prior Art Database

Efficient Link-Time Heuristic for Reducing Split Branches on IA-64 Architecture

IP.com Disclosure Number: IPCOM000014300D
Original Publication Date: 2000-Jul-01
Included in the Prior Art Database: 2003-Jun-19
Document File: 3 page(s) / 70K

Publishing Venue

IBM

Abstract

The IA-64 architecture assigns 21 bits to the offset field of a branch instruction, restricting branch targets 21 to a distance of 2 bytes from a call site. In large program modules, it is not uncommon for branch offsets to exceed the 21 bit limit, forcing the compiler and/or the linker to split a single long branch into several smaller branches whose target offsets are within the architectural limit. In some cases, these additional branches can have a negative impact on program run time, especially if they are nested deeply in an inner loop. It is, therefore, imperative that a compilation system targetting the IA-64 architecture take steps to minimize the number of program branches which require splitting.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 54% of the total text.

Page 1 of 3

Efficient Link-Time Heuristic for Reducing Split Branches on IA-64 Architecture

The IA-64 architecture assigns 21 bits to the offset field of a branch instruction, restricting branch targets 21to a distance of 2 bytes from a call site. In large program modules, it is not uncommon for branch offsets to exceed the 21 bit limit, forcing the compiler and/or the linker to split a single long branch into several smaller branches whose target offsets are within the architectural limit. In some cases, these additional branches can have a negative impact on program run time, especially if they are nested deeply in an inner loop. It is, therefore, imperative that a compilation system targetting the IA-64 architecture take steps to minimize the number of program branches which require splitting.

It is assumed here that intra-section branches i.e. those branches whose targets lie in the same compiler-generated text section as their associated branch, will be split by the compiler, as necessary. Inter-section branches, however, cannot be split by the compiler, as the relative offsets of unique text sections are not determined until link time. It is the splitting of inter-section branches that is the topic of this disclosure.

Given a set of relocatable, compiler-generated text and data sections, it is the responsiblity of the linker to produce a program image that can be loaded by the system loader and executed on the system's processor. While producing a loadable image, the linker assigns addresses to text and data sections and the procedures and data variables that lie within them. Inter-section references are then relocated using the linker-assigned address of the referenced object. Given a native address assignment algorithm, one which makes no attempt to minimize split branches during relocation, it is nearly certain that branch splitting will be necessary for large programs. What is required is a link-time algorithm which assigns addresses with the goal of minimizing the number of split branches generated during program relocation. This disclosure describes an efficient, greedy heuristic approach to such an algorithm, as follows.

Greedy Algorithm

Let G=(V,E) be the call graph associated with the input program. Each call site is represented by an entry in E. Let C be the set of connected components of G Let M be an ordered list of nodes from G
for (each c C)

{

Let L be the list of nodes l1, l2, ..., ln in V, sorted by indegree

i.e. indegree(li-1) [ indegree(li) [ indegree(li+1)

for (each li L)

{

if (li "M)

insert(M, li);

for (each call edge, p, incident upon li)

{

1

Pag...