Browse Prior Art Database

An Algorithm for Selecting Compiler Flags to Optimize Application Performance

IP.com Disclosure Number: IPCOM000042277D
Original Publication Date: 2005-Feb-03
Included in the Prior Art Database: 2005-Feb-03
Document File: 3 page(s) / 68K

Publishing Venue

IBM

Abstract

An algorithm is presented that provides a structured method for selecting an optimal set of compiler flags which yields maximum application performance, given a candidate set of compiler flags.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 52% of the total text.

Page 1 of 3

An Algorithm for Selecting Compiler Flags to Optimize Application Performance

The algorithm requires the application to be built with a given set of compiler flags, a well defined workload/benchmark to be used to measure/establish performance and a mechanism to determine the correctness of the application's results.

The general process is outlined as follows:

Step 1. Initialization Step.

Define compiler to use

     Supply a list of compiler flags to consider for optimization and any associated linkage options
Step 2. Establish Baseline Performance.

Build application,

     Run benchmark, measure performance
Check pass/fail criteria. If failure, terminate
Step 3. Assess performance for each compiler flag + baseline flags

Loop over each compiler flag, and for each baseline set of flags + each additional flag under consideration, perform:

Build executable

Run benchmark to determine performance
Compute a "Sensitivity Coefficient" (sc) which is defined to be the ratio: Baseline time / ( Flag + Baseline time)

Eliminate all flags from further consideration that have a Sensitivity Coefficient < 1. This in effect removes all flags that do not show any performance improvement from an further consideration.

Assess pass/fail criteria, if failure is identified eliminate the flag (not baseline set of flags) from further consideration.

Step 4. Sort results to identify single flag + baseline that produces the best performance

Re-set the baseline to be the flag + baseline that produces the greatest performance as the new baseline and go to Step 3.

If none of the flag + baseline pairs result in better performance, terminate the process and go to Step 5.

Step 5. Analysis completed wrap-up

Summarize results

The above algorithm had been implemented in a PEARL script. The following is a sample execution of the script.

Text in RED below is used to map the script output to the above algorithm.

=====================================
*** Optimize Compiler Flags (OCF) ***
=====================================

InputFile.................. Input.txt
Compiler................... xlc

1

Page 2 of 3

Baseline Flags............. -O3

Input File Step 1. Initialization

i=0, compiler= xlc, base_ccflags= -O3, base_ldflags=

i ccflags ldflags

1 -qinline

2 -qipa -qipa

3 -qtune=pwr4

4 -qarch=pwr4

5 -qmaxmem=-1

6 -qcache=pwr4

7 -qhot

8 -qinlglue

9 -qunroll

**** Iteration = 1 (Raw Data - Not Sorted) ****
iter i bm sc +/- pf compiler & flags 1 0 9.716 0.000 (b) p xlc -O3 (baseline) Step 2.

1 1 9.736 0.998 (-) p xlc -O3 -qinline Step 3.

1 2 8.946 0.000 (-) f xlc -O3 -qipa .

1 3 8.918 1.089 (+) p xlc -O3 -qtune=pwr4 .

1 4 9.766 0.995 (-) p xlc -O3 -qarch=pwr4 .

1 5 8.124 1.196 (+) p xlc -O3 -qmaxmem=-1 .

1 6 9.338 0.000 (-) f xlc -O3 -qcache=pwr4 .

1 7 8.704 0.000 (-) f xlc -O3 -qhot .

1 8 9.144 1.063 (+) p xlc -O3 -qinlglue .

1 9 9.503 1.022 (+) p xlc -O3 -qunroll Step 3.

**** Iteration = 1 (Sorted) ****
iter i bm sc +/- pf compiler & flags Step 4. 1 0 9.716 0.000 (b) p xlc -O3 (baseline)

1 1 8.124 1...