Browse Prior Art Database

Method for Aggressive Function Cloning and Embedding with Global Code Reordering

IP.com Disclosure Number: IPCOM000033308D
Original Publication Date: 2004-Dec-06
Included in the Prior Art Database: 2004-Dec-06
Document File: 7 page(s) / 81K

Publishing Venue

IBM

Abstract

Given a program code along with profiling information about the frequency of its instructions, gathered with some representative workload, the proposed method produces optimized code with improved performance.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 18% of the total text.

Page 1 of 7

Method for Aggressive Function Cloning and Embedding with Global Code Reordering

Given a program code along with profiling information about the frequency of its instructions, gathered with some representative workload, the proposed method produces optimized code with improved performance.

    Code reordering and function cloning are commonly used today for improving memory locality and eliminating memory accesses. In this paper a new profile based optimization method of an aggressive function cloning and embedding is presented, based on a study of the mutual effects between code reordering and embedded cloned functions. The proposed optimization clones frequently executed large functions and replaces their function calls by their bodies, while removing corresponding call and return instructions at the call sites. The cloning and embedding step is then followed by global code reordering to achieve better grouping of frequently executed code across function calls. The aggressiveness of the function cloning embedding relies on the fact that no constrains are applied on the size or the number of the functions being cloned and embedded. Only execution frequency is taken into consideration during the cloning process. The global code reordering that follows the function cloning guarantees the separation between cold and hot segments of the embedded cloned functions.

    Other solutions such as code reordering [3, 6, 7, 9, 10, 17, 19] and function cloning [8, 24-26] are known optimizations for improving code locality and I-cache behavior. Function inlining [1, 2, 4, 8, 11, 20-23] is another optimization that is heavily used by compilers today. Discussed here are the performance effects of cloning large frequently executed functions which are embedded at their calling sites, the way this is done today at the initial step of function inlining optimization. Thus, embedded cloned functions in this work can be considered as simple cases of inlined functions, but without applying any register allocation or code scheduling phases on them, therefore, avoiding the long compilation time associated with aggressive function inlining.

    However, although commonly used today, function inlining and function cloning can cause code bloat and degraded performance, forcing compilers to limit their scope to relatively small functions. Chen et al. [5] and McFarling [13] examined the effect of function inlining on cache behavior. They found that overall, function inlining seems to be largely ineffective on an optimized layout of basic blocks because the code expansion caused by inlining increases cache conflicts.

    As mentioned above, although function inlining and function cloning can help improve performance, they can sometimes cause a substantial code bloat which results in a higher rate of I-cache misses and performance loss. As a result, inlining and cloning are often applied only to small functions.

Another primary issue of function cloning and embedding or functio...