InnovationQ will be updated on Sunday, Oct. 22, from 10am ET - noon. You may experience brief service interruptions during that time.
Browse Prior Art Database

Intermediate Form Representation of Aggregates with Precise Aliasing

IP.com Disclosure Number: IPCOM000243501D
Publication Date: 2015-Sep-25
Document File: 6 page(s) / 66K

Publishing Venue

The IP.com Prior Art Database


Disclosed is a method to equivalently represent shared and non-shared aggregates in an intermediate form while enabling the encoding of precise aliasing information. The method enables correct aliasing information to be encoded for unions in llvm-ir, and covers both structs and unions when translating from w-code.

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 33% of the total text.

Page 01 of 6

Intermediate Form Representation of Aggregates with Precise Aliasing

Not all intermediate forms used by programming language compilers have a first-class representation of shared-storage aggregates, such as C-language unions (e.g., llvm-ir). Other intermediate forms do not sharply distinguish between non-shared storage forms (such as C-language structs) and shared storage forms.

A need for a new approach became acute while writing a translator from w-code to llvm-ir. Such a translator is needed in order to enable exploitation of high-performance hardware accelerators. In doing this, it was discovered that w-code does not provide a clear distinction between code representing C-unions vs. C-structs, while at the same time llvm does not provide unions as a first-class type. If one uses only struct-like aggregates in llvm-ir, the aliasing patterns are relatively easy to analyze, as each distinct region of memory within the struct is given a unique index. However, when one uses unions, the assumption is generally that any access could alias with any other access within the union. In fact, that is often an unnecessary over-generalization: unions are often used in cases when parts of the union are distinctly accessed from other parts . A simple example is a Union that has a word for a header, which contains a tag that indicates how the rest of the union is to be accessed. The over-generalization of access aliasing has compounded negative effects when translating from w-code, as one is forced to make some worst-case assumptions and treat structs as unions.

Existing approaches include cross-compilers.

A method is needed to equivalently represent shared and non-shared aggregates in an intermediate form while enabling the encoding of precise aliasing information.

The novel solution is to represent both structs and unions in a uniform way when necessary (e.g., when translating from w-code) and to analyze and represent aliasing information in the intermediate form, in this specific case, llvm-ir. The uniform method represents structs and unions as an array of bytes regardless of the underlying types. The array of bytes is accessed with byte indices, combined with casting to the right type before the actual access occurs. The aliasing analysis consists of a careful cataloging of regions of access in the aggregate data structure, which is then encoded using the llvm-ir noalias and alias.scope metadata constructs.

The first step in the new process is to provide a uniform representation of structs and unions as arrays of bytes with access mediated by appropriate casting where necessary. For example, if a union element is a short, then the method is to access the location as a byte address, but then cast it to a short address (in llvm, this is type i16*) before using it for a load or store. Consider


Page 02 of 6

this C program:

union u {

char c[4];

int i;

struct {

short hi;

short lo;

} myunion; struct s { char c;

int i;
} mystruct; int main() {...