The Cetus Compiler Manual

Cetus Team

Purdue University
ParaMount Research Group

Abstract

Cetus is a source-to-source parallelizing compiler for ISO/ANSI C. It can run on any system supporting Oracle's Java Runtime Environment, Standard Edition 6 or later. If you are referring to a printed version of this document, the most recent version can be found at http://cetus.ecn.purdue.edu/.


Table of Contents

I. Overview
1. Introduction
2. Background
GCC
Polaris
SUIF
3. License
II. Users Guide
4. Obtaining Cetus
5. Building Cetus
6. Running Cetus
Quick Start
Command Line Options
7. Optimization and Analysis Passes
Data Dependence Analysis
Induction Variable Recognition and Substitution
Reduction Variable Recognition
Privatization
Points-to Analysis
Alias Analysis
Range Analysis
OpenMP
III. Developers Guide
8. Architecture of Cetus
Front End
Integrated Parsers
Separate Parsers
Handling Pragmas and Comments
Intermediate Representation
Class Hierarchy Design
Major Classes
Relationship Between Grammar and IR
Syntax Tree Invariants
Annotations
Back End
9. Writing a Pass
Making Cetus Execute Your Pass
Iterators
Tools
Expression Simplifier
Modifying IR
Printing

Part I. Overview

Chapter 1. Introduction

Parallelizing compiler technology is most mature for the Fortran 77 language. The simplicity of the language without pointers or user-defined types makes it easy to analyze and to develop many advanced compiler passes. By contrast, parallelization technology for modern languages, such as Java, C++, or even C, is still in its infancy. When trying to engage in such research, we were faced with a serious challenge. We were unable to find a parallelizing compiler infrastructure that supports interprocedural analysis, exhibits state-of-the-art software engineering techniques to achieve shorter development time, and which would allow us to compile large, realistic applications. However, we feel these properties are of paramount importance. They enable a compiler writer to develop "production strength" passes. Production strength passes, in turn, can work in the context of the most up-to-date compiler technology and lead to compiler research that can be evaluated with full suites of realistic applications. Many people have observed and criticized the lack of such thorough evaluations in current research papers. The availability of an easy-to-use compiler infrastructure would help improve this situation significantly. Hence, continuous research and development in this area are among the most important tasks of the compiler community. Cetus contributes to this effort.

Chapter 2. Background

Table of Contents

GCC
Polaris
SUIF

From a substantial list of compiler infrastructures, we chose to discuss three open-source projects that most closely match our goals. The goals are to create a source-to-source infrastructure that supports C and is extensible to other languages. The three projects are the GNU, Polaris, and SUIF compilers. We explain our reasons for not using these infrastructures as our basis, and also discuss important features of these compilers that we want to adopt in Cetus.

GCC

GCC is one of the most robust compiler infrastructures available to the research community. GCC generates highly optimized code for a variety of architectures, which in many cases rivals the quality of code generated by the machine vendor's compiler. Its open-source distribution and continuous updates make it attractive. However, GCC was not designed for source-to-source transformations. Most of its passes operate on the lower-level RTL representation. Only recent versions of GCC (version 3.0 onward) include an actual syntax tree representation, which has been used in Purdue class projects for implementing a number of compiler passes. Other limitations are that GCC compiles one source file at a time, performs separate analysis of procedures, and requires extensive modification to support interprocedural analysis across multiple files.

The most difficult problem faced by the students was that GCC does not provide a friendly API for pass writers. The API consists largely of macros. Passes must be written in C, and operations lack logical grouping (classes, namespaces, etc.), as would be expected from a compiler developed in an object-oriented language.

GCC's IR has an ad-hoc type system, which is not reflected in its implementation language (C). The type system is encoded into integers that must be decoded and manipulated by applying a series of macros. It is difficult to determine the purpose of fields in the IR from looking at the source code, since in general every field is represented by the same type. This also makes it difficult for debuggers to provide meaningful information to the user.

Documentation for GCC is abundant. The difficulty is that the sheer amount easily overwhelms the user. Generally, we have found that there is a very steep learning curve in modifying GCC, with a big time investment to implement even trivial transformations.

The above difficulties were considered primarily responsible for the slower progress of the students who used GCC, compared with those who created a new compiler design. The demonstrated higher efficiency of implementation was the ultimate reason for our decision to pursue the full design of Cetus.

Polaris

The Polaris compiler, which we co-developed in prior work, was an important influence on the design of our new infrastructure. Polaris is written in C++ and operates on Fortran 77 programs. So far, no extensions have been made to handle Fortran 90, which provides a user-defined type system and other modern programming language features. Polaris' IR is Fortran-oriented, and extending it to other languages would require substantial modification.

Another factor we considered was that Polaris was written before the Standard Template Library (C++ STL) became available, so it includes its own container implementations. It uses a pre-ISO dialect of C++ which now seems unusual to programmers and causes many warnings (and sometimes errors) with current compilers. Both aspects limit its portability to other platforms.

In general, Polaris is representative of compilers that are designed for one particular language, serve their purpose well, but are difficult to extend. Cetus should not be thought of as "Polaris for C" because it is designed to avoid that problem. However, there are still several Polaris features that we wanted to adopt in Cetus. Polaris' IR can be printed in the form of code that is similar to the source program. This property makes it easy for a user to review and understand the steps involved in Polaris-generated transformations. Also, Polaris' API is such that the IR is in a consistent state after each call. Common mistakes that pass writers make can be avoided in this way.

SUIF

The SUIF compiler is part of the National Compiler Infrastructure (NCI), along with Zephyr; the design of this infrastructure began almost a decade ago. The infrastructure was intended as a general compiler framework for multiple languages. It is written in C++, like Polaris, and the currently available version supports analysis of C programs. SUIF 1 is a parallelizing compiler and SUIF 2 performs interprocedural analysis.

Both SUIF and Cetus fall into the category of extensible source-to-source compilers, so at first SUIF looked like the natural choice for our infrastructure. Three main reasons led us away from this option. The first was the perception that the project is no longer active - the last major release was in 2001 and does not appear to have been updated recently. The second reason was that, although SUIF intends to support multiple languages, we could not find complete front ends other than for C and an old version of Java. Work began on front ends for Fortran and C++, but they are not available in the current release. Hence, as is, SUIF essentially supports a single language, C. Finally, we had a strong preference for using Java as the compiler implementation language. Java offers several features conducive to good software engineering. It provides good debugging support, high portability, garbage collection (contributing to the ease of writing passes), and its own automatic documentation system. These facts prompted us to pursue other compiler infrastructures.

Chapter 3. License

In recent years, there has been an explosion in the number of open-source licenses, and we do not wish to contribute further to that problem, but this was the only license upon which all involved parties were able to agree. We suggest that you do not use this license for projects that do not involve Cetus.

The Cetus License is a modified Artistic License, similar to the Perl Artistic License. Artistic Licenses are generally OSI approved, although this specific license has not been submitted to OSI. The license is available on the Cetus download page.

Part II. Users Guide

Chapter 4. Obtaining Cetus

Cetus is a set of intermediate representation (IR) classes and optimization passes. It currently has a built-in C parser written using ANTLR. Thus, there are two items you need to obtain for full functionality.

  • The intermediate representation and compiler passes. These are written in Java.

  • ANTLR v2

The IR and passes are available at the Cetus Project page. This part runs on any system supporting a Java run-time environment, and is covered by the above license. It is available now. The contents of the released version are as follows. More details can be found in the 'readme' file included with the release.

  • src - Source code for the Intermediate Representation (IR), optimization and analysis passes.

  • lib - Archived classes (jar)

  • api - Java API documentation (javadoc)

Chapter 5. Building Cetus

The following tools are needed to compile Cetus. You do not need them if you only intend to use a prebuilt Cetus JAR file, although compiler writers will normally want to modify and rebuild Cetus.

  • A Java compiler. Oracle's Java compiler, version 1.6.0 or later, is recommended, because that is what we use.

  • ANTLR v2

The organization of the directories has changed since the last release and there are several options for building Cetus.

  • For Apache Ant users: The provided build.xml defines the build targets for Cetus. The available targets are "compile", "jar", "clean" and "javadoc". Users need to edit the location of the ANTLR tool.

  • For Linux/Unix command line users: Run the script build.sh after defining system-dependent variables in the script.

  • For SDK (Eclipse, Netbeans, etc) users: Follow the instructions of each SDK.

Chapter 6. Running Cetus

Quick Start

  1. Locate JAR files for ANTLR and Cetus and add them to your CLASSPATH variable.

    $ export CLASSPATH=$ANTLR_jar_location:$Cetus_jar_location
  2. Run Cetus by invoking the driver's main method.

    $ java cetus.exec.Driver [option] [files]

NOTE: The build script (build.sh) and the Ant buildfile (build.xml) provide a target that wraps the above steps in a script called "cetus"; see build.sh and build.xml for details. Using the script, Cetus can simply be run as follows.

$ cetus [option] [files]

Sample invocation of Cetus: Parallelize loops in the input source and insert timing instrumentation around the parallelized loops. This command automatically generates, by default, OpenMP pragmas for the auto-parallelized loops, and adds instrumentation code for the user's convenience. The output source file is stored in the directory "cetus_output" by default.

$ cetus -parallelize-loops=2 -profile-loops=4 test.c

Command Line Options

--------------------------------------------------------------------------------
UTILITY
--------------------------------------------------------------------------------
-debug_parser_input
    Print a single preprocessed input file before sending to parser and exit

-debug_preprocessor_input
    Print a single pre-annotated input file before sending to preprocessor and exit

-dump-options
    Create file options.cetus with default options

-dump-system-options
    Create system wide file options.cetus with default options

-expand-all-header
    Expand all header file #includes into code

-expand-user-header
    Expand user (non-standard) header file #includes into code

-help
    Print this message

-load-options
    Load options from file options.cetus

-macro
    Sets macros for the specified names with a comma-separated list (no spaces allowed), e.g., -macro=ARCH=i686,OS=linux

-outdir=dirname
    Set the output directory name (default is cetus_output)

-parser=parsername
    Name of parser to be used for parsing source file

-preprocessor=command
    Set the preprocessor command to use

-preserve-KR-function
    Preserves K&R-style function declaration

-skip-procedures=proc1,proc2,...
    Causes all passes that observe this flag to skip the listed procedures

-verbosity=N
    Degree of status messages (0-4) that you wish to see (default is 0)

-version
    Print the version information

--------------------------------------------------------------------------------
ANALYSIS
--------------------------------------------------------------------------------
-alias=N
    Specify level of alias analysis
      =0 disable alias analysis (assume no alias)
      =1 advanced interprocedural analysis (default)
         Uses interprocedural points-to analysis
      =2 assume no alias when points-to analysis is too conservative

-callgraph
    Print the static call graph to stdout

-ddt=N
    Perform Data Dependence Testing
      =1 banerjee-wolfe test (default)
      =2 range test

-parallelize-loops
    Annotate loops with Parallelization decisions
      =1 parallelizes outermost loops (default)
      =2 parallelizes every loop
      =3 parallelizes outermost loops with report
      =4 parallelizes every loop with report

-privatize
    Perform scalar/array privatization analysis

-range=N
    Specifies the accuracy of symbolic analysis with value ranges
      =0 disable range computation (minimal symbolic analysis)
      =1 enable local range computation (default)
      =2 enable inter-procedural computation (experimental)

-reduction=N
    Perform reduction variable analysis
      =1 enable only scalar reduction analysis (default)
      =2 enable array reduction analysis and transformation

--------------------------------------------------------------------------------
TRANSFORM
--------------------------------------------------------------------------------
-induction
    Perform induction variable substitution

-normalize-loops
    Normalize for loops so they begin at 0 and have a step of 1

-normalize-return-stmt
    Normalize return statements for all procedures

-profile-loops=N
    Inserts loop-profiling calls
      =1 every loop          =2 outermost loop
      =3 every omp parallel  =4 outermost omp parallel
      =5 every omp for       =6 outermost omp for

-teliminate-branch=N
    Eliminates unreachable branch targets
      =0 disable (default)
      =1 enable
      =2 leave old statements as comments

-tinline=mode=0|1|2|3|4:depth=0|1:pragma=0|1:debug=0|1:foronly=0|1:complement=0|1:functions=foo,bar,...
    (Experimental) Perform simple subroutine inline expansion transformation
   mode
      =0 inline inside main function (default)
      =1 inline inside selected functions provided in the "functions" sub-option
      =2 inline selected functions provided in the "functions" sub-option, when invoked
      =3 inline according to the "inlinein" pragmas
      =4 inline according to both "inlinein" and "inline" pragmas
   depth
      =0 perform inlining recursively i.e. within callees (and their callees) as well (default)
      =1 perform 1-level inlining 
   pragma
      =0 do not honor "noinlinein" and "noinline" pragmas
      =1 honor "noinlinein" and "noinline" pragmas (default)
   debug
      =0 remove inlined (and other) functions if they are no longer executed (default)
      =1 do not remove the inlined (and other) functions even if they are no longer executed
   foronly
      =0 try to inline all function calls depending on other options (default)
      =1 try to inline function calls inside for loops only 
   complement
      =0 consider the functions provided in the command line with "functions" sub-option (default)
      =1 consider all functions except the ones provided in the command line with "functions" sub-option
   functions
      =[comma-separated list] consider the provided functions. 
      (Note 1: This sub-option is meaningful for modes 1 and 2 only) 
      (Note 2: It is used with "complement" sub-option to determine which functions should be considered.)

-tsingle-call
    Transform all statements so they contain at most one function call

-tsingle-declarator
    Transform all variable declarations so they contain at most one declarator

-tsingle-return
    Transform all procedures so they have a single return statement

--------------------------------------------------------------------------------
CODEGEN
--------------------------------------------------------------------------------
-ompGen=N
    Generate OpenMP pragma
      =1 comment out existing OpenMP pragmas (default)
      =2 remove existing OpenMP pragmas
      =3 remove existing OpenMP and Cetus pragmas
      =4 keep all pragmas

-profitable-omp=N
    Inserts runtime code for selecting profitable omp parallel regions (see the API documentation for more details)
      =0 disable
      =1 Model-based loop selection (default)
      =2 Profile-based loop selection

Chapter 7. Optimization and Analysis Passes

Data Dependence Analysis

Data dependence analysis is a memory disambiguation technique through which a compiler tries to establish dependence relations between scalar variables or between array accesses in a program. The existence of dependences, or the proof of independence, provides essential information for further analysis and for legal transformations such as loop parallelization, loop tiling, and loop interchange.

Cetus implements an array data dependence testing framework that includes loop-based affine array-subscript testing. The focus of this implementation is automatic loop parallelization. The framework acts as a wrapper around the conventional data-dependence test; we currently use the Banerjee-Wolfe inequalities (Optimizing Supercompilers for Supercomputers, Michael Wolfe) as our default dependence test. The wrap-around framework gives Cetus the flexibility to include other dependence tests, thereby broadening the scope of our implementation (the Range test has been provided since v1.1). A whole-program data-dependence graph provides the output information and the interface that enable subsequent analyses and transformations.

Initially, the algorithm traverses all loops in the program IR and each loop and its inner nest are tested for eligibility. The eligibility check allows us to broaden/narrow the scope of the implementation. Currently, the algorithm includes the following checks:

  • Canonical loops (Fortran DO loop format): This checks the loop's initial assignment expression, loop condition, and loop index increment statement for canonical form

  • Function calls are supported with simple analysis. Known function calls without side-effects can be added to a list of parallelizable calls (e.g. system calls). We also support simple interprocedural side-effect analysis

  • Control flow modifiers are not allowed (break statements can be handled specially for parallelization)

  • The loop increment must be an integer constant (symbolic increments are handled using range analysis)

Loop information for all eligible loops is collected and stored internally for further dependence analysis.

SYMBOLIC ANALYSIS SUPPORT: The data dependence framework includes substantial support for using range analysis in the presence of symbolic values. Range information is used to determine loop bounds and loop increments during eligibility testing. In the case of symbolic bounds, range information is used conservatively to test the maximum possible range of values. Symbolic information is also used to simplify array subscripts before they are provided to the Banerjee-Wolfe inequalities for testing. This symbolic support substantially improves the precision of dependence analysis, especially the Banerjee test.

The algorithm then tests all array accesses within an entire loop nest for independence. Arrays identified by AliasAnalysis as aliased are currently assumed to be dependent in all directions for the enclosing loops. Pairs of array accesses such that at least one of them is a write access are provided as input to the dependence test which returns a set of direction vectors corresponding to the common enclosing loop nest for the pair of accesses (if independence could not be proved). These direction vectors are then used to build a Data Dependence Graph (DDG) for the loop nest.

The data dependence graph for each loop nest becomes part of a larger whole-program graph, which is attached to the Program IR object upon completion of dependence testing. This implementation is new and substantially different from release v1.0. A user looking to use dependence information in an analysis or transformation pass must obtain a reference to this DDG from the Program object. A comprehensive API is then available via DDGraph (see the Cetus javadoc) to extract dependence information related to loops. This implementation has the advantage that dependence testing is run fewer times and a common API is available to pass writers. NOTE: it is the pass writer's responsibility to re-run dependence testing after a transformation on the IR has been performed, in order to create an up-to-date version of the dependence graph, which automatically replaces itself in the Program IR.
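For illustration, the following is a minimal sketch of how a pass might retrieve and inspect the whole-program DDG. The getDDGraph() accessor name is our assumption here; verify it against the Program and DDGraph javadoc.

 import cetus.analysis.DDGraph;
 import cetus.hir.Program;

 public class DDGExample
 {
   /* Sketch only: we assume the whole-program dependence graph is
      accessible through a getDDGraph() accessor on Program. */
   public static void inspect(Program program)
   {
     DDGraph ddg = program.getDDGraph();
     if (ddg == null) {
       // Dependence testing has not run (or must be re-run after a
       // transformation); no graph is attached yet.
       return;
     }
     // Loop-related dependence queries are available through the
     // DDGraph API; see the Cetus javadoc for the exact methods.
     System.out.println(ddg);
   }
 }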

Induction Variable Recognition and Substitution

The induction variable (IV) substitution pass recognizes and substitutes induction variables in loops that take the form iv = iv + expr. This form of assignment prevents the automatic parallelizer from performing analysis and code generation, due to the data dependence on the assignment operation. Our IV pass detects such induction variables and replaces them with equivalent expressions in terms of the relevant loop index variables. The scope of the IV pass extends to the detection and substitution of Generalized Induction Variables (GIV) with additive induction operations; it allows multiple induction operations on a variable in a multiply nested loop, and the increment expression may not contain loop-variant variables other than other induction variables. The following example shows an input program and the transformed code produced by our IV pass.

    iv = 1;                         
    for (i=0; i<10; i++) {          iv = 1;
      iv = iv+1;                    for (i=0; i<10; i++) {
      ... = iv;                       ... = 2+41*i;
      for (j=0; j<20; j++) {   -->    for (j=0; j<20; j++) {
        iv += 2;                        ... = 4+41*i+2*j;
        ... = iv;                     }
      }                             }
    }
  

Reduction Variable Recognition

The reduction pass performs reduction recognition for each ForLoop. It generates a Cetus annotation of the form "#pragma cetus reduction(...)". Currently, it supports scalar (sum += ...), ArrayAccess (A[i] += ...), and AccessExpression (A->x += ...) reduction variables. If another pass wants to access reduction information for a given statement stmt, Tools.getAnnotation(stmt, "cetus", "reduction") will return an object that contains a reduction map.
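As a minimal sketch, the lookup described above could be used as follows; we assume the returned object is an Annotation, so check the javadoc for its exact type and accessors.

 import cetus.hir.*;

 public class ReductionExample
 {
   /* Sketch based on the Tools.getAnnotation call described above. */
   public static void printReduction(Statement stmt)
   {
     Annotation note = Tools.getAnnotation(stmt, "cetus", "reduction");
     if (note != null) {
       // The returned object contains the reduction map; consult the
       // javadoc for its accessors.
       System.out.println("reduction annotation: " + note);
     }
   }
 }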

Privatization

Array privatization performs privatization analysis of the program. It tries to find privatizable variables (scalars and arrays), which are written first and then read in a loop body. The high-level algorithm below describes the process of detecting privatizable variables, both scalars and array sections, in a loop. The set operations that appear in the algorithm are performed on array sections if the variable is an array. We use the power of the symbolic analysis techniques in Cetus to make the symbolic section operations possible. For example, [1:m] (intersect) [1:n] results in [1:n] if the expression comparison tool, using the value range set, can decide that n is less than or equal to m.

The algorithm traverses a loop nest from the innermost to the outermost loop. At each level, it first collects definitions (write references) and uses (read references) in the loop body. Uses that are covered by prior definitions create privatizable variables (or array sections) for the current loop. The other uses are upward exposed, creating read references seen in the outer loop. Second, the algorithm aggregates all these array sections over the loop iteration space, creating the array sections that are private, written, and upward exposed for the entire loop. The aggregation of the written sections (DEF) computes the must-defined regular sections of the arrays over the entire loop iteration space, while the aggregation of the upward-exposed sections (UEU) requires only the conservative value ranges of the sections (may-used sections). This algorithm is a slightly simpler version of the one used in the Polaris parallelizing compiler for Fortran 77 programs.

 procedure Privatization(L)

   Input : Loop L
   Output: DEF[L], UEU[L], PRI[L]
   // DEF: Defined set
   // USE: Used set
   // UEU: Upward-exposed set
   // PRI: Private variables
   // (^): intersection, (v): union

   foreach direct inner loop L' in L
     (DEF[L'],USE[L']) = Privatization(L')

   G(N,E) = BuildCFG(L)
   // BuildCFG builds a CFG with the inner loops represented as super nodes

   Iteratively solve data flow equation DEF for node n in N
     DEFin[n] = (^) {DEFout[p], p in predecessors of n}
     DEFout[n] = (DEFin[n]-KILL[n]) (v) DEF[n]

   DEF[L] = {}
   UEU[L] = {}
   PRI[L] = CollectCandidates(L)
   // CollectCandidates collects arrays with loop-invariant subscripts

   foreach loop-exiting node e in N
     DEF[L] = DEF[L] (^) DEFout[e]

   foreach node n in N
     UEU[L] = UEU[L] (v) (USE[n]-DEFin[n])

   foreach variable v in UEU[L]
     PRI[L] = PRI[L]-{v}

   DEF[L] = AggregateDEF(DEF[L])
   UEU[L] = AggregateUSE(UEU[L])
   // AggregateDEF aggregates array sections in DEF (MUST set)
   // AggregateUSE aggregates array sections in USE (MAY set)

   return (DEF[L], UEU[L])

 end
 

Array privatization is invoked by specifying the flag -privatize, and the result is stored as an annotation that contains the set of private variables. We do not consider variables of user-defined type as candidate private variables, but we intend to widen the scope of the analysis in the future.
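A minimal sketch of reading that result follows. We assume the pass attaches a CetusAnnotation keyed "private" to the loop statement and that Annotatable provides getAnnotation(Class, String); both are assumptions to verify in the javadoc.

 import cetus.hir.*;

 public class PrivatizationExample
 {
   /* Sketch only: the annotation class, the "private" key, and the
      getAnnotation signature are assumptions. */
   public static void printPrivateSet(Statement loop)
   {
     CetusAnnotation note =
         loop.getAnnotation(CetusAnnotation.class, "private");
     if (note != null) {
       Object privateSet = note.get("private");  // set of private symbols
       System.out.println("private: " + privateSet);
     }
   }
 }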

Points-to Analysis

Points-to analysis is a compile-time technique that helps identify relationships between pointer variables and the memory locations that they point to during program execution. Pointers are powerful programming constructs and allow complex memory manipulation during program execution through techniques such as pointer arithmetic and dynamic memory allocation. Pointer relationships can thus be complex and difficult to analyze at compile time. On the other hand, they simplify various other compile-time analyses, such as constant propagation and alias analysis, in C programs that make extensive use of pointers. Cetus provides an advanced interprocedural points-to analysis framework, and this section describes the intraprocedural design of this framework. The goal of the points-to analyzer is to identify the set of named or unnamed memory locations that a pointer variable may point to.

The Cetus points-to analyzer is based on the design described in: "A practical interprocedural alias analysis for an optimizing/parallelizing C Compiler", M. Emami, Masters Thesis, McGill University, 1993.

A flow-sensitive approach is used for analyzing intraprocedural pointer information. Within a procedure, pointer relationships are created through explicit pointer assignment. Updates are also possible through function calls. At the end of the iterative pointer-information update, once a fixed point is reached, the analyzer provides a map as output. This map relates every statement in the procedure to the points-to information that exists before it, as the examples below will make clear. Thus, accurate pointer information is available at every point in the program. The output map is used by the interprocedural framework, as described in the interprocedural analysis subsection below.

The following sections provide an overview of the intraprocedural implementation and describe the results through simple examples.

1) Points-to Representation: The analyzer uses the Cetus Domain representation to handle points-to information. Special data structures PointsToDomain and PointsToRel are used to implement points-to relationships. These data structures use Cetus Symbol information to uniquely identify memory locations. Source-level symbol information is available through the Symbol interface for named locations. For unnamed locations such as heap-allocated memory, string literal locations, and other special locations, the analyzer uses a special Symbol called AbstractLocation. The relationship also contains a boolean value identifying whether the pointer relationship is definitely (D) or only possibly (P) valid at that program point. Arrays are handled conservatively: an array is considered a single memory location, and every pointer pointing to or into an array is said to possibly (P) point to that single location. This simplifies the analysis. Aggregate structures such as structs are handled precisely, as a combination of scalars, arrays, and other aggregate structures. We use a newly modified AccessSymbol to handle structs.

The PointsToDomain represents a set of pointer relationships at every program point. The analyzer uses the NullDomain (already available in Cetus) as its starting point and continues to build relationships through forward dataflow analysis. The UniverseDomain is provided as a conservative fall-back domain representation in case the analyzer is unable to infer accurate information with regard to the pointer relationships. The Universe represents the situation where all pointers must be assumed to point to all memory locations in the program universe.

int main(void)
{
        struct {
                int *f1;
        }x, *z;

        int y[70], w;

        /* [] */
        z = &x;
        /* [(z,x,D)] */
        (*z).f1 = &w;
        /* [(<x.f1>,w,D), (z,x,D)] */
        x.f1 = &y[0];
        /* [(<x.f1>,y,P), (z,x,D)] */
}

2) Control Flow-Based Analysis: Cetus implements a work-list based iterative data flow analyzer. The intraprocedural driver obtains a control flow graph (see CFGraph) for the procedure; this graph is statement-based. Control flow constructs such as loops, if-else statements, and switch statements are handled through iterative traversal of the nodes of the control flow graph. Points-to information is merged using PointsToDomain functionality, as shown in the example below, where relationships change from definite (D) to possible (P) following a merge of different paths in the program.

int main(void)
{
        int a, b, c, i;
        int *p1, *p2, **p3;

        /* [] */
        p1 = &a;
        /* [(p1,a,D)] */
        p3 = &p1;
        /* [(p1,a,D), (p3,p1,D)] */
        p2 = *p3;
        /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
        if (i > 0)
                /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
                p1 = &b;
        else
                /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
                p1 = &c;

        /* [(p1,b,P), (p1,c,P), (p2,a,D), (p3,p1,D)] */
        return;
}

3) Interprocedural analysis: The interprocedural analysis of points-to relations is built upon the common interprocedural framework introduced in the current version. The role of the interprocedural analysis is to reflect the effect of entering or returning from a called procedure, by appropriately renaming the variables that appear in the points-to relations. The renaming algorithm is quite similar to the one that appears in the reference implementation, but we adjusted the interprocedural driver so that it works on top of our common IPA framework. Currently, the IPA points-to analysis does not support full context sensitivity, and it gives up on programs containing function calls through function pointers. The following example shows the result of interprocedural points-to analysis on a simple program.

int *x, y;
int main()
{
        int *a;
        a=&y;
            /* [(a,y,D)] */
        x=&y;
            /* [(a,y,D), (x,y,D)] */
        f(a);
            /* [(a,y,D), (x,y,D)] */
        return 0;
}

void f(int *m)
{
            /* [(m,y,D), (x,y,D)] */
        g(m);
            /* [(m,y,D), (x,y,D)] */
        return;
}

void g(int *n)
{
            /* [(n,y,D), (x,y,D)] */
        return;
}

4) Alias Analysis Support: The resulting pointer relationships can be used to create alias sets for a program point by identifying different pointers, or aliases, that point to the same memory locations. As described in detail in the section on Alias Analysis, advanced interprocedural pointer information is used to build alias sets for the requested program statement. Refer to the Cetus API documentation for methods to access alias information in user passes.

Alias Analysis

Alias analysis is a technique used to identify sets of program variable names that may refer to the same memory location during program execution. In C programs, aliases are created through the use of pointers as well as reference parameter passing. Disambiguation of variable names can be crucial to the accuracy and the effectiveness of other analyses and transformations performed by the compiler. A compile-time alias analysis may be flow-sensitive, context-sensitive, both, or neither. Cetus provides advanced flow-sensitivity and limited context-sensitivity when generating alias information.

Advanced alias information is now provided in Cetus using the result of interprocedural points-to analysis. This information is flow-sensitive and hence keyed by statement. The result of the points-to analyzer is a map from a Statement to the points-to information that exists right before the statement is executed. Pointer relationships are used by the alias analysis pass to create alias sets, which are obtained through a unification-based approach.

Alias information can be accessed by pass writers using a simple API. The presence of alias can be checked using functions that return a boolean value, after checking for aliasing between symbols. Refer to the HIR documentation for more information on symbols.
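The sketch below illustrates the kind of boolean query described; the method name and signature are hypothetical, so consult the Cetus API documentation for the actual entry points.

 import cetus.analysis.AliasAnalysis;
 import cetus.hir.*;

 public class AliasExample
 {
   /* Hypothetical sketch: the query below is NOT a confirmed Cetus
      method; look up the real API in the AliasAnalysis javadoc. */
   public static boolean mayAlias(Statement stmt, Symbol a, Symbol b)
   {
     return AliasAnalysis.isAliased(stmt, a, b);  // hypothetical call
   }
 }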

Range Analysis

Range analysis performs symbolic range propagation by symbolically executing the program and abstracting the values of integer variables with symbolic bounds. The implementation is based on Symbolic Range Propagation by Blume and Eigenmann, which was implemented for the Fortran 77 language.

The goal of Range Analysis is to collect, at each program statement, a map from integer-typed scalar variables to their symbolic value ranges, represented by a symbolic lower bound and an upper bound. In other words, a symbolic value range expresses the relationship between the variables that appear in the range. We use a similar approach as in Symbolic Range Propagation in the Polaris parallelizing compiler, with necessary adjustment for the C language, to compute the set of value ranges before each statement. The set of value ranges at each statement can be used in several ways. Pass writers can directly query the symbolic bound of a variable or can compare two symbolic expressions using the constraints given by the set of value ranges.

The high-level algorithm performs fix-point iteration in two phases when propagating the value ranges throughout the program. The first phase applies widening operations at nodes that have incoming back edges to guarantee the termination of the algorithm. The second phase compensates the loss of information due to the widening operations by applying narrowing operation to the node at which widening has occurred. During the fix-point iteration, the value ranges are merged at nodes that have multiple predecessors, and outgoing value ranges are computed by symbolically executing the statement. Two typical types of program semantics that cause such changes of value ranges are constraints from conditional expressions and assignments to variables.

Range analysis returns a map from each statement to its corresponding RangeDomain, which contains the set of valid value ranges before that statement. The result of this analysis was verified with the C benchmarks in the SPEC CPU2006 suite. Range analysis does not model integer overflow, but such cases are rare in normal, correct programs. The following example shows how to invoke the range analyzer in a compiler pass:

 import cetus.analysis.*;
 ...
 Map<Statement, RangeDomain> range_map = RangeAnalysis.getRanges(procedure);
 RangeDomain range_domain = range_map.get(statement);
 // range_domain now contains the set of value ranges for the statement.
 range_domain.compare(expr1, expr2);
 

The following example shows a function and the range map created by range analysis.

          int foo(int k)
                  []                    
          {                             
                  []                    
            int i, j;                   
                  []                    
            double a;                   
                  []                    
            for ( i=0; i<10; ++i )      
                  []                    
            {                           
                  [0<=i<=9]             
              a=(0.5*i);                
            }                           
                  [i=10]                
            j=(i+k);                    
                  [i=10, j=(i+k)]       
            return j;                   
          }                             
 

Since version 1.2, interprocedural range analysis is provided as an option to increase the accuracy of value ranges; it is still experimental. The following example shows the effect of this extension on the loop-parallelization pass. For this example, the Range test was used instead of the default Banerjee-Wolfe test.

int main()                                int main()
{                                         { ...
  double a[100];                          }
  foo(a, 0, 10);                          
  return 0;                               void foo(double *a, int lb, int ub)
}                                         {
                                     -->    int i;
void foo(double *a, int lb, int ub)         #pragma cetus private(i)
{                                           #pragma cetus parallel
  int i;                                    #pragma omp parallel for private(i)
  for (i=lb; i<ub; i++) {                   for (i=lb; i<ub; i++) {
    a[i] = a[i+10];                           a[i] = a[i+10];
  }                                         }
}                                         }

OpenMP

Cetus parses OpenMP programs and creates an internal data structure to hold the information in the OpenMP constructs. This internal data structure is stored in an Annotation and inserted into the IR wrapped inside an AnnotationStatement. Tools.java contains utility functions that provide access to the AnnotationStatements in the IR. Cetus currently supports the OpenMP 3.0 specification.
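As a sketch, using the Annotatable API described in the Developers Guide, the OpenMP annotations attached to a statement might be retrieved as follows; the getAnnotations(Class) signature is an assumption to verify in the javadoc.

 import java.util.List;
 import cetus.hir.*;

 public class OmpExample
 {
   /* Sketch only: assumes Annotatable provides getAnnotations(Class)
      returning the annotations attached to a statement. */
   public static void listOmpAnnotations(Statement stmt)
   {
     List<OmpAnnotation> notes = stmt.getAnnotations(OmpAnnotation.class);
     for (OmpAnnotation note : notes) {
       System.out.println(note);  // e.g. #pragma omp parallel for ...
     }
   }
 }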

Part III. Developers Guide

Chapter 8. Architecture of Cetus

Front End

Integrated Parsers

Cetus is written in Java, so it is natural to use ANTLR to generate parsers whenever possible. Cetus comes with an ANTLR parser for C. We determined that ANTLR cannot be used for C++. We are aware that there is a C++ grammar on the ANTLR website, but it is incomplete and we wanted a grammar that matched the standard grammar in Stroustrup's book as much as possible.

Separate Parsers

Parsing was intentionally separated from the IR-building methods in the high-level interface so that other front ends could be added independently. Some front ends may require more effort than others. For example, writing a parser for C++ is a challenge because its grammar does not fit easily into any of the grammar classes supported by standard generators. The GNU C++ compiler was able to use an LALR(1) grammar, but it looks nothing like the ISO C++ grammar. If any rules must be rearranged to add actions in a particular location, it must be done with extreme care to avoid breaking the grammar. Another problem is that C++ has much more complicated rules than C for determining which symbols are identifiers versus type names, requiring substantial symbol table maintenance while parsing.

Handling Pragmas and Comments

Pragmas and comments are identified during scanning as "Annotation"-type IR and are inserted by the parser into the IR as PreAnnotations. Comments are inserted where they appear in the program, except when they appear in the middle of another IR construct, such as an AssignmentStatement; in that case, they appear in the output before the corresponding statement. Comments at the end of a line of code appear AFTER that line in the output.

Since v1.1, Cetus adopts a new Annotation implementation in order to simplify the IR associated with different types of annotations as we begin to accommodate more types. Once PreAnnotations are parsed in and stored in the IR, the AnnotationParser converts them into specific IR as described in the Annotations section later in this manual. Because the semantics of PragmaAnnotations are known, Cetus can associate them with specific statements automatically, which allows annotations to move with the corresponding IR during transformations. However, in the case of CommentAnnotations and other new annotations interpreted as comments, Cetus can only store them as stand-alone annotations, which prevents them from moving with the corresponding IR.

More details about the exact Annotation implementation can be found in the IR section.

Intermediate Representation

Cetus' IR contrasts with the Polaris Fortran translator's IR in that it uses a hierarchical statement structure. The Cetus IR directly reflects the block structure of a program. Polaris lists the statements of each procedure in a flat way, with a reference to the outer statement being the only way for determining the block structure. There are also important differences in the representation of expressions, which further reflects the differences between C and Fortran. The Polaris IR includes assignment statements, whereas Cetus represents assignments in the form of expressions. This corresponds to the C language's feature to include assignment side effects in any expression.

The IR is structured such that the original source program can be reproduced, but this is where source-to-source translators face an intrinsic dilemma. Keeping the IR and output similar to the input will make it easy for the user to recognize the transformations applied by the compiler. On the other hand, keeping the IR language independent leads to a simpler compiler architecture, but may make it impossible to reproduce the original source code as output. In Cetus, the concepts of statements and expressions are closely related to the syntax of the C language, facilitating easy source-to-source translation. However, the drawback is increased complexity for pass writers (since they must think in terms of C syntax) and limited extensibility of Cetus to additional languages. That problem is mitigated by the provision of several abstract classes, which represent generic control constructs. Generic passes can then be written using the abstract interface, while more language-specific passes can use the derived classes. We feel it is important to work with multiple languages at an early stage, so that our result is a design that is extensible not only in theory but also in practice. Toward this goal, we have begun adding a C++ front end and generalizing the IR so that we can evaluate these design trade-offs. Other potential front ends are Java and Fortran 90.

Class Hierarchy Design

Our design goal was a simple IR class hierarchy easily understood by users. It should also be easy to maintain, while being rich enough to enable future extension without major modification. The basic building blocks of a program are the translation units, which represent the content of a source file, and procedures, which represent individual functions. Procedures include a list of simple or compound statements, representing the program control flow in a hierarchical way. That is, compound statements, such as IF-constructs and FOR-loops include inner (simple or compound) statements, representing then and else blocks or loop bodies, respectively. Expressions represent the operations being done on variables, including the assignments to variables.

Major Classes

All of the classes are documented in detail with javadoc. The API can be found at http://cetus.ecn.purdue.edu/ in the Documentation section.

A brief discussion of important base classes is below.

Program

A Program object represents the entire program. It is the root of the syntax tree.

TranslationUnit

A TranslationUnit represents a single source file. It may only appear as the immediate child of the Program.

Declarations

Declarations appear in many places. As children of TranslationUnit, they represent global declarations of variables, functions, or classes. As parameters of a Procedure, they represent declarations of formal parameters. As children of ClassDeclaration, they represent methods and data members.

Expressions

Expressions are normally of most interest to optimization passes. All assignments, function calls, and mathematical computations are Expressions.

Statements

Statements serve two purposes: to provide control constructs, and to provide wrappers for Declarations and Expressions. Statements such as IfStatement provide a control construct, whereas DeclarationStatement and ExpressionStatement simply allow Declarations and Expressions to appear wherever a Statement may appear.

Specifiers

Specifiers are most often found in lists. For example, a variable declaration may be prefixed with a list of specifiers such as const and float.

Relationship Between Grammar and IR

When designing any class hierarchy, some general principles are followed. The main principle is that a derived class satisfies an "is a" relationship with its parent, such that the data and methods of the parent make sense in the context of the child. This principle is followed to some extent in Cetus, but it would be more accurate to say that a derived class can appear in the grammar wherever its parent can appear.

There is a distinction between the class hierarchy and the syntax tree. For example, in the syntax tree, the parent of a TranslationUnit is a Program; however, neither TranslationUnit nor Program has a parent in the class hierarchy.

Syntax Tree Invariants

One important aspect that makes an infrastructure useful is providing a good set of tools to help debug future compiler passes. Cetus provides basic debugging support through the Java language, which contains exceptions and assertions as built-in features. Cetus executes within a Java virtual machine, so a full stack trace including source line numbers is available whenever an exception is caught or the compiler terminates abnormally.

Such error reporting is useless unless the IR is designed to prevent programmers from corrupting the program representation. In other words, there must be a way of detecting that the state of the IR is not as it should be, in order to report an error. To provide error checking and detect common errors by pass writers, the Cetus IR maintains several invariants. Violating one of the invariants below will probably result in an exception, to the extent that it is possible for Cetus to recognize what you have done.

Invariant

  1. If an object has a parent, then it has exactly one parent.

Consequences

  • You may not take an object that has a parent and add it as the child of another object.

  • If you want to use the same object in more than one place in the syntax tree, you must clone the original object.

  • Clones are identical to the originals except their parent is null (see the sketch below).
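For example, a pass that wants to reuse an expression that is already in the tree should insert a copy. A minimal sketch, assuming Cetus IR classes override clone() with a public, covariant return type:

 import cetus.hir.*;

 public class CloneExample
 {
   /* An object with a parent may not be added elsewhere, so a clone,
      whose parent is null, is inserted instead. */
   public static Expression reusableCopy(Expression original)
   {
     return original.clone();  // identical copy with a null parent
   }
 }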

Invariant

  2. An object P can enumerate all of its children.

Consequences

  • An iterator object can be created with P that can iterate over P's children. In some cases, the iterator will not visit data that is actually part of P itself. Nearly everything is kept in the list of children, and we may move more data into the list in the future.

Invariant

  3. An object P controls the addition and removal of its children.

Consequences

  • An object C cannot become the child of an object P without P's permission.

  • Before an object C can set its parent reference to an object P, P must already recognize C as a child, i.e., C must already be in P's list of children.

  • The child reference and parent reference (i.e. downlink and uplink) must be set in that order. The addXYZ methods of many classes will take care of this for you. There are also atomic swap methods that respect the ordering.

Annotations

Cetus extends our initial implementation of Annotations to a completely new IR structure and API. Comments, pragmas, and other annotations were initially parsed in as Annotations and enclosed inside DeclarationStatements (in most cases). In the new implementation, parsed-in annotations are converted to a new internal representation through the AnnotationParser. The AnnotationParser stores annotations under the corresponding subclasses:

  • PragmaAnnotation

    • CetusAnnotation (#pragma cetus ...)

    • OmpAnnotation (#pragma omp ...)

  • CommentAnnotation (e.g. /* ... */)

  • CodeAnnotation (Raw printing)

Annotations can be attached to existing IR where their semantics so dictate, or they can be inserted as stand-alone IR. Of the above subclasses, Cetus and Omp annotations have specific semantics, and hence, if present in the input source, they are attached to the corresponding IR. However, this is not possible for comments, as it is difficult to determine their association with the IR except in certain cases (which are not handled currently). Hence, all comments and other annotations are inserted as stand-alone.

Annotations that need to be attached to existing IR are stored as member objects of that Annotatable IR (Statements and Declarations); the new API for manipulating them is hence available through Annotatable. Stand-alone annotations are enclosed in special IR such as AnnotationDeclaration or AnnotationStatement (note that AnnotationStatement differs from the previous release). The API in Annotatable *COMPLETELY REPLACES* the functionality previously provided through Tools.java.
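A minimal sketch of the Annotatable API follows; the annotate() method and the map-style use of CetusAnnotation are assumptions, so check the Annotatable javadoc for the exact calls.

 import cetus.hir.*;

 public class AnnotatableExample
 {
   /* Sketch only: assumes Annotatable.annotate(Annotation) attaches an
      annotation and that CetusAnnotation stores key/value pairs. */
   public static void markParallel(Statement stmt)
   {
     CetusAnnotation note = new CetusAnnotation();
     note.put("parallel", "true");  // assumed map-style storage
     stmt.annotate(note);           // prints as "#pragma cetus parallel ..."
   }
 }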

See the latest Cetus tutorial at http://cetus.ecn.purdue.edu for examples on how to use the new Annotation and Annotatable API during analysis and transformations.

Back End

Cetus does not contain a code generator. It is a source-to-source compiler, and so it relies on other compilers (such as GCC, Intel or Microsoft compiler) to compile the source code it outputs. It is possible that in the future a back end could be added to Cetus, but for research purposes other compilers are more suited to back end optimization work.

Chapter 9. Writing a Pass

The following sections discuss the interface Cetus provides to pass writers.

Making Cetus Execute Your Pass

There are two ways to extend Cetus to run a new pass.

  1. For passes accepted into the main Cetus distribution, provide a static void run method that accepts a Program object as its only parameter. We will add a flag to Cetus, named similarly to your pass, that will cause the Cetus driver to call your pass.

  2. Derive a class from cetus.exec.Driver and override the runPasses method. You must provide your own main method, which should contain a single line:

    public class MyDriver extends Driver
    {
      // ...
    
      public static void main(String[] args)
      {
        (new MyDriver()).run(args);
      }
    }
    

    where args is simply the command-line arguments of your main method. You can optionally override the printUsage and printHelp methods if your pass has additional command-line options. The Driver class contains a protected Program object that your derived class will be able to access in its runPasses method. Note that the run method called in the example above is the run method of the Driver class. It will process command-line arguments, run the parsers, and get everything ready for your code before calling runPasses.

Iterators

The IRIterator class implements the Java Iterator interface plus some added functionality. The BreadthFirstIterator and DepthFirstIterator iterate over the IR in breadth-first and depth-first order, respectively. The FlatIterator simply iterates over an object's children, and does not perform a "deep" traversal within the children. An IRIterator can be made to skip objects such that it only returns objects of certain types. It can also be made to prune parts of the traversal below objects of certain types.

IRIterators provide several versions of next that are explained in the examples below. The first example shows how to use the standard next:

/* BreadthFirst traversal over everything in a procedure. Assumes proc is a Procedure object. */

BreadthFirstIterator iter = new BreadthFirstIterator(proc);
while (iter.hasNext())
{
  Object o = iter.next();
  // Do something with the object
}

The next example shows how to locate a particular type of object:

/* Look for loops in a procedure.  Assumes proc is a Procedure object. */

BreadthFirstIterator iter = new BreadthFirstIterator(proc);
try {
  while (true)
  {
    Loop loop = (Loop)iter.next(Loop.class);
    // Do something with the loop
  }
} catch (NoSuchElementException e) {
}

Note the exception handling. next must throw an exception if it cannot find an element of the specified class, because the corresponding hasNext method cannot be implemented. The reason is that the iterator would have to actually perform the iteration to determine whether there is such a next element and, if so, would need to perform the iteration again to actually return the object. The standard hasNext does not have this problem because it simply checks whether it has reached the end of the list. We think it would be very strange for users if we provided a hasNext method that modified the iterator, and it would be awkward for us to write a hasNext that would look ahead and then undo its changes. Furthermore, that version of hasNext would no longer be a Θ(1) call, so we chose not to provide it.

Other versions of next will find an element of a set of classes, or an element that is not of a set of classes.

The next example shows how to use pruning:

/* Look for procedures in a program.  Assumes prog is a Program object. */

BreadthFirstIterator iter = new BreadthFirstIterator(prog);
iter.pruneOn(Procedure.class);

try {
  while (true)
  {
    Procedure proc = (Procedure)iter.next(Procedure.class);
    // Do something with the procedure 
  }
} catch (NoSuchElementException e) {
}

Instead of obtaining one iterator on the Program object to look for TranslationUnits, and then obtaining another iterator on each TranslationUnit to look for Procedures, it is easier to do a breadth first search on the entire Program. However, it does not make much sense to look for Procedures inside other Procedures, since none of the languages supported by Cetus allow nested Procedures. Therefore, pruneOn tells the iterator to skip anything below a Procedure on the IR tree.

Tools

The cetus.hir package provides a set of utility methods that are used in common operations such as searching the IR, claiming a new symbol, and printing various collections. Since version 1.2, these utilities are provided in five different subsets for better user accessibility; a short usage sketch follows the list below. Some of these methods are used only internally and hence are not provided as public methods.

  • DataFlowTools: Utility methods for detecting used or modified memory accesses

  • IRTools: Utility methods for searching specific types of IR objects that appear in the IR tree

  • PrintTools: Utility methods that enable pretty printing of collections and user-directed printing of verbose information

  • SymbolTools: Utility methods related to Cetus' symbol interface

  • Tools: Utility methods for general purpose
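The following sketch exercises two of these subsets; the method names follow the descriptions above (getUseSet/getDefSet in DataFlowTools, println in PrintTools), but their exact signatures are assumptions to check in the javadoc.

 import java.util.Set;
 import cetus.hir.*;

 public class ToolsExample
 {
   /* Sketch only: assumes DataFlowTools exposes getUseSet/getDefSet and
      that PrintTools.println prints subject to the -verbosity level. */
   public static void report(Statement stmt)
   {
     Set<Expression> uses = DataFlowTools.getUseSet(stmt);
     Set<Expression> defs = DataFlowTools.getDefSet(stmt);
     PrintTools.println("uses = " + uses + ", defs = " + defs, 2);
   }
 }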

Expression Simplifier

The expression simplifier supports normalization and simplification of expressions. It internally transforms a given expression into a normalized form in which each operand of a commutative and associative operation is located at the same height in the tree representation; in other words, it converts a binary tree into an n-ary tree. This internal representation makes the simplification algorithm simpler, and the result of the simplification is converted back to the original internal representation of Cetus.

As in its predecessor Polaris, a key feature of Cetus is the ability to reason about the represented program in symbolic terms. For example, compiler analyses and optimizations at the source level often require the expressions in the program to be in a simplified form. A specific example is data dependence analysis, which collects the coefficients of affine subscript expressions and passes them to the underlying data dependence test package. Cetus has functionalities that ease the manipulation of expressions for pass writers. The following example shows how to invoke the simplifier. The simplifier returns a new copy of the original expression, converted to a normalized form.

 import cetus.hir.*;
 ...
 Expression e = ...
 e = Symbolic.simplify(e);
 

It is also possible for users to invoke individual simplification techniques for their own purposes. The following examples show the functionality of the individual techniques. See the javadoc page or the source code to learn how to invoke each technique individually.

 1+2*a+4-a --> 5+a (folding)
 a*(b+c) --> a*b+a*c (distribution)
 (a*2)/(8*c) --> a/(4*c) (division)
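For instance, if your version of Symbolic provides arithmetic helpers such as add() and multiply() (an assumption to verify in the javadoc), results can be built and simplified in one step:

 import cetus.hir.*;

 public class SymbolicExample
 {
   /* Sketch only: assumes Symbolic offers add() and multiply() helpers
      that return results in simplified form. */
   public static Expression twiceSum(Expression a, Expression b)
   {
     // builds 2*(a+b) and returns it in simplified form
     return Symbolic.multiply(new IntegerLiteral(2), Symbolic.add(a, b));
   }
 }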
 

Modifying IR

Modifying the IR is done efficiently through the API provided by the Cetus IR. The API provides complete abstraction to the pass writer, and construct-specific methods help make the pass writer's job easier.

More details about modifying the IR will be added to this manual in the near future. We currently suggest that you refer to the IR manipulation information in the Cetus tutorials available at http://cetus.ecn.purdue.edu. Please get in touch with us for further support.

Printing

A major complaint about early versions of Cetus was that the printing of IR was not flexible enough. To solve this problem, we have made printing completely customizable by the pass writer. Nearly all classes implement a Printable interface, which means they provide a print method for printing themselves as source code. By default, this print method calls a static method of the same class, called defaultPrint. The call is made by invoking a Java Method object, which is similar to a function pointer in C. The Method object can be changed by calling setPrintMethod and passing it a different printing routine. setPrintMethod allows a pass writer to change how one particular object prints itself. If the pass writer notices that they are calling this method frequently for the same type of object, then it may be easier to call setClassPrintMethod, a static method which causes all newly created objects of a particular class to use the provided printing routine.
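The following sketch shows the intended usage; the (object, PrintWriter) signature for print methods is our assumption based on the defaultPrint convention described above, so check an existing defaultPrint for the exact signature.

 import java.io.PrintWriter;
 import java.lang.reflect.Method;
 import cetus.hir.*;

 public class PrintExample
 {
   /* Custom print routine: wraps every binary expression in extra
      parentheses, then delegates to the class's defaultPrint. */
   public static void parenPrint(BinaryExpression e, PrintWriter w)
   {
     w.print("(");
     BinaryExpression.defaultPrint(e, w);
     w.print(")");
   }

   /* Installs parenPrint for all BinaryExpressions created afterwards. */
   public static void install() throws NoSuchMethodException
   {
     Method m = PrintExample.class.getMethod(
         "parenPrint", BinaryExpression.class, PrintWriter.class);
     BinaryExpression.setClassPrintMethod(m);
   }
 }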