Chapter 2. Background

Table of Contents

GCC
Polaris
SUIF

From a substantial list of compiler infrastructures, we choose to discuss three open-source projects that most closely match our goals. The goals are to create a source-to-source infrastructure that supports C and is extensible to other languages. The three projects are the GNU, Polaris, and SUIF compilers. We explain our reasons for not using these infrastructures as our basis, and also discuss important features of these compilers that we want to adopt in Cetus.

GCC

GCC is one of the most robust compiler infrastructures available to the research community. GCC generates highly-optimized code for a variety of architectures, which rivals in many cases the quality generated by the machine vendor's compiler. Its open-source distribution and continuous updates make it attractive. However, GCC was not designed for source-to-source transformations. Most of its passes operate on the lower-level RTL representation. Only recent versions of GCC (version 3.0 onward) include an actual syntax tree representation, which has been used in Purdue class projects for implementing a number of compiler passes. Other limitations are GCC compiles one source file at a time, performs separate analysis of procedures, and requires extensive modification to support interprocedural analysis across multiple files.

The most difficult problem faced by the students was that GCC does not provide a friendly API for pass writers. The API consists largely of macros. Passes need to be written in C and operations lack logical grouping (classes, namespaces, etc), as would be expected from a compiler developed in an object-oriented language.

GCC's IR has an ad-hoc type system, which is not reflected in its implementation language (C). The type system is encoded into integers that must be decoded and manipulated by applying a series of macros. It is difficult to determine the purpose of fields in the IR from looking at the source code, since in general every field is represented by the same type. This also makes it difficult for debuggers to provide meaningful information to the user.

Documentation for GCC is abundant. The difficulty is that the sheer amount easily overwhelms the user. Generally, we have found that there is a very steep learning curve in modifying GCC, with a big time investment to implement even trivial transformations.

The above difficulties were considered primarily responsible for the students that used GCC proceeding more slowly than those creating a new compiler design. The demonstrated higher efficiency of implementation was the ultimate reason for the decision to pursue the full design of Cetus.