Chapter 8. Architecture of Cetus

Table of Contents

Front End
Integrated Parsers
Separate Parsers
Handling Pragmas and Comments
Intermediate Representation
Class Hierarchy Design
Major Classes
Relationship Between Grammar and IR
Syntax Tree Invariants
Annotations
Back End

Front End

Integrated Parsers

Cetus is written in Java, so it is natural to use ANTLR to generate parsers whenever possible. Cetus comes with an ANTLR parser for C. We determined that ANTLR cannot be used for C++. We are aware that there is a C++ grammar on the ANTLR website, but it is incomplete, and we wanted a grammar that matched the standard grammar in Stroustrup's book as closely as possible.

Separate Parsers

Parsing was intentionally separated from the IR-building methods in the high-level interface so that other front ends could be added independently. Some front ends may require more effort than others. For example, writing a parser for C++ is a challenge because its grammar does not fit easily into any of the grammar classes supported by standard generators. The GNU C++ compiler was able to use an LALR(1) grammar, but it looks nothing like the ISO C++ grammar. If any rules must be rearranged to add actions in a particular location, it must be done with extreme care to avoid breaking the grammar. Another problem is that C++ has much more complicated rules than C for determining which symbols are identifiers and which are type names, so the parser must maintain a substantial symbol table while parsing.
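The identifier-versus-type-name problem can be illustrated with a minimal sketch of the bookkeeping a parser needs. The class ScopedNames and its methods are hypothetical and are not part of Cetus; the sketch only assumes that the parser records each declaration as it is seen and asks the table how to classify a name before choosing a grammar rule.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Cetus class): remembers, per scope, whether a
// declared name denotes a type or an ordinary identifier.
class ScopedNames {
    // Innermost scope is at the head of the deque.
    private final Deque<Map<String, Boolean>> scopes = new ArrayDeque<>();

    ScopedNames() {
        enterScope(); // global scope
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        scopes.pop();
    }

    // Record a declaration in the current (innermost) scope.
    void declare(String name, boolean isType) {
        scopes.peek().put(name, isType);
    }

    // Search from the innermost scope outward; the first declaration found
    // decides, so an inner variable can hide an outer type name.
    boolean isTypeName(String name) {
        for (Map<String, Boolean> scope : scopes) {
            Boolean isType = scope.get(name);
            if (isType != null) {
                return isType;
            }
        }
        return false; // undeclared names are treated as plain identifiers
    }
}
```

In this sketch, a typedef of T in the outer scope makes isTypeName("T") return true, while a local variable named T in an inner scope makes it return false until that scope is exited. A real C++ front end would need considerably more than this (namespaces, class scopes, templates), which is the maintenance burden described above.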

Handling Pragmas and Comments

Pragmas and comments are identified during scanning as "Annotation"-type IR. The parser inserts them into the IR as PreAnnotations. Comments are inserted where they appear in the program, except when they appear in the middle of another IR construct, such as an AssignmentStatement; in that case, they appear in the output before the corresponding statement. Comments that follow code on the same line appear after that line in the output.
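The placement rule for comments can be sketched as follows. This is only an illustration of the ordering described above, not the Cetus implementation: ScannedStatement and CommentPlacement are hypothetical names, and the sketch assumes the scanner has already sorted the comments around one statement into "embedded" and "trailing" groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not a Cetus class) of one statement and the comments
// the scanner found in or after it.
class ScannedStatement {
    final String code;
    final List<String> embedded; // comments found in the middle of the statement
    final List<String> trailing; // comments after the code on the same line

    ScannedStatement(String code, List<String> embedded, List<String> trailing) {
        this.code = code;
        this.embedded = embedded;
        this.trailing = trailing;
    }
}

class CommentPlacement {
    // Returns the lines in the order they would appear in the output:
    // embedded comments first, then the statement, then trailing comments.
    static List<String> outputOrder(ScannedStatement s) {
        List<String> out = new ArrayList<>();
        out.addAll(s.embedded);
        out.add(s.code);
        out.addAll(s.trailing);
        return out;
    }
}
```

For a source line such as `x = a /* why a */ + b; // done`, this ordering yields the comment `/* why a */`, then the statement, then `// done` on the following line.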

Since v1.1, Cetus has used a new Annotation implementation that simplifies the IR associated with different types of annotations as more types are accommodated. Once PreAnnotations are parsed and stored in the IR, the AnnotationParser converts them into specific IR, as described in the Annotations section later in this manual. Because the semantics of PragmaAnnotations are known, Cetus automatically associates them with specific statements, which allows the annotations to move with the corresponding IR during transformations. CommentAnnotations, and any new annotations interpreted as comments, can only be stored as stand-alone annotations, so they do not move with the corresponding IR.
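The practical difference between attached and stand-alone annotations shows up when a transformation moves a statement. The following sketch uses hypothetical stand-in classes (Node and AnnotationMovement are not Cetus classes, and the real IR is a tree rather than a flat list); it assumes only that an attached annotation is stored on the statement it belongs to, while a stand-alone annotation occupies its own position in the enclosing body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an IR node; not the Cetus Statement class.
class Node {
    final String text;
    // Annotations stored on the node itself travel with it when it moves.
    final List<String> attached = new ArrayList<>();

    Node(String text) {
        this.text = text;
    }
}

class AnnotationMovement {
    // Moves the node at index 'from' so that it ends up at index 'to'.
    static void move(List<Node> body, int from, int to) {
        Node n = body.remove(from);
        body.add(to, n);
    }

    // Prints each node preceded by the annotations attached to it.
    static List<String> print(List<Node> body) {
        List<String> lines = new ArrayList<>();
        for (Node n : body) {
            lines.addAll(n.attached);
            lines.add(n.text);
        }
        return lines;
    }
}
```

If a body holds a stand-alone comment node, a loop with a pragma attached, and an assignment, then moving the loop to the end carries the pragma with it, while the comment stays where it was and no longer precedes the loop it described.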

More details about the exact Annotation implementation are found in the IR section.