Points-to Analysis

Points-to analysis is a compile-time technique that helps identify relationships between pointer variables and the memory locations that they point to during program execution. Pointers are powerful programming constructs and allow complex memory manipulation during program execution through techniques such as pointer arithmetic and dynamic memory allocation. Pointer relationships can thus be complex and difficult to analyze at compile time. On the other hand, however, they provide the benefit of simplifying various other compile time analyses such as constant propagation, alias analysis in C programs that allow extensive use of pointers. Cetus provides an advanced interprocedural points-to analysis framework, and this section describes the intraprocedural design of this framework. The goal of the points-to analyzer is to identify the set of named or unnamed memory locations that a pointer variable may point to.

The Cetus points-to analyzer is based on the design described in: "A practical interprocedural alias analysis for an optimizing/parallelizing C Compiler", M. Emami, Masters Thesis, McGill University, 1993.

A flow-sensitive approach is used for analyzing intraprocedural pointer information. Within a procedure, pointer relationships are created through explicit pointer assignment. Updates are also possible through function calls. At the end of the iterative pointer information update, once a fixed point is reached, the analyzer provides a map as output. This maps every statement in the procedure to the points-to information that exists before it, which will be made clear through examples shown below. Thus, accurate pointer information is made available at every point in the program. The output map is used by the interprocedural framework, as described in Section (?).

The following sections provide an overview of the intraprocedural implementation and describe the results through simple examples.

1) Points-to Representation: The analyzer uses the Cetus Domain representation to handle points-to information. Special data structures PointsToDomain and PointsToRel are used to implement points-to relationships. These data structures use Cetus Symbol information to uniquely identify memory locations. Source level symbol information is available through the Symbol interface for named locations. For unnamed locations such as heap allocated memory, string literal locations, and other special locations, the analyzer uses a special Symbol called AbstractLocation. The relationship also contains a boolean value identifying whether the pointer relationship is definitely(D) or only possibly(P) valid at that program point. Arrays are handled by the analyzer conservatively, an array is considered as a single memory location and every pointer pointing to or pointing into an array is said to possibly(P) point to the single location array. This helps in simplification of the analysis. Aggregate structures such as structs are handled precisely as a combination of scalars, arrays and other aggregate structures. We use a newly modified AccessSymbol to handle structs.

The PointsToDomain represents a set of pointer relationships at every program point. The analyzer uses NullDomain (already available in Cetus) as a starting point for the analyzer and continues to build relationships through forward dataflow analysis. The UniverseDomain is provided as a conservative fall-back domain representation in case the analyzer is unable to infer accurate information with regards to the pointer relationships. The Universe is meant to represent the situation where all pointers can be assumed to be pointing to all memory locations in the program universe.

int main(void)
{
        struct {
                int *f1;
        }x, *z;

        int y[70], w;

        /* [] */
        z = &x;
        /* [(z,x,D)] */
        (*z).f1 = &w;
        /* [(<x.f1>,w,D), (z,x,D)] */
        x.f1 = &y[0];
        /* [(<x.f1>,y,P), (z,x,D)] */
}

2) Control Flow-Based Analysis: Cetus implements a work-list based iterative data flow analyzer. The intraprocedural driver obtains a control flow graph (see CFGraph) for the procedure, this graph is statement based. Control flow constructs such as loops, if-else statements, switch statements etc., are handled through iterative traversal of the nodes of the control flow graph. Points-to information is merged using PointsToDomain functionality, as shown in the examples below. Here, relationships change from definite(D) to possible(P) following a merge from different paths in the program.

int main(void)
{
        int a, b, c, i;
        int *p1, *p2, **p3;

        /* [] */
        p1 = &a;
        /* [(p1,a,D)] */
        p3 = &p1;
        /* [(p1,a,D), (p3,p1,D)] */
        p2 = *p3;
        /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
        if (i > 0)
                /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
                p1 = &b;
        else
                /* [(p1,a,D), (p2,a,D), (p3,p1,D)] */
                p1 = &c;

        /* [(p1,b,P), (p1,c,P), (p2,a,D), (p3,p1,D)] */
        return;
}

3) Interprocedural analysis: The interprocedural analysis of points-to relation was built upon the common interprocedural framework introduced in the current version. The role of the interprocedural analysis is to reflect the effect of entering or returning from a called procedure, by appropriately renaming the variables that appear in the points-to relations. The renaming algorithm is quite similar to one that appears in the reference implementation, but we adjusted the interprocedural driver so that it works on top of our common IPA framework. Currently, the IPA points-to analysis does not support full context sensitivity and gives up analysis of programs containing function calls through function pointers. The following example shows the result of interprocedural points-to analysis on a simple program.

int *x, y;
int main()
{
        int *a;
        a=&y;
            /* [(a,y,D)] */
        x=&y;
            /* [(a,y,D), (x,y,D)] */
        f(a);
            /* [(a,y,D), (x,y,D)] */
        return 0;
}

void f(int *m)
{
            /* [(m,y,D), (x,y,D)] */
        g(m);
            /* [(m,y,D), (x,y,D)] */
        return;
}

void g(int *n)
{
            /* [(n,y,D), (x,y,D)] */
        return;
}

4) Alias Analysis Support: Pointer relationships obtained as a result, can be used to create alias sets for that program point by identifying different pointers, or aliases, pointing to the same memory locations. As described in detail in the section on Alias Analysis, advanced interprocedural pointer information is used to build alias sets for the requested program statement. Refer to the Cetus API documentation for methods to access alias information in user passes.