TreeSplicer

We know the importance of locality, and are familiar with program transformations to enhance locality such as loop interchange and loop tiling. These transformations work well for regular programs which operate over arrays and matrices. But what about irregular programs which operate over pointer based structures such as trees and graphs?

We focus on a subset of irregular programs where there are repeated tree traversals, which arises in many domains. Some examples are galactic simulations, nearest neighbor search and ray tracing. If the tree is large and is traversed many times, there is opportunity to exploit temporal locality. These programs are often written as recursive methods, where points are the entities which traverse the tree:

for (Point p : points) {
  recurse(p, root);
}

void recurse(Point p, Node n) {
  // do something
  if (cond) return;
  else {
    recurse(p, n.leftChild);
    recurse(p, n.rightChild);
  }
}

We observe that these repeated tree traversals are analogous to a doubly nested loop, where the traversal of each point is abstracted in p.traversal, and visit abstracts whatever computation is performed by a point at a node:

for (Point p : points) {
  for (Node n : p.traversal()) {
    visit(p, n);
  }
}

For large datasets, neither the point loop nor the node loop will fit in cache, which results in poor cache reuse and many cache misses. But, we know from regular programs that loop tiling can exploit locality in such cases! Point blocking and traversal splicing are analogues of loop tiling for repeated tree traversals:

Point blocking
Tiles the point loop. Takes a block of points instead of a single point, through the points' traversals, exploiting temporal locality among points even for large traversals which do not fit in cache. Is a simpler transformation than traversal splicing, and performs well when points are pre-sorted with semantic information.

Traversal splicing
Tiles the node loop. Chops up each point's traversal into many sub-traversals so that each sub-traversal is more likely to fit in cache. Dynamically reorders the points based on previous traversal history, so that consecutive points are likely to have similar traversals. Is a more complex transformation than point blocking with space overhead, but is less dependent on semantic aware pre-sorting of points.

Please refer to our papers for further details.

TreeSplicer

Introduction