The SIMTree release has the following layout:

  • apps
    • barneshut: Barnes-Hut benchmark
    • heatray: Photon Mapping benchmark
    • nearestneighbor: Nearest Neighbor benchmark
    • pointcorr: Point Correlation benchmark
    • vptree: Vantage Point benchmark
  • harness: Files to facilitate runtime and performance counter measurements. These files are not used with apps, and are provided just for reference.
  • input: Input files for benchmarks
  • licenses: Licenses of released code
  • src: Source code of SIMTree

SIMTree is built on top of the ROSE compiler infrastructure. Users must install Rose and configure its path in src/Makefile.

Each benchmark has input and output folders. The output folder Makefile has build targets to facilitate building SIMTree, and using SIMTree to transform the code in the input folder.

SIMTree works in two phases. The first phase analyzes the input code and generates header files with code for point blocking, traversal splicing, and the SoA layout. The second phase reads the generated header files, and transforms the source files. Two phases are necessary because Rose currently does not support dynamic insertion of new code, and we construct the header files with plain text (use of Rose builder functions would be too onerous).

Hence each benchmark can be transformed with two make invocations: 'make transform1' and 'make transform2'. For example on Barnes-Hut, 'make transform1' builds SIMTree (if necessary):

yjo:SIMTree yjo$ cd apps/barneshut/output/
yjo:output yjo$ make transform1
cd ../../../src; make
g++ -I../../../../rosehome/include -O3   -I. -c main.cpp
g++ -I../../../../rosehome/include -O3   -I. -c RecursiveStructureAnalysis.cpp
g++ -I../../../../rosehome/include -O3   -I. -c MethodInfo.cpp
g++ -I../../../../rosehome/include -O3   -I. -c Globals.cpp
g++ -I../../../../rosehome/include -O3   -I. -c BlockTransform.cpp
g++ -I../../../../rosehome/include -O3   -I. -c Utils.cpp
g++ -I../../../../rosehome/include -O3   -I. -c RecursiveCallSet.cpp
g++ -I../../../../rosehome/include -O3   -I. -c BlockClasses.cpp
g++ -I../../../../rosehome/include -O3   -I. -c IntermediaryStateClasses.cpp
g++ -I../../../../rosehome/include -O3   -I. -c CommonTransform.cpp
g++ -I../../../../rosehome/include -O3   -I. -c SpliceClasses.cpp
g++ -I../../../../rosehome/include -O3   -I. -c SpliceTransform.cpp
g++ -I../../../../rosehome/include -O3   -I. -c RecursiveField.cpp
g++ -I../../../../rosehome/include -O3   -I. -c SOATransform.cpp
g++ -o SIMTree main.o RecursiveStructureAnalysis.o MethodInfo.o Globals.o BlockTransform.o 
Utils.o RecursiveCallSet.o BlockClasses.o IntermediaryStateClasses.o CommonTransform.o 
SpliceClasses.o SpliceTransform.o RecursiveField.o SOATransform.o  
-lrose -L../../../../rosehome/lib -lboost_filesystem -lboost_system -L/usr/local/lib

and generates header files in the first phase:

../../../src/SIMTree ../input/barneshut.cpp
candidate recursive call: insert_point	insert_point(newNode,p,half_r,(depth + 1))
  candidate recursive class arg: child
enclosing for loop found!
  but not annotated parallelForOnTree
candidate recursive call: insert_point	insert_point(child,p,half_r,(depth + 1))
candidate recursive call: free_tree	free_tree(child)
candidate recursive call: compute_center_of_mass	compute_center_of_mass(child)
candidate recursive call: compute_force_recursive	compute_force_recursive(p,(node -> child0),dsq,epssq)
  candidate recursive class arg: node
enclosing for loop found!
SOATransform::fieldAccessAnalysis: "compute_force_recursive"
 SOATransform::fieldAccessAnalysis: "update_point"

-----SOATransform::summarizeForBlockClass-----
class: "Point"
 field: acc	readViaPoint writeViaPoint 
  flattening sub field: x
  flattening sub field: y
  flattening sub field: z
 field: cofm	readViaPoint readViaNode 
  flattening sub field: x
  flattening sub field: y
  flattening sub field: z
 field: num_nodes_traversed	readViaPoint writeViaPoint 
-----------------------------

possible splice optimization level: 2
using splice optimization level: 2
generated block classes, ready for second pass

Then, 'make transform2' transforms the source code in the second phase:

yjo:output yjo$ make transform2
cd ../../../src; make
make[1]: Nothing to be done for `all'.
../../../src/SIMTree rose_barneshut.cpp

.....

second pass done

The input code is in ../input/barneshut.cpp, and the transformed code is in ../output/rose_rose_barneshut.cpp. You can compare the code to see how the transformation was applied.

Both input and output folders have Makefile targets to build the benchmark executable and test scripts to run it. Simply 'make' builds the benchmark, and 'python test.py' runs it. The benchmark records the total number of nodes traversed to verify correctness. This count should be the same for both input and output

yjo:output yjo$ cd ../input/
yjo:input yjo$ make
g++ -g3 -I. -c barneshut.cpp
g++ -o barneshut barneshut.o    
yjo:input yjo$ python test.py 
Overriding nbodies from input file. nbodies = 10000
Configuration: nbodies = 10000, ntimesteps = 2, dtime = 0.025000 eps = 0.050000, tol = 0.500000
Time step 0:
sum_nodes_traversed:12716564
position: -0.003950 -0.011949 0.010587
Time step 1:
sum_nodes_traversed:25449837
position: -0.004174 -0.011977 0.010532
yjo:input yjo$ cd ../output/
yjo:output yjo$ make
g++ -g3 -I. -c rose_rose_barneshut.cpp
g++ -o barneshut rose_rose_barneshut.o   
yjo:output yjo$ python test.py 
Overriding nbodies from input file. nbodies = 10000
Configuration: nbodies = 10000, ntimesteps = 2, dtime = 0.025000 eps = 0.050000, tol = 0.500000
Time step 0:
sum_nodes_traversed:12716564
position: -0.003950 -0.011949 0.010587
Time step 1:
sum_nodes_traversed:25449837
position: -0.004174 -0.011977 0.010532

SIMTree does not perform the vectorization itself. SIMTree's goal is to generate vectorizable code. While ideally icc should auto-vectorize the transformed code, we found that the presence of any non-vectorizable code or nested loops precludes vectorization by icc. Applying a combination of loop distribution and interchange can split the target loop into a series of icc-vectorizable loops.

Running SIMTree without arguments shows the full set of command line options. SIMTree can transform code with/without traversal splicing and the SoA layout.

yjo:output yjo$ ../../../src/SIMTree 
SIMTree, a framework to automatically vectorize tree traversal algorithms.
Youngjoon Jo (yjo@purdue.edu)

Copyright 2013, School of Electrical and Computer Engineering, Purdue University.

./SIMTree [options] 
 --nosplice            : do not apply traversal splicing
 --nosoa               : do not apply structure of arrays layout
 --blocksize     : block size for point blocking [default: 512]
 --splicedepth  : splice depth for traversal splicing [default: 4]
 --spliceopt    : splicing optimization level [0-2, default:2]