The SIMTree release has the following layout:
- apps
- barneshut: Barnes-Hut benchmark
- heatray: Photon Mapping benchmark
- nearestneighbor: Nearest Neighbor benchmark
- pointcorr: Point Correlation benchmark
- vptree: Vantage Point benchmark
- harness: Files to facilitate runtime and performance counter measurements. These files are not used with apps, and are provided just for reference.
- input: Input files for benchmarks
- licenses: Licenses of released code
- src: Source code of SIMTree
SIMTree is built on top of the ROSE compiler infrastructure. Users must install Rose and configure its path in src/Makefile.
Each benchmark has input and output folders. The output folder Makefile has build targets to facilitate building SIMTree, and using SIMTree to transform the code in the input folder.
SIMTree works in two phases. The first phase analyzes the input code and generates header files with code for point blocking, traversal splicing, and the SoA layout. The second phase reads the generated header files, and transforms the source files. Two phases are necessary because Rose currently does not support dynamic insertion of new code, and we construct the header files with plain text (use of Rose builder functions would be too onerous).
Hence each benchmark can be transformed with two make invocations: 'make transform1' and 'make transform2'. For example on Barnes-Hut, 'make transform1' builds SIMTree (if necessary):
yjo:SIMTree yjo$ cd apps/barneshut/output/ yjo:output yjo$ make transform1 cd ../../../src; make g++ -I../../../../rosehome/include -O3 -I. -c main.cpp g++ -I../../../../rosehome/include -O3 -I. -c RecursiveStructureAnalysis.cpp g++ -I../../../../rosehome/include -O3 -I. -c MethodInfo.cpp g++ -I../../../../rosehome/include -O3 -I. -c Globals.cpp g++ -I../../../../rosehome/include -O3 -I. -c BlockTransform.cpp g++ -I../../../../rosehome/include -O3 -I. -c Utils.cpp g++ -I../../../../rosehome/include -O3 -I. -c RecursiveCallSet.cpp g++ -I../../../../rosehome/include -O3 -I. -c BlockClasses.cpp g++ -I../../../../rosehome/include -O3 -I. -c IntermediaryStateClasses.cpp g++ -I../../../../rosehome/include -O3 -I. -c CommonTransform.cpp g++ -I../../../../rosehome/include -O3 -I. -c SpliceClasses.cpp g++ -I../../../../rosehome/include -O3 -I. -c SpliceTransform.cpp g++ -I../../../../rosehome/include -O3 -I. -c RecursiveField.cpp g++ -I../../../../rosehome/include -O3 -I. -c SOATransform.cpp g++ -o SIMTree main.o RecursiveStructureAnalysis.o MethodInfo.o Globals.o BlockTransform.o Utils.o RecursiveCallSet.o BlockClasses.o IntermediaryStateClasses.o CommonTransform.o SpliceClasses.o SpliceTransform.o RecursiveField.o SOATransform.o -lrose -L../../../../rosehome/lib -lboost_filesystem -lboost_system -L/usr/local/lib
and generates header files in the first phase:
../../../src/SIMTree ../input/barneshut.cpp candidate recursive call: insert_point insert_point(newNode,p,half_r,(depth + 1)) candidate recursive class arg: child enclosing for loop found! but not annotated parallelForOnTree candidate recursive call: insert_point insert_point(child,p,half_r,(depth + 1)) candidate recursive call: free_tree free_tree(child) candidate recursive call: compute_center_of_mass compute_center_of_mass(child) candidate recursive call: compute_force_recursive compute_force_recursive(p,(node -> child0),dsq,epssq) candidate recursive class arg: node enclosing for loop found! SOATransform::fieldAccessAnalysis: "compute_force_recursive" SOATransform::fieldAccessAnalysis: "update_point" -----SOATransform::summarizeForBlockClass----- class: "Point" field: acc readViaPoint writeViaPoint flattening sub field: x flattening sub field: y flattening sub field: z field: cofm readViaPoint readViaNode flattening sub field: x flattening sub field: y flattening sub field: z field: num_nodes_traversed readViaPoint writeViaPoint ----------------------------- possible splice optimization level: 2 using splice optimization level: 2 generated block classes, ready for second pass
Then, 'make transform2' transforms the source code in the second phase:
yjo:output yjo$ make transform2 cd ../../../src; make make[1]: Nothing to be done for `all'. ../../../src/SIMTree rose_barneshut.cpp ..... second pass done
The input code is in ../input/barneshut.cpp, and the transformed code is in ../output/rose_rose_barneshut.cpp. You can compare the code to see how the transformation was applied.
Both input and output folders have Makefile targets to build the benchmark executable and test scripts to run it. Simply 'make' builds the benchmark, and 'python test.py' runs it. The benchmark records the total number of nodes traversed to verify correctness. This count should be the same for both input and output
yjo:output yjo$ cd ../input/ yjo:input yjo$ make g++ -g3 -I. -c barneshut.cpp g++ -o barneshut barneshut.o yjo:input yjo$ python test.py Overriding nbodies from input file. nbodies = 10000 Configuration: nbodies = 10000, ntimesteps = 2, dtime = 0.025000 eps = 0.050000, tol = 0.500000 Time step 0: sum_nodes_traversed:12716564 position: -0.003950 -0.011949 0.010587 Time step 1: sum_nodes_traversed:25449837 position: -0.004174 -0.011977 0.010532 yjo:input yjo$ cd ../output/ yjo:output yjo$ make g++ -g3 -I. -c rose_rose_barneshut.cpp g++ -o barneshut rose_rose_barneshut.o yjo:output yjo$ python test.py Overriding nbodies from input file. nbodies = 10000 Configuration: nbodies = 10000, ntimesteps = 2, dtime = 0.025000 eps = 0.050000, tol = 0.500000 Time step 0: sum_nodes_traversed:12716564 position: -0.003950 -0.011949 0.010587 Time step 1: sum_nodes_traversed:25449837 position: -0.004174 -0.011977 0.010532
SIMTree does not perform the vectorization itself. SIMTree's goal is to generate vectorizable code. While ideally icc should auto-vectorize the transformed code, we found that the presence of any non-vectorizable code or nested loops precludes vectorization by icc. Applying a combination of loop distribution and interchange can split the target loop into a series of icc-vectorizable loops.
Running SIMTree without arguments shows the full set of command line options. SIMTree can transform code with/without traversal splicing and the SoA layout.
yjo:output yjo$ ../../../src/SIMTree SIMTree, a framework to automatically vectorize tree traversal algorithms. Youngjoon Jo (yjo@purdue.edu) Copyright 2013, School of Electrical and Computer Engineering, Purdue University. ./SIMTree [options]--nosplice : do not apply traversal splicing --nosoa : do not apply structure of arrays layout --blocksize : block size for point blocking [default: 512] --splicedepth : splice depth for traversal splicing [default: 4] --spliceopt : splicing optimization level [0-2, default:2]