TreeSplicer

TreeSplicer uses Apache Ant and perl scripts to facilitate building and testing.

Type 'ant' at the TreeSplicer folder to see use cases of the ant build file.

Type 'ant help-transform' at the TreeSplicer folder to see command line arguments for the transformation framework.

'ant transform-app -Dapp=pointcorrelation' will transform the pointcorrelation app with three configurations:

point blocking, output to apps/block/pointcorrelation
traversal splicing, output to apps/splice/pointcorrelation
point blocking and traversal splicing, output to apps/blocksplice/pointcorrelation

The original source code is in apps/base/pointcorrelation. You can compare the original and transformed code, to see how the transformation was applied.

'ant run-app -Dapp=barneshut -Dconf=medium -Dsort=false' transforms the barneshut app as above, and runs it with unsorted points for base/block/splice/blocksplice. During this transformation '--utilstats' is used, and the transformed code has hooks to 'util.Launcher' which records the autotuned block size and splice depth. By default each configuration is run 8 times, and the latter 7 runs are recorded to account for JIT compilation. A summary of the results can be seen in stats.csv, also in the TreeSplicer folder.

[java] configuration: 300000 bodies, 1 time steps
[java]
[java] Timestep 0 Center of Mass = 0.499901320001358 0.49983415285347105 0.5005274657672466
[java] totalNodesTraversed: 253718038
[java] run 6 of 8, runtime(ms): 12244
[java]
[java] autotune block size: 256
[java] avg convergence: 0.1570
[java] using splicing at depth: 3 reach: 6.00 nodes: 32768
[java] Timestep 0 Center of Mass = 0.499901320001358 0.49983415285347105 0.5005274657672466
[java] totalNodesTraversed: 253718038
[java] run 7 of 8, runtime(ms): 12483
[java]
[java] runtime(ms): avg:12058.57 min:11467 max:12554 stdev:454.96 stability:3.8
[java]
[java]
[echo] Results are summarized in stats.csv
BUILD SUCCESSFUL
Total time: 12 minutes 22 seconds
yjo@sherpao2:~/TreeSplicer-release$ more stats.csv
desc, benchmark, args, avg, min, max, stdev, stability, blockSize, spliceDepth, convergence
base, barneshut.Main, false true 300000 , 31404.29, 29458, 34312, 1561.28, 5.0, 0.00, 0.00, 0.0000
block, barneshut.Main, false true 300000 , 25339.57, 24887, 25681, 246.36, 1.0, 256.00, 0.00, 0.0000
splice, barneshut.Main, false true 300000 , 18870.29, 17835, 19547, 623.83, 3.3, 0.00, 2.00, 0.1563
splice, barneshut.Main, false true 300000 , 12058.57, 11467, 12554, 454.96, 3.8, 256.00, 3.00, 0.1487

It can be seen that the untransformed baseline takes 31 seconds, applying blocking speeds up to 25 seconds, applying traversal splicing speeds up to 18 seconds, and combining both has the best performance at 12 seconds.

'perl scripts/batch_test_all.pl' executes all apps with the medium and large inputs. Note that this will take a significant amount of time.

We also include other scripts:

batch_test_nn.pl: run many versions of a single app (in this case nearestneighbor) with customized app arguments.
test_pc.pl: run customized versions of a single app with customized app arguments.
profile_block_nn.pl: run app with a profiler to profile effective block size, traversal size and convergence of points.
heap_use_pc.pl: profile JVM heap usage of an app.
papi_bh.pl: run app with PAPI to get performance counters (e.g., CPI, L2 miss rate).
opt_levels_bh.pl: run app with different optimization levels of traversal splicing.
correct_test_all.pl: run all apps with correctness checks to verify the expected number of nodes traversed.

We include real input data for barneshut and nearestneighbor, which must be downloaded separately. They can be used by adding '-real' to the small/medium/large confs. ex) 'ant run-app -Dapp=nearestneighbor -Dconf=large-real' While the real confs are preset only for covtype.7d.gz for nearestneighbor, mnist.7d.gz can also be used, and both can be used for pointcorrelation, knearestneighborsplitplane and knearestneighborballtreeopt.

The barneshut data is taken directly from the Lonestar benchmarks. The nearestneighbor data was derived from the Covertype dataset of the UCI Machine Learning Repository (covtype.7d.gz) and the the Mnist dataset of the LIBSVM Data (mnist.7d.gz), and reduced to 7 dimensions via random projections.

We include two variants of the nearestneighbor benchmark to demonstrate the TreeSplicer optimization levels:

nearestneighborquad uses a quad tree with two split planes to show that order inferred from node is not limited to a binary tree.
nearestneighborgeneral uses a quad tree with two split planes, and a corrupt order (which performs more work to find the nearest neighbor) to show that TreeSplicer can correctly transform code where order cannot be inferred from the node.

TreeSplicer

Usage