Mars Mosaic Generation Timing Summary on 450MHz and 800MHz Clusters
Timing on one CPU:
The original algorithm took about 90 minutes on a single 450MHz Pentium III CPU to compose 134 images into a single mosaic. Algorithm changes reduced the required CPU time to about 48 minutes. Running the same algorithm and problem on an 800MHz CPU reduces the time further, to about 28 minutes.
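The single-CPU figures above can be turned into speed-up factors with a few lines of arithmetic. The following is only an illustrative sketch: the constant and function names are hypothetical, the timings are the approximate values quoted above, and the clock ratio is included purely for comparison.

```python
# Single-CPU timings quoted in the text, in minutes (all approximate).
T_ORIGINAL_450 = 90.0   # original algorithm, 450MHz Pentium III
T_IMPROVED_450 = 48.0   # revised algorithm, same 450MHz CPU
T_IMPROVED_800 = 28.0   # revised algorithm, 800MHz CPU


def speedup(t_before: float, t_after: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if t_before <= 0 or t_after <= 0:
        raise ValueError("timings must be positive")
    return t_before / t_after


algorithm_gain = speedup(T_ORIGINAL_450, T_IMPROVED_450)  # software changes only
hardware_gain = speedup(T_IMPROVED_450, T_IMPROVED_800)   # faster CPU only
overall_gain = speedup(T_ORIGINAL_450, T_IMPROVED_800)    # both combined
clock_ratio = 800.0 / 450.0                               # for comparison

print(f"algorithm changes: {algorithm_gain:.2f}x")
print(f"450MHz -> 800MHz:  {hardware_gain:.2f}x (clock ratio {clock_ratio:.2f}x)")
print(f"overall:           {overall_gain:.2f}x")
```

With the rounded timings above, the gain from the faster CPU (about 1.71x) comes out slightly below the raw clock ratio (about 1.78x). Because the timings are quoted only to the nearest minute or so, this difference should not be over-interpreted.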
Timing on multiple CPUs:
Parallelization of the mosaicing algorithm is shown for the 800MHz cluster as a red line and for the 450MHz cluster as a blue line. The dot-dashed green line shows the ideal speed-up. The actual timings follow the linear scaling quite closely. Deviations from ideal scaling can be attributed to load balancing and data staging problems. The red line extends to a larger number of CPUs because the 800MHz cluster has twice as many CPUs available. The performance of the parallel algorithm deteriorates strongly starting at 24 CPUs. This can be attributed to problems staging the data to all the CPUs. If the images are copied to the local disks on each node of the cluster, the overall performance improves significantly (purple crosses): the total processing time is reduced from 2.5 to 1.7 minutes. This comes at the expense of about 7 minutes to copy the data to the local disks via rcp. This implies that if the algorithm is to be run on that many CPUs, a different data staging method and/or different hardware must be found.
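The trade-off between faster processing and the cost of copying the data can be checked with a short calculation. This is a rough sketch under stated assumptions, not part of the original analysis: it assumes the 28-minute single-CPU time on the 800MHz CPU is the baseline for these runs, and that a one-time rcp copy could be reused across several mosaic runs. All names are hypothetical. The CPU count for the 2.5 and 1.7 minute figures is not stated here, so the efficiency function takes it as a parameter and is not evaluated.

```python
import math

# Timings quoted in the text, in minutes (all approximate).
T_SINGLE_800 = 28.0  # one 800MHz CPU (assumed baseline for the parallel runs)
T_SHARED = 2.5       # parallel run without local copies of the images
T_LOCAL = 1.7        # parallel run with images on each node's local disk
T_COPY = 7.0         # copying the images to the local disks via rcp


def speedup(t_single: float, t_parallel: float) -> float:
    """Speed-up of a parallel run relative to the single-CPU time."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, n_cpus: int) -> float:
    """Parallel efficiency: 1.0 corresponds to the ideal linear speed-up."""
    return speedup(t_single, t_parallel) / n_cpus


def amortized_time(t_run: float, t_copy: float, n_runs: int) -> float:
    """Per-run time when one copy is shared by n_runs runs (an assumption)."""
    return t_run + t_copy / n_runs


def break_even_runs(t_shared: float, t_local: float, t_copy: float) -> int:
    """Smallest number of runs for which local copies break even or win."""
    saving = t_shared - t_local
    if saving <= 0:
        raise ValueError("local-disk runs must be faster than shared runs")
    return math.ceil(t_copy / saving)


print(f"speed-up without local copies: {speedup(T_SINGLE_800, T_SHARED):.1f}x")
print(f"speed-up with local copies:    {speedup(T_SINGLE_800, T_LOCAL):.1f}x")
print(f"one run including the copy:    {amortized_time(T_LOCAL, T_COPY, 1):.1f} min")
print(f"runs needed to break even:     {break_even_runs(T_SHARED, T_LOCAL, T_COPY)}")
```

For a single mosaic, the copy plus the faster run (about 8.7 minutes) is much slower than the 2.5-minute run without local copies. Under the reuse assumption, local copies would only pay off after roughly nine runs on the same images.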
Acknowledgements:
This work was sponsored by the TMOD technology program under the Beowulf Application and Networking Environment (BANE) task. The original VICAR-based software is maintained in the Multi-mission Image Processing Laboratory (MIPL). The work was performed in a collaboration between Gerhard Klimeck, Myche McAuley, Tom Cwik, Bob Deen, and Eric DeJong.