Internet Distance Prediction:
Distance prediction algorithms use O(N) Round Trip Time (RTT) measurements to predict the N^2 RTTs among N nodes. Distance prediction can be applied to improve the performance of a wide variety of Internet applications: for instance, to guide the selection of a download server from multiple replicas, or to guide the construction of overlay networks or multicast trees. Although the accuracy of existing prediction algorithms has been extensively compared using the relative prediction error metric, their impact on applications has not been systematically studied.
In this project, we first study distance prediction algorithms from an application's perspective to answer the following questions: (1) Are existing prediction algorithms adequate for the applications? (2) Is there a significant performance difference between the different prediction algorithms, and which is the best from the application perspective? (3) How does the prediction error propagate to affect the user perceived application performance? (4) How can we address the fundamental limitation (i.e., inaccuracy) of distance prediction algorithms?
We systematically experiment with three types of representative applications (overlay multicast, server selection, and overlay construction), three distance prediction algorithms (GNP, IDES, and the triangulated heuristic), and three real-world distance datasets (King, PlanetLab, and AMP). We find that, although using prediction can improve the performance of these applications, the achieved performance can be dramatically worse than the optimal case where the real distances are known. We formulate statistical models to explain this performance gap. In addition, we explore various techniques to improve the prediction accuracy and the performance of prediction-based applications. We find that selectively conducting a small number of measurements based on prediction-based screening is most effective.
We further investigate how to improve the accuracy of the the distance prediction
algorithms. Our experience with different landmark selection schemes
shows that selecting nearby landmarks can increase the prediction
accuracy for short links, but other distance ranges are likely to
suffer. Uneven prediction qualities can cause significant impact on
the application's performance. Instead of trying to select the
landmark nodes in some ``intelligent'' fashion, we propose a
hierarchical prediction approach with straightforward landmark
selection. The hierarchical prediction utilizes multiple coordinate
sets at multiple distance scales, with the ``right'' scale being
chosen for prediction each time. We study two types of hierarchical
prediction schemes: the first scheme leverages a shared landmark
hierarchy, while in the second scheme, each node is allowed to choose
its own landmarks. Experiments with Internet measurement datasets
show that the hierarchical approach is very promising for improving
the accuracy of network distance prediction.