Internet Distance Prediction:
Inaccuracy, Impact on Internet Applications, and New Approaches



Distance prediction algorithms use O(N) Round Trip Time (RTT) measurements to predict the N^2 RTTs among N nodes. Distance prediction can be applied to improve the performance of a wide variety of Internet applications: for instance, to guide the selection of a download server from multiple replicas, or to guide the construction of overlay networks or multicast trees. Although the accuracy of existing prediction algorithms has been extensively compared using the relative prediction error metric, their impact on applications has not been systematically studied.

In this project, we first study distance prediction algorithms from an application's perspective to answer the following questions: (1) Are existing prediction algorithms adequate for the applications? (2) Is there a significant performance difference between the different prediction algorithms, and which is the best from the application perspective? (3) How does the prediction error propagate to affect the user perceived application performance? (4) How can we address the fundamental limitation (i.e., inaccuracy) of distance prediction algorithms?

We systematically experiment with three types of representative applications (overlay multicast, server selection, and overlay construction), three distance prediction algorithms (GNP, IDES, and the triangulated heuristic), and three real-world distance datasets (King, PlanetLab, and AMP). We find that, although using prediction can improve the performance of these applications, the achieved performance can be dramatically worse than the optimal case where the real distances are known. We formulate statistical models to explain this performance gap. In addition, we explore various techniques to improve the prediction accuracy and the performance of prediction-based applications. We find that selectively conducting a small number of measurements based on prediction-based screening is most effective.

We further investigate how to improve the accuracy of the the distance prediction algorithms. Our experience with different landmark selection schemes shows that selecting nearby landmarks can increase the prediction accuracy for short links, but other distance ranges are likely to suffer. Uneven prediction qualities can cause significant impact on the application's performance. Instead of trying to select the landmark nodes in some ``intelligent'' fashion, we propose a hierarchical prediction approach with straightforward landmark selection. The hierarchical prediction utilizes multiple coordinate sets at multiple distance scales, with the ``right'' scale being chosen for prediction each time. We study two types of hierarchical prediction schemes: the first scheme leverages a shared landmark hierarchy, while in the second scheme, each node is allowed to choose its own landmarks. Experiments with Internet measurement datasets show that the hierarchical approach is very promising for improving the accuracy of network distance prediction.