Title: Sampling Biases in IP Topology Measurements
Authors: Anukool Lakhina, John W. Byers, Mark Crovella, and Peng Xie
Date: July 15, 2002
Abstract:
Considerable attention has been focused on the properties of
graphs derived from Internet measurements. Router-level topologies
collected via traceroute studies have led some authors to conclude that
the router graph of the Internet is a scale-free graph, or more
generally a power-law random graph. In such a graph, the degree
distribution of nodes follows a distribution with a power-law tail.
In this paper we argue that the evidence to date for this conclusion is at
best insufficient. We show that graphs appearing to have power-law degree
distributions can arise surprisingly easily, when sampling graphs whose
true degree distribution is not at all like a power-law. For example,
given a classical Erdos-Renyi sparse, random graph, the subgraph
formed by a collection of shortest paths
from a small set of random sources to
a larger set of random destinations can easily appear to show a
degree distribution remarkably like a power-law.
We explore the reasons for how this effect arises, and show that in such
a setting, edges are sampled in a highly biased manner. This insight
allows us to distinguish measurements taken from the Erdos-Renyi
graphs from those taken from power-law random graphs. When we apply
this distinction to a number of well-known datasets, we find that the
evidence for sampling bias in these datasets is strong.