Title: Indexing Distances In Large Graphs And Applications In Search Tasks (MA Thesis)
Author: Michalis Potamias
Abstract:
Date: December 1, 2008
This thesis elaborates on the problem of preprocessing a large graph
so that single-pair shortest-path queries can be answered quickly at
runtime. Computing shortest paths is a well studied problem, but
exact algorithms do not scale well to real-world huge graphs in
applications that require very short response time. The focus is on
approximate methods for distance estimation, in particular in
landmarks-based distance indexing. This approach involves choosing
some nodes as landmarks and computing (offline), for each node in the
graph its embedding, i.e., the vector of its distances from all the
landmarks. At runtime, when the distance between a pair of nodes is
queried, it can be quickly estimated by combining the embeddings of
the two nodes. Choosing optimal landmarks is shown to be hard and thus
heuristic solutions are employed. Given a budget of memory for the
index, which translates directly into a budget of landmarks, different
landmark selection strategies can yield dramatically different results
in terms of accuracy. A number of simple methods that scale well to
large graphs are therefore developed and experimentally compared. The
simplest methods choose central nodes of the graph, while the more
elaborate ones select central nodes that are also far away from one
another. The efficiency of the techniques presented in this thesis is
tested experimentally using five different real world graphs with
millions of edges; for a given accuracy, they require as much as 250
times less space than the current approach which considers selecting
landmarks at random. Finally, they are applied in two important
problems arising naturally in large-scale graphs, namely social search
and community detection.