2016, Volume 31, Issue 2, pp. B-F71_1-12
Estimating the relevance or proximity between vertices in a network is a fundamental building block of network analysis and is useful in a wide range of important applications, such as network-aware search and network structure prediction. In this paper, we (1) propose using the top-k shortest-path distance as a relevance measure, and (2) design an efficient indexing scheme for answering top-k distance queries. Although many indexing methods have been developed for standard (top-1) distance queries, none of them can be applied directly to top-k distance queries. We therefore develop a new framework for top-k distance queries based on 2-hop cover, and then present an efficient indexing algorithm based on the recently proposed pruned landmark labeling scheme. Extensive experiments demonstrate the scalability, efficiency, and robustness of our method. It can construct indices for large graphs comprising millions of vertices and tens of millions of edges within a reasonable running time. Once the indices are built, top-k distances can be computed within a few microseconds, six orders of magnitude faster than existing methods, which require a few seconds to compute these distances. Moreover, we demonstrate the usefulness of top-k distance as a relevance measure by applying it to link prediction, the most fundamental problem in graph data mining. We emphasize that the proposed indexing method enables the first use of top-k distance for such tasks.
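To make the notion of a top-k distance query concrete, the following is a minimal index-free baseline sketch, not the indexing method proposed in the paper. It assumes an unweighted graph given as an adjacency dictionary, and it assumes that paths may revisit vertices (the abstract does not state this). The function name `top_k_distances` and the parameter `adj` are hypothetical. The search is a breadth-first search in which each vertex may be expanded up to k times, so that the i-th expansion of the target corresponds to the i-th shortest path length.

```python
from collections import deque


def top_k_distances(adj, source, target, k):
    """Return the lengths of the k shortest source-target paths, ascending.

    Assumptions (not taken from the paper): the graph is unweighted,
    `adj` maps each vertex to a list of neighbours, and paths are
    allowed to revisit vertices.
    """
    # Number of times each vertex has been expanded so far.
    visits = {v: 0 for v in adj}
    result = []
    # FIFO queue of (vertex, path length); lengths come out in
    # non-decreasing order because every edge has unit length.
    queue = deque([(source, 0)])
    while queue and len(result) < k:
        v, d = queue.popleft()
        if visits[v] >= k:
            # The k shortest paths to v are already known.
            continue
        visits[v] += 1
        if v == target:
            result.append(d)
        for w in adj[v]:
            if visits[w] < k:
                queue.append((w, d + 1))
    return result
```

For example, on a triangle with vertices 0, 1, and 2, calling `top_k_distances({0: [1, 2], 1: [0, 2], 2: [0, 1]}, 0, 2, 3)` returns `[1, 2, 3]`. A per-query search of this kind touches a large part of the graph for every query, which is the cost that a precomputed index is meant to avoid.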