2020 Volume 28 Pages 436-444
Betweenness centrality is a widely used measure for identifying important nodes in social networks. Several algorithms have been studied for efficiently computing the top-k nodes with the highest betweenness centrality on a graph whose data are fully available. However, the complete graph data of real social networks are typically not available to third parties such as researchers or marketers; hence, an estimation algorithm based on sampling the graph data is required. Accurately estimating the top-k nodes with the highest betweenness centrality from a small sample of a graph is challenging for two reasons. First, the top-k nodes need to be included in the small sample. Second, nodes with high betweenness centrality, which is defined on the whole graph, need to be accurately identified from the small sample. We propose a random walk-based algorithm that estimates the top-k nodes with the highest betweenness centrality by utilizing the ego betweenness centrality, which is highly correlated with the betweenness centrality in social networks. The proposed algorithm first obtains, via a random walk on a social network, a small sample that includes many of the top-k nodes with the highest betweenness centrality. Then, it obtains unbiased estimates of the ego betweenness centrality of the sampled nodes and approximates the top-k nodes with the highest betweenness centrality as the top-k nodes with the highest estimated ego betweenness centrality. The proposed estimator efficiently estimates the ego betweenness centrality of each sampled node without any additional sampling of the graph data, by utilizing the neighbor data of the previous and next samples. Experiments using real social network datasets show that the proposed algorithm estimates the top-k nodes with the highest betweenness centrality more accurately than existing algorithms when the sample size is small.
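The pipeline described above (random walk sampling, ego betweenness scoring, top-k ranking) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' estimator. It assumes the full neighbor sets of the sampled nodes and of their neighbors can be queried, and it computes the exact ego betweenness centrality of each visited node rather than the unbiased estimate built from the neighbor data of the previous and next samples. It uses a known closed form for ego betweenness: for each pair of non-adjacent neighbors u, w of v, add 1/(1 + c), where c counts the other neighbors of v adjacent to both u and w. The function names `build_adj`, `random_walk_sample`, `ego_betweenness`, and `estimate_top_k` are hypothetical.

```python
import random
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def ego_betweenness(adj, v):
    """Exact ego betweenness of v: the betweenness of v inside its ego network.

    For each pair of non-adjacent neighbors (u, w), every shortest u-w path in the
    ego network has length 2; v lies on one of (1 + c) of them, where c is the
    number of other neighbors of v adjacent to both u and w.
    """
    nbrs = adj[v]
    total = 0.0
    for u, w in combinations(list(nbrs), 2):
        if w in adj[u]:
            continue  # u and w are adjacent, so no shortest path passes through v
        common = len(adj[u] & adj[w] & nbrs)
        total += 1.0 / (1 + common)
    return total


def random_walk_sample(adj, start, length, rng):
    """Simple random walk: each step moves to a uniformly chosen neighbor.

    Assumes every visited node has at least one neighbor (e.g. a connected graph).
    """
    walk = [start]
    v = start
    for _ in range(length - 1):
        v = rng.choice(sorted(adj[v]))  # sort so a fixed seed gives a fixed walk
        walk.append(v)
    return walk


def estimate_top_k(adj, start, length, k, seed=0):
    """Rank the distinct walk nodes by ego betweenness and return the top k."""
    rng = random.Random(seed)
    walk = random_walk_sample(adj, start, length, rng)
    # Simplification: exact scores from full neighbor sets, not the paper's estimator.
    scores = {v: ego_betweenness(adj, v) for v in set(walk)}
    # Ties are broken by node label so the output is deterministic.
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


if __name__ == "__main__":
    # Two stars (hubs 0 and 4) joined by the edge 0-4.
    edges = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (0, 4)]
    adj = build_adj(edges)
    print(ego_betweenness(adj, 0))  # → 6.0
    print(estimate_top_k(adj, start=0, length=50, k=2))
```

Because a simple random walk visits nodes roughly in proportion to their degree in the long run, even a short walk tends to reach hub nodes, which in this toy graph are also the nodes with the highest ego betweenness.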