2016 Volume 24 Issue 1 Pages 152-163
Similarity search is a crucial task in many real-world applications such as multimedia databases, data mining, and bioinformatics. In this work, we investigate similarity search on uncertain data modeled as Gaussian distributions. Using the Kullback-Leibler divergence (KL-divergence) to measure the dissimilarity between two Gaussian distributions, our goal is to search a database for the top-k Gaussian distributions most similar to a given query Gaussian distribution. In particular, we consider non-correlated Gaussian distributions, where there are no correlations between dimensions and the covariance matrices are diagonal. To support query processing, we propose two types of novel approaches that utilize the notions of rank aggregation and skyline queries. The efficiency and effectiveness of our approaches are demonstrated through a comprehensive experimental performance study.
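
To make the query setting concrete, the following is a minimal sketch of the KL-divergence between two Gaussian distributions with diagonal covariance matrices, together with a brute-force top-k scan. It is only a naive baseline for illustration, not the rank-aggregation or skyline-based approaches proposed in the paper. The function names, the representation of each distribution as a (means, variances) pair, and the choice of direction KL(query || candidate) are assumptions made here; KL-divergence is asymmetric, so the opposite direction would generally rank candidates differently.

```python
import heapq
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance.

    Each argument is a sequence with one entry per dimension;
    var_* hold variances (sigma squared), assumed strictly positive.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Per-dimension term: log(vq/vp) + (vp + (mp - mq)^2) / vq - 1
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def top_k_bruteforce(query, database, k):
    """Return the k (divergence, index) pairs with the smallest KL(query || candidate).

    `query` is a (means, variances) pair and `database` is a list of such pairs.
    This scans every candidate (hypothetical baseline, no index structure).
    """
    q_mu, q_var = query
    scored = (
        (kl_diag_gauss(q_mu, q_var, mu, var), idx)
        for idx, (mu, var) in enumerate(database)
    )
    return heapq.nsmallest(k, scored)
```

A call such as `top_k_bruteforce(([0.0], [1.0]), database, 2)` returns the two closest candidates in ascending order of divergence. Because the covariance matrices are diagonal, the divergence decomposes into a sum of independent per-dimension terms, which is the kind of structure that per-dimension ranking techniques can exploit.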