Abstract
Most previous relevance feedback methods re-rank search results using only the surface words in texts. In this paper, we present a novel method that uses not only surface words but also latent words that are highly probable given the texts. In the proposed method, we use latent Dirichlet allocation (LDA) to infer the latent word distribution of each document in the search results. When user feedback is given, we also infer the latent word distribution of the feedback using LDA. We then calculate the similarity between the user feedback and each document in the search results using both the surface and latent word distributions, and re-rank the search results according to these similarities. Evaluation results show that when user feedback consisting of two documents (3,589 words) is given, our method improves the initial search results by 27.6% in precision at 10 (P@10). The results also show that our method performs well even when only a small amount of user feedback is available (e.g., an improvement of 5.3% in P@10 was achieved even when the user feedback consisted of only 57 words).
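The re-ranking pipeline summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper. It assumes that the per-document topic proportions (`theta`) and the topic-word distributions (`phi`) have already been inferred by some LDA implementation, and it assumes cosine similarity and a linear interpolation weight `alpha`, neither of which is specified in this abstract. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def surface_dist(tokens):
    """Relative frequency of the words that actually occur in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def latent_dist(theta, phi):
    """Latent word distribution: p(w) = sum_k theta[k] * phi[k][w]."""
    dist = {}
    for k, weight in enumerate(theta):
        for w, p in phi[k].items():
            dist[w] = dist.get(w, 0.0) + weight * p
    return dist


def cosine(p, q):
    """Cosine similarity between two sparse distributions (assumed measure)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rerank(feedback, docs, phi, alpha=0.5):
    """Return document indices, most similar to the feedback first.

    feedback and each entry of docs are (tokens, theta) pairs.
    alpha weights the surface similarity against the latent similarity
    (an assumed combination rule, for illustration only).
    """
    fb_surface = surface_dist(feedback[0])
    fb_latent = latent_dist(feedback[1], phi)
    scores = []
    for i, (tokens, theta) in enumerate(docs):
        surface_sim = cosine(fb_surface, surface_dist(tokens))
        latent_sim = cosine(fb_latent, latent_dist(theta, phi))
        scores.append((alpha * surface_sim + (1 - alpha) * latent_sim, i))
    return [i for _, i in sorted(scores, key=lambda pair: -pair[0])]


# Toy data: two topics, and a feedback text sharing no surface word with either document.
phi = [
    {"car": 0.5, "automobile": 0.5},
    {"fruit": 0.5, "apple": 0.5},
]
feedback = (["car"], [1.0, 0.0])
docs = [(["apple"], [0.0, 1.0]), (["automobile"], [1.0, 0.0])]
print(rerank(feedback, docs, phi))
```

In this toy case the feedback shares no surface word with either document, so surface similarity alone cannot separate them. The latent distributions place the second document in the same topic as the feedback, and it is ranked first.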