2009 Volume 2009 Issue SWO-020 Pages 14-
Very recently, topic model-based retrieval methods have produced good results using the Latent Dirichlet Allocation (LDA) model or its variants in a language modeling framework. However, for the task of retrieving annotated documents, LDA-based methods cannot directly make use of the multiple attribute types specified by the annotations. In this paper, we explore a new retrieval method using a multitype topic model that can directly handle multiple word types, such as the annotated entities, category labels, and other words typically found in Wikipedia. We investigate how to effectively apply the multitype topic model to retrieve documents from a type-annotated collection, and then show, through experiments on an entity ranking task using a Wikipedia collection, that our proposed method significantly outperforms several state-of-the-art methods.