2013 Volume 25 Issue 1 Pages 501-510
Latent Dirichlet Allocation (LDA) has been widely used for analyzing the latent topics of documents. It assigns a probability distribution over topics to each individual word of a document, and characterizes each latent topic by the words that tend to appear in it. In this method, documents are treated as bags of words, so the relations between words, which are needed to express the contents of documents precisely, are not taken into account. In this study, to estimate latent topics more precisely, we propose a method that assigns a probability distribution over topics to pairs of words. Through document retrieval tasks, we investigate what constraints should be placed on word pairs so that they are useful for extracting latent topics, and show that LDA can be improved by assigning probability distributions to pairs of words.
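As a rough illustration of what treating pairs of words as the modeling unit could look like, the sketch below turns a tokenized document into word-pair tokens. The function name `word_pairs` and the `window` constraint (pairing only words that occur within a fixed distance of each other) are assumptions made for this example; the abstract does not state which constraints the study actually uses.

```python
def word_pairs(tokens, window=3):
    """Collect unordered pairs of distinct words that co-occur within
    `window` consecutive tokens. The window is a hypothetical constraint
    chosen for illustration, not the one used in the study."""
    pairs = []
    for i, w in enumerate(tokens):
        # Look only at the tokens that follow w inside the window.
        for v in tokens[i + 1 : i + window]:
            if v != w:
                # Sort so that (a, b) and (b, a) count as the same pair.
                pairs.append(tuple(sorted((w, v))))
    return pairs


doc = ["latent", "topic", "model", "topic"]
pairs = word_pairs(doc, window=2)
```

Each resulting pair could then stand in for a single word in a bag-of-pairs representation, so that a topic model assigns topics to pairs instead of individual words. This is only one possible reading of the idea, not the authors' implementation.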