Latent Topic Estimation Using Word Pairs in Documents as Features (文書上の単語対を素性とした潜在的トピック推定)

北島 理沙; 小林 一郎

doi:10.3156/jsoft.25.501

Abstract

Latent Dirichlet Allocation (LDA) has been widely used for analyzing the latent topics of documents. It assigns a probability distribution over topics to each individual word in a document, and characterizes each latent topic by the words that tend to appear in it. In this method, documents are treated as bags of words, so the relations between words, which are needed to express the contents of a document precisely, are not taken into account. In this study, to estimate latent topics more precisely, we propose a method that assigns a probability distribution over topics to pairs of words. Through document retrieval tasks, we investigate which constraints on word pairs make them useful for extracting latent topics, and show that LDA can be improved by assigning probability distributions to pairs of words.
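As an illustration of what using word pairs as features could look like, the following minimal sketch turns a tokenized document into word-pair tokens that could then be fed to a standard LDA implementation in place of single words. The abstract does not state how pairs are formed or constrained, so the co-occurrence window, the function name `word_pairs`, and the `window` parameter are assumptions made for this example, not the authors' method.

```python
def word_pairs(tokens, window=3):
    """Return unordered word pairs that co-occur within `window` tokens.

    Assumption: a pair is formed from two distinct words when the second
    appears fewer than `window` positions after the first. This constraint
    is hypothetical; the abstract leaves the actual constraints open.
    """
    pairs = []
    for i, left in enumerate(tokens):
        # Look ahead at the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                # Sort so that (a, b) and (b, a) map to the same feature.
                pairs.append("_".join(sorted((left, right))))
    return pairs


if __name__ == "__main__":
    doc = ["topic", "model", "assigns", "topic", "distribution"]
    print(word_pairs(doc, window=3))
```

Each returned string acts as a single pseudo-word, so a bag-of-words topic model would assign topics to pairs rather than to individual words. A tighter or looser window is one simple way to vary the constraint on which pairs are kept.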
