Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Scalable Topic Model Estimation with Stochastic Gradient MCMC
Soma Yokoi, Issei Sato, Hiroshi Nakagawa

2016, Volume 31, Issue 6, p. AI30-C_1-9

Abstract

Topic models are generative models of documents that automatically cluster frequently co-occurring words (topics) in corpora. Because topics serve as stable features that represent the substance of documents, topic models have been studied extensively as a technology for extracting latent information from large data. Unfortunately, the typical time complexity of topic model computation is the product of the data size and the number of topics, so the traditional Markov chain Monte Carlo (MCMC) method cannot estimate many topics on large corpora within a realistic time. The data size is a common concern in Bayesian learning, and general approaches exist to address it, such as variational Bayes and stochastic gradient MCMC. The number of topics, on the other hand, is a problem specific to topic models, and most solutions have been proposed for the traditional Gibbs sampler. However, it is natural to solve both problems at once, because as the data size grows, so does the number of topics in a corpus. Accordingly, we propose new methods that achieve both data and topic scalability by applying fast computation techniques from the Gibbs sampler to stochastic gradient MCMC. Our experiments demonstrate that the proposed method outperforms the state-of-the-art traditional MCMC method in the mini-batch setting, showing a better mixing rate and faster updates.
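To make the combination described in the abstract concrete, below is a minimal sketch in the spirit of stochastic gradient Riemannian Langevin dynamics (SGRLD) for LDA (Patterson and Teh, 2013): a short Gibbs chain estimates the expected topic-word counts on each mini-batch, and a noisy Langevin step updates the expanded-mean topic-word parameters. The hyperparameters, the burn-in/sample schedule, and the helper names gibbs_sweep and sgrld_step are illustrative assumptions, not the algorithm proposed in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    K, V = 50, 10_000          # number of topics, vocabulary size (assumed)
    alpha, beta = 0.1, 0.01    # document-topic / topic-word Dirichlet priors (assumed)
    D_total = 1_000_000        # assumed corpus size, used to rescale mini-batch statistics

    # Expanded-mean parameterization: phi[k] = theta[k] / theta[k].sum()
    theta = rng.gamma(1.0, 1.0, size=(K, V))

    def gibbs_sweep(doc, z, phi, n_dk):
        # One collapsed-Gibbs sweep over a document's topic assignments,
        # holding the global topic-word distributions phi fixed.
        for i, w in enumerate(doc):
            n_dk[z[i]] -= 1
            p = (n_dk + alpha) * phi[:, w]
            z[i] = rng.choice(K, p=p / p.sum())
            n_dk[z[i]] += 1

    def sgrld_step(theta, minibatch, eps, burn_in=2, n_samples=3):
        # Estimate the expected topic-word counts on the mini-batch with a
        # short Gibbs chain, then take one noisy (Langevin) step on theta.
        phi = theta / theta.sum(axis=1, keepdims=True)
        n_kw = np.zeros_like(theta)   # expected word-topic counts
        n_k = np.zeros(K)             # expected topic totals
        for doc in minibatch:
            z = rng.integers(K, size=len(doc))
            n_dk = np.bincount(z, minlength=K).astype(float)
            for s in range(burn_in + n_samples):
                gibbs_sweep(doc, z, phi, n_dk)
                if s >= burn_in:
                    for i, w in enumerate(doc):
                        n_kw[z[i], w] += 1.0 / n_samples
                    n_k += n_dk / n_samples
        scale = D_total / len(minibatch)  # unbiased rescaling of the mini-batch
        grad = beta - theta + scale * (n_kw - n_k[:, None] * phi)
        noise = rng.normal(size=theta.shape) * np.sqrt(eps * theta)
        return np.abs(theta + 0.5 * eps * grad + noise)  # mirror at zero

    # Usage: stream random mini-batches of tokenized documents (word ids).
    minibatch = [rng.integers(V, size=rng.integers(20, 100)) for _ in range(16)]
    theta = sgrld_step(theta, minibatch, eps=1e-4)

The naive O(K) inner loop in gibbs_sweep is where the fast Gibbs-sampler techniques the abstract alludes to (e.g., sparsity-aware or alias-table sampling) would plug in; replacing it would be the source of the topic-scalability gain.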

© 2016 The Japanese Society for Artificial Intelligence