Host: The Japanese Society for Artificial Intelligence
Name : 87th SIG-KBS
Number : 87
Location : [in Japanese]
Date : January 29, 2010
Pages 05-
We propose a method for extracting bursty latent topics from a document stream, that is, a time series of documents. We use Latent Dirichlet Allocation (LDA), a probabilistic generative model of documents, to extract latent topics, and introduce a time-filter to identify bursty ones. Based on LDA and the time-filter, we define a similarity measure between two time-stamped documents, and extract bursty latent topics from the stream by applying hierarchical agglomerative clustering. Experiments on real document streams demonstrate the effectiveness of the proposed method.
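The abstract does not give the exact form of the time-filter or of the similarity measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each document already has a topic-proportion vector (for example, inferred by any LDA implementation) and a numeric time-stamp. The cosine similarity, the exponential decay with width `tau`, the average-link merging rule, the stopping `threshold`, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def topic_similarity(p, q):
    """Cosine similarity between two topic-proportion vectors (assumed choice)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def time_filter(t1, t2, tau):
    """Hypothetical time-filter: 1 for equal time-stamps, decaying toward 0."""
    return math.exp(-abs(t1 - t2) / tau)


def similarity(doc_a, doc_b, tau):
    """Topic similarity damped by the time-filter; a document is (theta, time)."""
    (theta_a, t_a), (theta_b, t_b) = doc_a, doc_b
    return topic_similarity(theta_a, theta_b) * time_filter(t_a, t_b, tau)


def agglomerative_clusters(docs, tau, threshold):
    """Average-link agglomerative clustering over document indices.

    Repeatedly merges the most similar pair of clusters and stops once the
    best average similarity falls below `threshold`.
    """
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        total = sum(similarity(docs[i], docs[j], tau) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, i, j = max(
            (avg_sim(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


# Toy data (made up): (topic proportions over 2 topics, time-stamp).
docs = [
    ([0.9, 0.1], 1.0),
    ([0.8, 0.2], 1.5),
    ([0.9, 0.1], 30.0),  # same topic as document 0, but much later
    ([0.1, 0.9], 2.0),
    ([0.2, 0.8], 2.5),
]
print(sorted(agglomerative_clusters(docs, tau=2.0, threshold=0.5)))
```

On this toy input the sketch groups documents 0 and 1, and documents 3 and 4, while document 2 stays alone: it shares a topic with document 0 but lies far away in time, so the time-filter suppresses their similarity. Under this reading, a cluster that is large and concentrated in a short time span would be a candidate bursty topic.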