Abstract
We propose a novel approach to estimating the optimal number of topics in a document set using consensus clustering based on non-negative matrix factorization (NMF). Determining the number of topics automatically is useful because many topic-extraction approaches choose it heuristically. Consensus clustering merges the results of multiple clustering runs to obtain a robust clustering. In this paper, assuming that a robust clustering is achieved when the number of clusters is optimal, we propose a consensus soft clustering algorithm based on NMF and estimate the optimal number of topics by searching for a robust classification of documents into topics.