Computer Software
Print ISSN : 0289-6540
A Method of Refining Topic Models Based on Term and Document Frequencies.
Kazuyuki HIGASHI, Hitoshi TAKAHASHI, Hiroyuki NAKAGAWA, Tatsuhiro TSUCHIYA

2019 Volume 36 Issue 4 Pages 4_25-4_31

Abstract

Software developers increasingly rely on natural language documents. Such documents may contain useful information for developers; however, extracting it is difficult when the number of documents is very large. Latent Dirichlet Allocation (LDA) is a promising topic modeling technique, and LDA-based topic modeling can facilitate comprehension of such documents. In LDA, a stop word list is used to filter out general words so that topics are classified accurately. However, an existing stop word list cannot easily filter words that are not general but appear frequently in the target documents. In this paper, we propose a method that consists of two steps: extracting stop words from the target documents and merging similar topics. We experimentally evaluate the method by applying it to a mailing list. The experimental results demonstrate that our method constructs a topic model more accurately than the existing method.
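
The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the selection criteria, thresholds, or similarity measure, so everything here is an assumption made for illustration: the function names `extract_stop_words` and `merge_similar_topics`, the parameters `min_df_ratio`, `min_tf_share`, and `threshold`, the use of cosine similarity, and the greedy averaging merge. It should not be read as the authors' algorithm.

```python
import math
from collections import Counter


def extract_stop_words(docs, min_df_ratio=0.8, min_tf_share=0.05):
    """Pick corpus-specific stop words from tokenized documents.

    Assumed criterion (not from the paper): a word is a stop word when it
    occurs in at least `min_df_ratio` of the documents AND accounts for at
    least `min_tf_share` of all tokens in the corpus.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each word
    df = Counter()  # number of documents containing each word
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    total = sum(tf.values())
    return {
        w for w in tf
        if df[w] / n_docs >= min_df_ratio and tf[w] / total >= min_tf_share
    }


def cosine(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def merge_similar_topics(topics, threshold=0.9):
    """Greedily merge topic-word distributions that are similar.

    Each topic joins the first existing group whose averaged distribution
    has cosine similarity >= `threshold` (an assumed rule); otherwise it
    starts a new group. Returns one averaged distribution per group.
    """
    groups, merged = [], []
    for topic in topics:
        for i, center in enumerate(merged):
            if cosine(topic, center) >= threshold:
                groups[i].append(topic)
                merged[i] = [sum(col) / len(groups[i]) for col in zip(*groups[i])]
                break
        else:
            groups.append([topic])
            merged.append(list(topic))
    return merged


# Toy mailing-list-like corpus: "patch" appears in every message.
docs = [
    ["patch", "bug", "fix"],
    ["patch", "build", "error"],
    ["patch", "release", "note"],
    ["patch", "bug", "test"],
]
stop_words = extract_stop_words(docs)

# Toy topic-word distributions over a three-word vocabulary.
topics = [[0.6, 0.4, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
merged_topics = merge_similar_topics(topics)
```

In this toy corpus, "patch" is not a general English stop word, yet it appears in every message, which is the kind of word the abstract says an existing stop word list fails to filter. In practice the extracted words would be removed before fitting LDA, and the merge step would be applied to the fitted topic-word distributions.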

© 2019, Japan Society for Software Science and Technology