Topic Extraction and Problem Discovery by Document Clustering

橋本 泰一; 村上 浩司; 乾 孝司; 内海 和夫; 石川 正道

doi:10.3392/sociotechnica.5.216

Abstract

A method was developed for extracting important topics from clusters of text documents covering many subjects, retrieved from the Nikkei newspaper. The hierarchical clustering algorithm UPGMA was used to generate a tree of clusters based on the similarity of document vectors defined by the nouns appearing in the documents. Document clustering proved to be closely related to the process of detecting societal problems: it classifies similar documents into topical groups and structures those groups according to their content. The method was evaluated by applying it to the subject of organizational hazards caused by Japanese industries during 1990-2005.
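
The clustering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how document vectors are weighted or which similarity measure is used, so raw noun counts and cosine distance are assumed here. Noun extraction is assumed to have been done already, and the function names `cosine_distance` and `upgma` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse noun-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def upgma(docs):
    """Cluster documents (lists of nouns) with UPGMA.

    Returns the cluster tree as nested tuples of document indices.
    Expects at least one document.
    """
    # Assumption: each document vector is a plain count of its nouns.
    vectors = [Counter(d) for d in docs]
    # Distances between every pair of individual documents.
    base = {
        frozenset((i, j)): cosine_distance(vectors[i], vectors[j])
        for i, j in combinations(range(len(docs)), 2)
    }
    # Active clusters: id -> (subtree, member document indices).
    clusters = {i: (i, [i]) for i in range(len(docs))}

    def avg_dist(m1, m2):
        # UPGMA: unweighted mean of all pairwise member distances.
        total = sum(base[frozenset((i, j))] for i in m1 for j in m2)
        return total / (len(m1) * len(m2))

    next_id = len(docs)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: avg_dist(clusters[p[0]][1], clusters[p[1]][1]),
        )
        tree_a, mem_a = clusters.pop(a)
        tree_b, mem_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), mem_a + mem_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy documents invented for illustration, already reduced to nouns.
docs = [
    ["recall", "defect", "automobile"],
    ["recall", "defect", "automobile", "ministry"],
    ["accounting", "fraud", "audit"],
    ["accounting", "fraud", "bank"],
]
tree = upgma(docs)
print(tree)  # ((0, 1), (2, 3))
```

In this toy run the two recall articles and the two accounting articles each form a topical group before the groups are joined at the root, which mirrors the classify-then-structure behavior the abstract describes. Recomputing average distances from the base table keeps the sketch short; a real corpus would call for an incremental distance update or a library routine.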
