Construction of a text-word map considering the importance of words

Hideki Wada; Katsuhiro Honda; Hidetomo Ichihashi; Akira Notsu

doi:10.14864/fss.23.0.519.0

23rd Fuzzy System Symposium

Session ID : TD3-2

Conference information

Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)

Text-keyword map construction considering typicality of keywords

*Hideki Wada, Katsuhiro Honda, Hidetomo Ichihashi, Akira Notsu

Keywords: Text mining, Fuzzy clustering, Principal component analysis

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

One useful approach to intuitive text mining is the construction of text maps that characterize the mutual relations among text documents composed of many keywords. In text mining tasks, text documents are first preprocessed into numerical weights, such as tf-idf weights, which combine term frequency and inverse document frequency. Principal component analysis (PCA) is used to construct low-dimensional plots of multivariate data. It is, however, often difficult to extract meaningful features from a low-dimensional text map when the "interesting structure" is concealed by many nonsignificant keywords. This paper proposes a linear fuzzy clustering-based variable selection mechanism for selecting keywords that are useful for characterizing text documents. In the variable selection model, the absolute typicality of keywords is estimated based on a graded possibilistic approach. An experimental result with the famous Japanese novel "Kokoro" by Soseki Natsume demonstrates the characteristics of the new approach.
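The baseline pipeline described in the abstract (tf-idf weighting followed by PCA) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy tokenized documents, the function names `tfidf` and `pca_map`, and the particular idf variant `log(N / df)` are all assumptions made here, and the fuzzy clustering-based keyword selection proposed in the paper is not reproduced.

```python
import numpy as np

def tfidf(docs):
    """Return (vocabulary, tf-idf matrix) for a list of tokenized documents."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d:
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)        # number of documents containing each word
    idf = np.log(len(docs) / df)     # one common idf variant (an assumption)
    return vocab, tf * idf

def pca_map(X, n_components=2):
    """Project documents (rows) and keywords (columns) onto principal axes."""
    Xc = X - X.mean(axis=0)          # center each keyword column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    doc_coords = U[:, :n_components] * s[:n_components]  # document scores
    word_coords = Vt[:n_components].T                    # keyword loadings
    return doc_coords, word_coords

# Hypothetical toy documents, not taken from "Kokoro".
docs = [
    ["sensei", "letter", "friend"],
    ["sensei", "sea", "summer"],
    ["letter", "father", "illness"],
    ["friend", "letter", "sensei"],
]
vocab, X = tfidf(docs)
doc_coords, word_coords = pca_map(X)
```

Plotting `doc_coords` and `word_coords` on the same axes gives a joint text-keyword map. With many low-weight keywords the loadings crowd the plot, which is the situation the proposed keyword selection is meant to address.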

* Corresponding author
