Variations of Fuzzy Clustering for Cooccurrence Matrix and Their Application to Text Analysis

Chi-Hyon Oh; Katsuhiro Honda; Hidetomo Ichihashi

doi:10.14864/softscis.2008.0.947.0

Abstract

In this study, we compare several variations of Fuzzy Clustering for Cooccurrence Matrix (FCCM) in applications to text analysis. FCCM was proposed to partition the individuals and items of a cooccurrence matrix by maximizing the degree of aggregation of each cluster, where the degree of aggregation is defined as the sum of the products of the cooccurrence variables and the memberships of individuals and items. The variations of FCCM combine two types of membership constraints, probabilistic and possibilistic, with two types of regularization for obtaining fuzzy clusters, entropy maximization and K-L information. In the experiments, we apply these methods to a data set representing the frequency of keywords appearing in text documents and compare the results of each clustering method. In this application, the methods are used to find the mutual relations (or co-occurrence structure) among text documents and keywords, a task known as text mining.
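As a rough illustration of the degree of aggregation described in the abstract, the following sketch sums the products of cooccurrence values and the memberships of individuals and items over all clusters. The function name `aggregation_degree`, the toy document-by-keyword matrix, and the hand-picked 0/1 memberships are all assumptions made for illustration. The sketch does not implement the membership constraints, the regularization terms, or the update rules of FCCM or its variations.

```python
def aggregation_degree(R, U, W):
    """Sum of U[c][i] * W[c][j] * R[i][j] over clusters c, individuals i, items j.

    R: cooccurrence matrix (rows = individuals, columns = items).
    U: per-cluster memberships of individuals, U[c][i].
    W: per-cluster memberships of items, W[c][j].
    """
    total = 0.0
    for u_c, w_c in zip(U, W):
        for i, row in enumerate(R):
            for j, r_ij in enumerate(row):
                total += u_c[i] * w_c[j] * r_ij
    return total


# Hypothetical keyword counts: 4 documents (rows) by 3 keywords (columns).
R = [
    [2, 1, 0],
    [3, 0, 0],
    [0, 0, 4],
    [0, 1, 2],
]

# Memberships that follow the block structure of R:
# cluster 0 = documents 0-1 with keywords 0-1, cluster 1 = documents 2-3 with keyword 2.
U_aligned = [[1, 1, 0, 0], [0, 0, 1, 1]]
W_aligned = [[1, 1, 0], [0, 0, 1]]

# Memberships that mix documents across the two blocks.
U_mixed = [[1, 0, 1, 0], [0, 1, 0, 1]]

aligned = aggregation_degree(R, U_aligned, W_aligned)
mixed = aggregation_degree(R, U_mixed, W_aligned)
print(aligned, mixed)
```

On this toy matrix, the memberships that follow the block structure give a larger degree of aggregation than the mixed ones, which is the quantity the abstract says FCCM maximizes.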
