A Study of Fuzzy Clustering for Mixed Databases Containing Incomplete Data and an Examination of Algorithms II

古川 貴司; 大西 真一; 山ノ井 高洋

doi:10.14864/fss.30.0_754

Abstract

Fuzzy c-means (FCM) clustering is normally applied to numerical data. However, most data stored in databases combine categorical and numerical attributes. In addition, the clustering methods developed to date analyze only complete data. In data-intensive classification systems we sometimes encounter data sets that contain one or more missing feature values (incomplete data), and traditional clustering methods cannot be applied to such data. We therefore study this problem and discuss clustering methods that can handle incomplete data with mixed numerical and categorical attributes. In this paper, we propose several algorithms that combine an imputation method for missing categorical data with distances between numerical data that contain missing values. Finally, we show through an experiment on real data that the proposed method is more effective than clustering without imputation when the missing ratio becomes higher.
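
The abstract names two ingredients, imputation of missing categorical values and a distance for numerical data with missing values, but does not state which rules the proposed algorithms use. As a minimal sketch only, the Python below assumes mode imputation for the categorical part and a partial distance (squared differences summed over observed features, then rescaled) for the numerical part. The function names `impute_mode`, `partial_distance_sq`, and `mismatch_count` are hypothetical, `None` is assumed to mark a missing value, and none of this should be read as the authors' actual method.

```python
from collections import Counter


def impute_mode(column):
    """Fill None entries of a categorical column with the most frequent observed value.

    Assumption: mode imputation stands in for the paper's unspecified
    categorical imputation method.
    """
    observed = [c for c in column if c is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if c is None else c for c in column]


def partial_distance_sq(x, v):
    """Squared partial distance between a record x (None = missing) and a center v.

    Assumption: only observed features contribute, and the sum is rescaled
    by (number of features / number of observed features).
    """
    observed = [(xi, vi) for xi, vi in zip(x, v) if xi is not None]
    if not observed:
        raise ValueError("record has no observed numerical features")
    total = sum((xi - vi) ** 2 for xi, vi in observed)
    return total * len(x) / len(observed)


def mismatch_count(a, b):
    """Simple matching dissimilarity: number of categorical attributes that differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


if __name__ == "__main__":
    # One numerical record with a missing second feature, compared to a center.
    print(partial_distance_sq([1.0, None, 3.0], [0.0, 0.0, 0.0]))
    # A categorical column with one missing entry.
    print(impute_mode(["a", None, "a", "b"]))
```

In a mixed-data FCM variant, a combined dissimilarity could add the numerical partial distance to a weighted categorical mismatch count; how the two parts are weighted is a design choice that the abstract does not describe.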
