JSAI Type 2 Study Group Resources (SIG Technical Reports)
Online ISSN : 2436-5556
Mixed Factors Analysis: Unsupervized Statistical Discrimination with Kernel Feature Extraction
Ryo Yoshida, Tomoyuki Higuchi, Seiya Imoto, Satoru Miyano
Research Report / Technical Report (free access)

2007, Volume 2007, Issue DMSM-A702, p. 05-

Abstract

We address the problem of clustering and feature extraction for exceedingly high-dimensional data, referred to as n ≪ p data, where the dimensionality p of the feature space is much higher than the number of training samples n. For such a sparsely distributed dataset, direct application of conventional model-based clustering may be impractical because of overfitting. To overcome this limitation, we developed the mixed factors model in Yoshida et al. (2004), which was originally aimed at solving the overfitting problem in the unsupervised discriminant analysis of gene expression profiles. The idea is to extract the feature variables involved in the underlying group structure and then to train an unsupervised discriminative classifier using the extracted features, which are projected onto a lower-dimensional factor space. By alternating projection and clustering, the method seeks an optimal direction of projection such that the overlap of the projected clusters is small. One main purpose of this paper is to elucidate the statistical machinery of the feature extraction system offered by the mixed factors model. In particular, we establish its connection to Fisher's discriminant analysis and principal component analysis. After presenting these theoretical consequences, we also develop a more general approach to clustering within the framework of kernel machine learning. This extension allows us to handle clusters with much more complicated shapes and to perform clustering on generic feature spaces.
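The alternating projection-and-clustering loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' mixed factors model; it is a simplified stand-in under assumed choices: k-means for the clustering step and a Fisher-like update (leading eigenvectors of the between-cluster scatter of the full-dimensional cluster means) for the projection step. All variable names and the toy n ≪ p dataset are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy n << p data: n = 40 samples, p = 200 features, two groups.
# The group structure lives in only the first 5 features.
n, p, q, k = 40, 200, 2, 2
X = rng.normal(size=(n, p))
X[:20, :5] += 3.0

def kmeans(Z, k, iters=50):
    """Plain k-means in the projected q-dimensional factor space."""
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = []
        for j in range(k):
            pts = Z[labels == j]
            # Keep the old center if a cluster momentarily empties out.
            new.append(pts.mean(0) if len(pts) else centers[j])
        centers = np.stack(new)
    return labels

# Alternate (i) clustering in the factor space Z = X W and
# (ii) updating W toward directions that separate the clusters.
W, _ = np.linalg.qr(rng.normal(size=(p, q)))   # random orthonormal start
for _ in range(10):
    Z = X @ W                                  # project onto factor space
    labels = kmeans(Z, k)
    # Between-cluster scatter of the full-dimensional cluster means;
    # its top-q eigenvectors give a Fisher-like projection update.
    means = np.stack([X[labels == j].mean(0) for j in range(k)])
    D = means - X.mean(0)
    B = D.T @ D
    _, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    W = vecs[:, -q:]                           # leading separating directions

print(labels)
```

The loop converges toward a projection in which the clusters found by k-means overlap as little as possible, which is the behavior the abstract attributes to the mixed factors model; the actual model additionally fits a probabilistic factor structure rather than running k-means.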

© 2007 The Author(s)