JSAI Type 2 Study Group Resources (SIG Technical Reports)
Online ISSN : 2436-5556
Mixed Factors Analysis: Unsupervized Statistical Discrimination with Kernel Feature Extraction
Ryo Yoshida, Tomoyuki Higuchi, Seiya Imoto, Satoru Miyano
Research Report / Technical Report (free access)

2007, Volume 2007, Issue DMSM-A702, p. 05-

Abstract

We address the problem of clustering and feature extraction for exceedingly high-dimensional data, referred to as n ≪ p data, where the dimensionality p of the feature space is much higher than the number of training samples n. For such a sparsely distributed dataset, direct application of conventional model-based clustering may be impractical because of overfitting. To overcome this limitation, we developed the mixed factors model in Yoshida et al. (2004), which was originally aimed at solving the overfitting problem in the unsupervised discriminant analysis of gene expression profiles. The idea is to extract the feature variables involved in the underlying group structure and then to train an unsupervised discriminative classifier using the extracted features, which are projected onto a lower-dimensional factor space. By alternating projection and clustering, the method seeks an optimal direction of projection such that the overlap of the projected clusters is small. One main purpose of this paper is to elucidate the statistical machinery of the feature extraction system offered by the mixed factors model. In particular, we establish its connection to Fisher's discriminant analysis and principal component analysis. After presenting these theoretical consequences, we also develop a more general approach to clustering within the framework of kernel machine learning. This extension allows us to handle clusters with much more complicated shapes and to perform clustering on generic feature spaces.
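The alternating projection-and-clustering loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' mixed factors model; it is a simplified stand-in under assumed choices: k-means for the clustering step and a Fisher-like update (leading eigenvectors of the between-cluster scatter of the full-dimensional cluster means) for the projection step. All variable names and the toy n ≪ p dataset are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy n << p data: n = 40 samples, p = 200 features, two groups.
# The group structure lives in only the first 5 features.
n, p, q, k = 40, 200, 2, 2
X = rng.normal(size=(n, p))
X[:20, :5] += 3.0

def kmeans(Z, k, iters=50):
    """Plain k-means in the projected q-dimensional factor space."""
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = []
        for j in range(k):
            pts = Z[labels == j]
            # Keep the old center if a cluster momentarily empties out.
            new.append(pts.mean(0) if len(pts) else centers[j])
        centers = np.stack(new)
    return labels

# Alternate (i) clustering in the factor space Z = X W and
# (ii) updating W toward directions that separate the clusters.
W, _ = np.linalg.qr(rng.normal(size=(p, q)))   # random orthonormal start
for _ in range(10):
    Z = X @ W                                  # project onto factor space
    labels = kmeans(Z, k)
    # Between-cluster scatter of the full-dimensional cluster means;
    # its top-q eigenvectors give a Fisher-like projection update.
    means = np.stack([X[labels == j].mean(0) for j in range(k)])
    D = means - X.mean(0)
    B = D.T @ D
    _, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    W = vecs[:, -q:]                           # leading separating directions

print(labels)
```

The loop converges toward a projection in which the clusters found by k-means overlap as little as possible, which is the behavior the abstract attributes to the mixed factors model; the actual model additionally fits a probabilistic factor structure rather than running k-means.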

© 2007 The Author(s)