Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Paper
Detection of Minority Sets Based on an Information-Theoretic Framework
安藤 晋, 佐久間 淳, 鈴木 英之進, 小林 重信

2007, Volume 22, Issue 3, pp. 311-321

Abstract

Unsupervised learning techniques, e.g. clustering, are useful for obtaining a summary of a dataset. However, their application to large databases can be computationally expensive. Alternatively, useful information can be retrieved from subsets of the data more efficiently yet still effectively. This paper addresses the problem of finding a small subset of minority instances whose distribution differs significantly from that of the majority. In general, such a subset can overlap substantially with the majority, which is problematic for conventional distribution estimation. This paper proposes a new approach for estimating a minority distribution based on the Information Theoretic Framework, an extension of rate-distortion theory to unsupervised learning tasks. Specifically, the proposed method (a) estimates parameters that maximize the divergence between the minority and majority distributions, (b) penalizes redundancy in the data representation based on the mutual information between the observed and hidden variables, and (c) employs a hard-assignment approximation to avoid computing trivial conditional probabilities. The algorithm has no problem-dependent parameters, and its time and space complexities are linear in the size of the minority subset. Experiments on artificial datasets show that the proposed method achieves high precision and sensitivity in detecting minority subsets that overlap substantially with the majority. The proposed method also substantially outperforms one-class classification and mixture estimation methods on real-world benchmark datasets for text and satellite imagery classification.

© 2007 JSAI (The Japanese Society for Artificial Intelligence)