Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Original Paper (Theory and Methodology)
Regularized Distance Metric Learning for Document Classification and Its Application
Kenta MIKAWA, Masayuki GOTO
Journal, free access

2015, Volume 66, Issue 2E, pp. 190-203

Abstract

With the development of information technologies, a huge amount of text data is now posted on the Internet. In this study, we focus on distance metric learning, a machine learning approach that estimates the metric matrix of the squared Mahalanobis distance from training data under an appropriate constraint. Mochihashi et al. proposed a method that derives the optimal metric matrix analytically. However, the vector space for document data is typically very high-dimensional and sparse. Therefore, when this method is applied directly to document data, over-fitting may occur because the number of estimated parameters is proportional to the square of the input dimension. To avoid over-fitting, this study introduces a regularization term. The purpose of this study is to formulate a regularized estimation of the metric matrix in which the optimal metric matrix can still be derived analytically. To verify the effectiveness of the proposed method, document classification experiments are conducted on Japanese newspaper articles.
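
As a rough illustration of the quantities mentioned in the abstract, the sketch below computes a squared Mahalanobis distance (x - y)^T M (x - y) for a given metric matrix M and counts the entries of a full metric matrix. It is not the estimation procedure of Mochihashi et al. or of this paper; the function names `mahalanobis_sq` and `num_metric_parameters` and the example vectors are hypothetical, chosen only for illustration.

```python
import numpy as np


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a metric matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(diff @ M @ diff)


def num_metric_parameters(dim):
    """Number of entries in a full dim x dim metric matrix."""
    return dim * dim


# Toy 3-dimensional vectors (illustrative values, not data from the paper).
x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 0.0])

# With M = I the distance reduces to the squared Euclidean distance.
d_identity = mahalanobis_sq(x, y, np.eye(3))

# A diagonal M re-weights each dimension separately.
M = np.diag([2.0, 1.0, 0.5])
d_weighted = mahalanobis_sq(x, y, M)

print(d_identity, d_weighted)
print(num_metric_parameters(10_000))  # entries for a 10,000-term vocabulary
```

The parameter count grows with the square of the dimension, which is why a full metric matrix estimated on a large, sparse vocabulary is prone to over-fitting unless the estimate is constrained or regularized.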

© 2015 Japan Industrial Management Association