2012 Volume 132 Issue 8 Pages 1347-1355
The Internet news are texts which involve from various fields, therefore, when a text data that will show a rapid increase of the number of dimensions of feature vectors of Self-Organizing Map (SOM) is added, these results cannot be reflected to learning. Furthermore, it is difficult for users to recognize the learning results because SOM can not produce any label information by each cluster. In order to solve these problems, we propose SOM with additional learning and dimensional by category mapping which is based on the category structure of Wikipedia. In this method, input vector is generated from each text and the corresponding Wikipedia categories extracted from Wikipedia articles. Input vectors are formed in the common category taking the hierarchical structure of Wikipedia category into consideration. By using the proposed method, the problem of reconfiguration of vector elements caused by dynamic changes in the text can be solved. Moreover, information loss in newly obtained index term can be prevented.
The transactions of the Institute of Electrical Engineers of Japan.C
The Journal of the Institute of Electrical Engineers of Japan