SCIS & ISIS 2008
Session ID: FR-B4-4

A New Document Clustering Method Based on Comparative Advantage
*Jie Ji, Qiangfu Zhao, Ryouhei Shindo, Yousuke Kunishi
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract
Document clustering is the process of partitioning a set of unlabelled documents into categories, or clusters. For the clustering results to be useful in analyzing the documents, all documents in a cluster are expected to share some concept, which is often represented by the cluster centroid. K-means is a well-known unsupervised clustering algorithm that partitions the document set so as to minimize the mean squared error (MSE). Intuitively, however, the centroid may not represent a concept clearly, because it is merely the average of all documents in the cluster. To represent a cluster more clearly, each cluster should have a small set of representative key terms. Although many document clustering methods have been proposed in the literature, few of them deal with key terms explicitly. In this study, we propose a new method for classifying documents based on the concept of comparative advantage, together with a new clustering algorithm for extracting important key terms. Experimental results show that the proposed method generates better results, in the sense that the overlap between the sets of representative terms of different clusters is smaller.
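The K-means baseline and the notion of term-set overlap mentioned in the abstract can be illustrated with a minimal sketch. This is not the proposed comparative-advantage method, which the abstract does not specify. The toy vocabulary, the term-count vectors, the choice of the top-k centroid weights as "representative terms", and the use of Jaccard similarity as the overlap measure are all assumptions made purely for illustration.

```python
from itertools import combinations

def centroid(vectors):
    """Component-wise mean of equal-length term-weight vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(docs, init_centroids, iterations=10):
    """Plain Lloyd iterations from given initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(d, centroids[j]))
            for d in docs
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids

def top_terms(center, vocab, k):
    """The k highest-weighted terms of a centroid (illustrative choice)."""
    ranked = sorted(range(len(vocab)), key=lambda i: center[i], reverse=True)
    return {vocab[i] for i in ranked[:k]}

def mean_pairwise_overlap(term_sets):
    """Mean Jaccard similarity over all pairs of term sets (assumed measure)."""
    pairs = list(combinations(term_sets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

# Hypothetical toy data: raw term counts over a five-word vocabulary.
vocab = ["the", "game", "team", "stock", "market"]
docs = [
    [3, 2, 1, 0, 0],
    [3, 3, 1, 0, 0],
    [3, 0, 0, 2, 1],
    [3, 0, 0, 3, 1],
]

labels, centroids = kmeans(docs, init_centroids=[docs[0], docs[2]])
term_sets = [top_terms(c, vocab, k=2) for c in centroids]
overlap = mean_pairwise_overlap(term_sets)

print(labels)
print(term_sets)
print(overlap)
```

In this toy case the frequent word "the" ranks highest in both averaged centroids, so the two top-term sets overlap even though the clusters are well separated. That is the kind of weakness in centroid-based descriptions that the abstract points to.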
© 2008 Japan Society for Fuzzy Theory and Intelligent Informatics