SCIS & ISIS 2008
Session ID : SU-F1-1

A Multi-Class SVM Classification System Based on Learning Methods from Indistinguishable Documents
*JuiHsi Fu, SyYen Kuo, SingLing Lee
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract
In this paper, Support Vector Machines (SVMs) are used for multi-class classification of Chinese official documents. Several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction, are adopted to implement our system. We observe that most misclassified documents are difficult to label because their contents are indistinguishable. Such indistinguishable documents should therefore be identified by the system in advance. To enhance classification accuracy and distinguishability, we first propose a general approach for identifying documents that are likely to be misclassified. We then present four one-against-all (OAA) SVM classification methods based on different strategies for learning from those indistinguishable or misclassified documents. These methods are able to identify misclassified (indistinguishable) documents in advance and achieve accurate classification. Our experiments show that adding both indistinguishable and misclassified documents to the training set increases classification accuracy, and that this strategy is the most suitable for Chinese official document classification.
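
The abstract does not state how indistinguishable documents are detected, so the following is only a minimal sketch under assumptions, not the authors' method. It assumes each document already has one decision score per class from a set of one-against-all SVMs, and it treats a document as indistinguishable when the gap between its two highest scores falls below a threshold. The names `classify_oaa`, `select_for_retraining`, and `min_gap`, and the default threshold of 0.5, are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_oaa(scores: Dict[str, float], min_gap: float = 0.5) -> Tuple[str, bool]:
    """Return (predicted label, indistinguishable flag) for one document.

    `scores` maps each class label to the decision score of its
    one-against-all classifier. The gap criterion is an assumption made
    for illustration; the abstract does not specify the actual rule.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_label, best_score = ranked[0]
    # With a single class there is no runner-up, so the gap is infinite.
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    indistinguishable = (best_score - runner_up) < min_gap
    return best_label, indistinguishable


def select_for_retraining(
    docs: List[str],
    true_labels: List[str],
    all_scores: List[Dict[str, float]],
    min_gap: float = 0.5,
) -> List[Tuple[str, str]]:
    """Collect documents that are indistinguishable or misclassified.

    The returned (document, true label) pairs are candidates to add to
    the training set before the classifiers are retrained.
    """
    selected = []
    for doc, truth, scores in zip(docs, true_labels, all_scores):
        predicted, flagged = classify_oaa(scores, min_gap)
        if flagged or predicted != truth:
            selected.append((doc, truth))
    return selected
```

In this sketch, a document flagged by `classify_oaa` could be routed to manual labeling, while the output of `select_for_retraining` corresponds to the strategy of learning from both indistinguishable and misclassified documents.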
© 2008 Japan Society for Fuzzy Theory and Intelligent Informatics