Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Online ISSN : 1881-7203
Print ISSN : 1347-7986
ISSN-L : 1347-7986
Short Notes
An Idea of a Rough Set Theory Based Document Classification System
Masaki KUREMATSU

2020 Volume 32 Issue 4 Pages 778-781

Abstract

Document classification is a well-known task in natural language processing. In this paper, I propose a Rough Set Theory based document classification system. First, the proposed system builds a decision table by combining each document's label with terms selected by document frequency and reduction. Next, it extracts decision rules from the upper approximation and the lower approximation, respectively. Then it matches an unlabeled document against both sets of decision rules and assigns the label with the largest sum of matching rule weights. I use SI (Satisfaction Index), CI (Coverage Index) and Lift as the rule weights. To evaluate this approach, I implemented a prototype system and tried classifying labeled Japanese-language patent publications with experts. The system extracted some rules that an expert evaluated as useful, and its accuracy rate was higher than that of always selecting the modal label. However, only 25% of the rules were rated useful, and neither the accuracy rate nor the Kappa statistic is high enough for practical use. Nor do the results show that this approach is better than a Naive Bayes classifier. In future work, I will improve this approach based on an analysis of this evaluation.
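
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the author's prototype. It assumes that each row of the decision table is a set of present terms plus a label, and that two documents are indiscernible when their term sets are identical. The toy table, the rule list, the numeric weights, and the function names `approximations` and `classify` are all hypothetical; in particular, the weights are placeholders and are not computed as SI, CI or Lift.

```python
from collections import defaultdict

def approximations(table, label):
    """Return (lower, upper) approximations of `label` as sets of row indices.

    Assumption: rows with identical term sets are indiscernible.
    """
    classes = defaultdict(set)
    for i, (terms, _) in enumerate(table):
        classes[terms].add(i)
    target = {i for i, (_, y) in enumerate(table) if y == label}

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # every indiscernible row has the label
            lower |= members
        if members & target:    # at least one indiscernible row has the label
            upper |= members
    return lower, upper

def classify(doc_terms, rules):
    """Sum the weights of matching rules per label and return the best label.

    A rule (condition, label, weight) matches when all of its condition
    terms occur in the document. Returns None if no rule matches.
    """
    scores = defaultdict(float)
    for condition, label, weight in rules:
        if condition <= doc_terms:
            scores[label] += weight
    return max(scores, key=scores.get) if scores else None

# Hypothetical toy decision table: (terms present, label).
table = [
    (frozenset({"battery"}), "A"),
    (frozenset({"battery"}), "A"),
    (frozenset({"sensor"}), "A"),
    (frozenset({"sensor"}), "B"),
    (frozenset({"motor"}), "B"),
]

# Hypothetical rules with placeholder weights (not real SI/CI/Lift values).
rules = [
    (frozenset({"battery"}), "A", 0.9),  # certain rule (lower approximation)
    (frozenset({"sensor"}), "A", 0.4),   # possible rule (upper approximation)
    (frozenset({"sensor"}), "B", 0.5),   # possible rule (upper approximation)
    (frozenset({"motor"}), "B", 0.8),    # certain rule (lower approximation)
]

lower_a, upper_a = approximations(table, "A")
predicted = classify(frozenset({"sensor", "motor"}), rules)
```

In this toy table the two "sensor" documents carry different labels, so they fall in the upper approximation of label A but not in its lower approximation. That gap is what separates rules that always hold from rules that only possibly hold.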

© 2020 Japan Society for Fuzzy Theory and Intelligent Informatics