Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Text Classification using the Difference of Term Frequency between Categories
Makoto SUZUKI
JOURNAL FREE ACCESS

2008 Volume 59 Issue 4 Pages 355-363

Abstract
In my previous paper, automatic text classification was treated as a series of information-processing steps, and a new classification technique, the "Accumulation Method," was proposed. Although simple, this method can use an unlimited number of feature terms. Exploiting this property, this paper proposes using character N-grams and word N-grams as feature terms. The technique is then evaluated experimentally: Japanese newspaper articles from "CD-Mainichi 2002" were classified with both the Naive Bayes method (the baseline) and the proposed method. The results show that the proposed method greatly improves classification accuracy over the baseline, reaching 88.7%. The proposed method thus performs very well, and because it takes a new viewpoint, further development can be expected.
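As an illustration of the two kinds of feature terms named in the abstract, the following is a minimal Python sketch of character N-gram and word N-gram extraction. It is not the Accumulation Method itself, whose scoring rule is not given here. The function names `char_ngrams`, `word_ngrams`, and `ngram_counts` and the parameter `max_n` are hypothetical, and the sketch assumes the text has already been split into word tokens (for Japanese this would normally require a morphological analyzer, which is not shown).

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all contiguous character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, tokens, max_n=2):
    """Count character and word n-grams for n = 1..max_n.

    Keys are (kind, ngram) pairs so that a character n-gram and a word
    n-gram with the same surface string are counted separately.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(("char", g) for g in char_ngrams(text, n))
        counts.update(("word", g) for g in word_ngrams(tokens, n))
    return counts
```

Because every N-gram up to `max_n` is kept, the number of feature terms grows quickly with document length, which is why a classifier that does not need to restrict its feature set suits this kind of representation.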
© 2008 Japan Industrial Management Association