Journal of Japan Industrial Management Association
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Text Categorization Using the Maximum Ratio of Term Frequency (<Special English Issue> Optimization and Its Applications)
Makoto Suzuki
JOURNAL FREE ACCESS

2008 Volume 58 Issue 6 Pages 438-444

Abstract
In this paper, I treat automatic text categorization as a sequence of information-processing steps and propose a new classification technique called the Maximum Frequency Ratio Accumulation Method (MFRAM). It is a simple technique that adds up the maximum ratios of term frequency among categories. Because MFRAM places no limit on the feature terms that can be used, I exploit this property and propose the use of character N-grams and word N-grams as feature terms. I then evaluate the proposed method in several experiments, classifying newspaper articles from the Japanese CD-Mainichi 2002 and the English Reuters-21578 collections using the Naive Bayes method (the baseline) and the proposed method. The results show that the classification accuracy of the proposed method is better than that of the baseline: the recall of the proposed method is 88.5% for CD-Mainichi 2002 and 83.1% for Reuters-21578. Although the proposed method is simple, it provides a new perspective, is language-independent, and has excellent potential for further development.
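The abstract does not give the formulas behind MFRAM, so the following Python sketch is only one possible reading of "adding up the maximum ratios of term frequency among categories", not the author's implementation. It assumes that the ratio of a term in a category is that term's count in the category divided by its count over all categories, that only the category with the largest ratio is kept for each term, and that a document is assigned to the category with the largest accumulated score. The function names `char_ngrams`, `train`, and `classify` and the toy data in the usage note are hypothetical. Character N-grams serve as feature terms, which requires no language-specific tokenizer.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Map each term to (best category, maximum frequency ratio).

    Assumption (not stated in the abstract): the ratio of term t in
    category c is count(t, c) / sum of count(t, c') over all categories.
    """
    freq = defaultdict(Counter)  # term -> {category: count}
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            freq[term][category] += 1

    model = {}
    for term, by_category in freq.items():
        total = sum(by_category.values())
        # Keep only the category in which the term is most frequent.
        best_category, best_count = max(by_category.items(), key=lambda kv: kv[1])
        model[term] = (best_category, best_count / total)
    return model


def classify(model, text, n=2):
    """Accumulate the stored maximum ratios and return the top category."""
    scores = Counter()
    for term in char_ngrams(text, n):
        if term in model:
            category, ratio = model[term]
            scores[category] += ratio
    # Return None when no term of the document was seen in training.
    return scores.most_common(1)[0][0] if scores else None
```

A toy usage example with made-up training data: `model = train([("baseball pitcher home run", "sports"), ("stock market share price", "economy")])`, followed by `classify(model, "share price")`. Replacing `char_ngrams` with a word-level N-gram extractor would leave the rest of the sketch unchanged, which is the flexibility in feature terms that the abstract describes.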
© 2008 Japan Industrial Management Association