2008, Vol. 58, No. 6, pp. 438-444
In this paper, I treat automatic text categorization as a sequence of information-processing steps and propose a new classification technique, the Maximum Frequency Ratio Accumulation Method (MFRAM). MFRAM is a simple technique that adds up the maximum ratios of term frequency among categories. Because MFRAM places no limit on the feature terms that can be used, I also propose using character N-grams and word N-grams as feature terms. I evaluate the proposed method in several experiments that classify newspaper articles from the Japanese CD-Mainichi 2002 and the English Reuters-21578 collections, with the Naive Bayes method as the baseline. The results show that the proposed method classifies more accurately than the baseline: its recall is 88.5% on CD-Mainichi 2002 and 83.1% on Reuters-21578. Although the proposed method is simple, it offers a new perspective, is language-independent, and has excellent potential, so further development is expected.
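The abstract does not give the MFRAM formula, so the following is only a minimal sketch of one plausible reading of "adding up the maximum ratios of term frequency among categories", not the paper's definition. It assumes that a term's frequency ratio for a category is its count in that category divided by its count over all categories, that only the largest ratio per term is kept, and that a document is assigned to the category with the largest accumulated score. The function names `char_ngrams`, `train`, and `classify` are hypothetical, and character bigrams stand in for the feature terms.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Map each term to (category, max_ratio).

    ASSUMPTION (not from the paper): the ratio of a term for a category
    is its count in that category divided by its count over all
    categories; only the maximum ratio per term is kept.
    """
    counts = defaultdict(Counter)  # term -> Counter of category counts
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            counts[term][category] += 1

    model = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        # On a tie, max() keeps the category seen first in training.
        best_category, best_count = max(per_category.items(),
                                        key=lambda item: item[1])
        model[term] = (best_category, best_count / total)
    return model


def classify(model, text, n=2):
    """Accumulate the kept ratios per category and return the top category."""
    scores = Counter()
    for term in char_ngrams(text, n):
        if term in model:
            category, ratio = model[term]
            scores[category] += ratio
    # None signals that no known term occurred in the text.
    return scores.most_common(1)[0][0] if scores else None


# Toy training data, invented purely for illustration.
TOY_DOCS = [
    ("goal match team", "sports"),
    ("team wins match", "sports"),
    ("stock market bank", "finance"),
    ("bank rates market", "finance"),
]
toy_model = train(TOY_DOCS)
```

Calling `classify(toy_model, "match team goal")` on this toy data yields `"sports"`. Because the features are raw character N-grams, the sketch needs no tokenizer, which is one way to read the abstract's claim of language independence; word N-grams could be substituted by replacing `char_ngrams` with a function that splits on words.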