Journal of Japan Industrial Management Association (日本経営工学会論文誌)
Online ISSN : 2187-9079
Print ISSN : 1342-2618
ISSN-L : 1342-2618
Text Categorization Using the Maximum Ratio of Term Frequency (Special English Issue: Optimization and Its Applications)
Makoto Suzuki

2008, Volume 58, Issue 6, pp. 438-444

Abstract

In this paper, I treat automatic text categorization as a sequence of information-processing steps and propose a new classification technique, the Maximum Frequency Ratio Accumulation Method (MFRAM). MFRAM is a simple technique that adds up the maximum ratios of term frequency among categories. Because MFRAM places no restriction on which feature terms can be used, I propose using character N-grams and word N-grams as feature terms. I evaluate the proposed method in several experiments that classify newspaper articles from Japanese CD-Mainichi 2002 and English Reuters-21578, using the Naive Bayes method as the baseline. The results show that the proposed method achieves higher classification accuracy than the baseline: its recall is 88.5% on Japanese CD-Mainichi 2002 and 83.1% on English Reuters-21578. Although the proposed method is simple, it offers a new perspective, is language-independent, and has considerable potential for further development.
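The abstract does not state the MFRAM formula, so the following Python sketch is only one plausible reading of "adds up the maximum ratios of term frequency among categories", not the author's implementation. It assumes that the ratio of a term in a category is that term's count in the category divided by its count over all categories, that each term votes only for the category where this ratio is largest, and that a document is assigned to the category with the largest accumulated score. The function names (`char_ngrams`, `train`, `classify`) and the toy training data are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Map each term to (category with the maximum ratio, that ratio).

    Assumption (not taken from the paper): the ratio of term t in
    category c is count(t, c) / sum of count(t, c') over all categories.
    """
    counts = defaultdict(Counter)  # term -> {category: frequency}
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            counts[term][category] += 1

    model = {}
    for term, per_category in counts.items():
        total = sum(per_category.values())
        # Ties are broken by category name so the result is deterministic.
        best_category, best_count = max(
            per_category.items(), key=lambda kv: (kv[1], kv[0])
        )
        model[term] = (best_category, best_count / total)
    return model


def classify(model, text, n=2):
    """Accumulate each known term's maximum ratio into its category."""
    scores = Counter()
    for term in char_ngrams(text, n):
        if term in model:
            category, ratio = model[term]
            scores[category] += ratio
    if not scores:
        return None  # no term of the document was seen in training
    return max(scores.items(), key=lambda kv: (kv[1], kv[0]))[0]


# Toy data, invented for illustration only.
training = [
    ("stock market rises", "economy"),
    ("market prices fall", "economy"),
    ("team wins match", "sports"),
    ("player scores goal", "sports"),
]
model = train(training, n=2)
print(classify(model, "market prices"))  # economy
```

Because the features are plain character N-grams, the sketch needs no tokenizer or morphological analyzer, which is consistent with the language independence claimed in the abstract. Word N-grams could be substituted by replacing `char_ngrams` with a function that slides over a list of words.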

© 2008 Japan Industrial Management Association