Text Categorization Using the Maximum Ratio of Term Frequency (Special English Issue: Optimization and Its Applications)

Makoto Suzuki

doi:10.11221/jima.58.438

Abstract

In this paper, I treat automatic text categorization as a series of information-processing steps and propose a new classification technique called the Maximum Frequency Ratio Accumulation Method (MFRAM). It is a simple technique that adds up the maximum ratios of term frequency among categories. MFRAM places no restriction on the use of feature terms, so I exploit this property and propose Character N-grams and Word N-grams as feature terms. I then evaluate the proposed method in several experiments, classifying newspaper articles from Japanese CD-Mainichi 2002 and English Reuters-21578 with both the Naive Bayes method (the baseline) and the proposed method. The results show that the proposed method classifies more accurately than the baseline: its recall is 88.5% for Japanese CD-Mainichi 2002 and 83.1% for English Reuters-21578. Although the proposed method is simple, it offers a new perspective, is language-independent, and is expected to be developed further in the future.
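The abstract describes MFRAM only in words, so the following Python sketch is one possible reading of it, not the paper's actual formulation. It assumes that the "ratio" of a term is its count in one category divided by its count across all categories, that each term contributes only its maximum ratio, and that this value is accumulated for the category where the maximum occurs. The function names `char_ngrams`, `train`, and `classify`, the default N of 2, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def train(labeled_docs, n=2):
    """Build a term -> (category, max_ratio) table from (text, category) pairs.

    Assumption: ratio = count of the term in a category divided by the
    term's total count over all categories.
    """
    freq = defaultdict(Counter)  # term -> {category: count}
    for text, category in labeled_docs:
        for term in char_ngrams(text, n):
            freq[term][category] += 1

    model = {}
    for term, counts in freq.items():
        total = sum(counts.values())
        # Ties are broken by category name so the result is deterministic.
        category, count = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        model[term] = (category, count / total)
    return model


def classify(model, text, n=2):
    """Accumulate the stored maximum ratios and return the top category.

    Returns None when no n-gram of `text` was seen during training.
    """
    scores = Counter()
    for term in char_ngrams(text, n):
        if term in model:
            category, ratio = model[term]
            scores[category] += ratio
    if not scores:
        return None
    return max(scores.items(), key=lambda kv: (kv[1], kv[0]))[0]


if __name__ == "__main__":
    # Toy training data, invented for illustration only.
    docs = [
        ("stock market rises", "economy"),
        ("market shares fall", "economy"),
        ("team wins match", "sports"),
        ("player scores goal", "sports"),
    ]
    model = train(docs)
    print(classify(model, "stock shares"))
    print(classify(model, "goal match"))
```

Because the features are plain character N-grams, the sketch needs no tokenizer or morphological analysis, which is consistent with the language independence claimed in the abstract. Word N-grams could be substituted by replacing `char_ngrams` with a function that slides over a list of tokens.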

Predecessor journal: 日本経営工学会誌 (Journal of Japan Industrial Management Association)
