Abstract
In this study, we propose negation naive Bayes (NNB), a new method for text classification. Like complement naive Bayes (CNB), NNB uses the complement class. Unlike CNB, however, NNB treats the class prior in a mathematically principled way, because it is derived from the same maximum a posteriori (MAP) equation from which naive Bayes (NB) is derived. We carried out classification experiments on product pages from an Internet auction site and on the 20 Newsgroups data set. For the latter, we examined the properties of NNB in two settings: (1) settings in which the number of words per document decreases, and (2) settings in which the distribution of documents over classes is skewed. We compared NNB with NB, CNB, and a support vector machine (SVM). Our experiments showed that NNB outperforms the other Bayesian approaches when the number of words per document decreases and when documents are distributed non-uniformly over classes. They also showed that NNB sometimes achieves the best accuracy overall, significantly outperforming the SVM.
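As a rough sketch (not part of the abstract itself, and assuming the standard complement-class derivation the text alludes to), the decision rules being contrasted can be written as follows, where $\bar{c}$ denotes the complement of class $c$ and $w_1,\dots,w_n$ are the words of a document $d$:
\[
\hat{c}_{\mathrm{NB}} = \operatorname*{arg\,max}_{c} \, P(c)\prod_{i=1}^{n} P(w_i \mid c),
\qquad
\hat{c}_{\mathrm{NNB}} = \operatorname*{arg\,min}_{c} \, P(\bar{c})\prod_{i=1}^{n} P(w_i \mid \bar{c}),
\]
where the NNB rule would follow from the MAP criterion via $\arg\max_{c} P(c \mid d) = \arg\max_{c} \bigl(1 - P(\bar{c} \mid d)\bigr) = \arg\min_{c} P(\bar{c} \mid d)$, with a naive Bayes factorization applied to $P(d \mid \bar{c})$. This illustrates how the complement-class prior $P(\bar{c})$ enters the rule directly, in contrast to CNB.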