Spam Filtering with Active Feature Identification

Masayuki Okabe; Seiji Yamada

doi:10.14864/softscis.2008.0.1218.0

Abstract

This paper proposes a spam filtering method that combines active learning with feature identification. Identifying effective features is an important step in spam filtering because spam mail contains many meaningless words that differ only slightly from one another. Such words increase computational cost and degrade filtering performance, so distinguishing effective features from ineffective ones is a promising approach. However, traditional feature selection methods score features using a certain amount of labeled training data, an assumption that does not hold in spam filtering: the filtering process starts with no or few labeled data and gradually accumulates labels through user feedback. We propose a method, based on the naive Bayes approach, that identifies effective features during this active learning process. Experimental results show that our method outperforms a traditional method without feature identification.
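
The abstract does not give the scoring rule or the query strategy, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a multinomial naive Bayes filter with Laplace smoothing, uncertainty sampling to choose which message the user labels next, and a hypothetical feature criterion (the absolute log-likelihood ratio between classes, with an arbitrary threshold `min_score`). All function names, the threshold, and the toy messages are illustrative assumptions.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

def train_nb(labeled, features):
    """Count words per class, using only words in the given feature set."""
    counts = {label: Counter() for label in LABELS}
    docs = Counter()
    for words, label in labeled:
        docs[label] += 1
        counts[label].update(w for w in words if w in features)
    return counts, docs

def spam_probability(words, model, features):
    """Posterior P(spam | words) under Laplace-smoothed naive Bayes."""
    counts, docs = model
    vocab = len(features) or 1
    total_docs = sum(docs.values())
    score = {}
    for label in LABELS:
        s = math.log((docs[label] + 1) / (total_docs + 2))  # smoothed prior
        total = sum(counts[label].values())
        for w in words:
            if w in features:  # words outside the feature set are ignored
                s += math.log((counts[label][w] + 1) / (total + vocab))
        score[label] = s
    diff = max(min(score["ham"] - score["spam"], 700.0), -700.0)  # avoid overflow
    return 1.0 / (1.0 + math.exp(diff))

def identify_features(labeled, min_score=1.0):
    """Hypothetical criterion: keep words whose |log-likelihood ratio| >= min_score."""
    vocab = {w for words, _ in labeled for w in words}
    counts, _ = train_nb(labeled, vocab)
    totals = {label: sum(counts[label].values()) for label in LABELS}
    size = len(vocab) or 1
    kept = set()
    for w in vocab:
        p_spam = (counts["spam"][w] + 1) / (totals["spam"] + size)
        p_ham = (counts["ham"][w] + 1) / (totals["ham"] + size)
        if abs(math.log(p_spam / p_ham)) >= min_score:
            kept.add(w)
    return kept

def active_filter(seed, pool, oracle, rounds):
    """Query the most uncertain message each round, then re-identify features."""
    labeled, unlabeled = list(seed), list(pool)
    features = identify_features(labeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train_nb(labeled, features)
        query = min(unlabeled,
                    key=lambda m: abs(spam_probability(m, model, features) - 0.5))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # user feedback supplies the label
        features = identify_features(labeled)
    return train_nb(labeled, features), features

# Toy data: tokens such as "xq1" stand in for near-unique meaningless words.
seed = [(["cheap", "pills", "xq1"], "spam"),
        (["meeting", "notes", "monday"], "ham")]
pool = [["cheap", "pills", "zz9"], ["meeting", "agenda", "monday"],
        ["cheap", "offer", "kk3"], ["project", "notes", "agenda"]]
oracle = lambda words: "spam" if "cheap" in words else "ham"  # simulated user

model, features = active_filter(seed, pool, oracle, rounds=len(pool))
```

In this toy run the one-off tokens never reach the threshold and are excluded, while words that recur within one class are retained. A real system would need a criterion that behaves sensibly with very few labels, which is the difficulty the abstract points to.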
