Abstract
In this paper, we modify the evaluation function and the mutation operation of genetic algorithm (GA)-based term selection to improve its classification performance. GA-based term selection optimizes two objectives: maximizing the number of correctly classified texts and minimizing the number of selected terms. Conventionally, the weighted sum of these two objectives is used as the evaluation function, so the selection is driven toward improving classification performance on the testing sets. This degrades performance on completely unseen texts, because terms are deleted excessively even when they play an important role in classification. To address this, we first use NSGA-II to find non-dominated solutions, which yields a set of Pareto-optimal solutions. Each individual is evaluated using a support vector machine (SVM) with k-fold cross-validation. We also modify the mutation operation so that the statistical information of each term is used as its mutation probability. Numerical simulation results show the effectiveness of our modifications.
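To make the two ideas above concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes each candidate term subset is summarized by a pair (cross-validated accuracy, number of selected terms), and that a per-term flip probability has already been derived from some term statistic. The names `dominates`, `non_dominated`, `mutate`, and `flip_prob` are hypothetical, and the full NSGA-II procedure (non-dominated sorting into fronts, crowding distance, selection) is omitted.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: a candidate is (accuracy, n_terms).
# Accuracy is maximized; the number of selected terms is minimized.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def non_dominated(cands: Sequence[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def mutate(mask: Sequence[int], flip_prob: Sequence[float],
           rng: random.Random) -> List[int]:
    """Flip bit i of a 0/1 term mask with its own probability flip_prob[i].

    Assumption: flip_prob comes from a per-term statistic; how that
    statistic is computed is not specified here.
    """
    return [(1 - bit) if rng.random() < p else bit
            for bit, p in zip(mask, flip_prob)]
```

Unlike a weighted sum, which collapses both objectives into one score and returns a single compromise, the non-dominated filter keeps every trade-off between accuracy and term count, so a subset with more terms but higher accuracy is not discarded in favor of an aggressively pruned one.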