2013 Volume 2013 Issue AM-03 Pages 05-
We aim to improve the accuracy of multi-class document categorization using graph-based semi-supervised learning (GBSSL). To this end, we propose two methods. The first constructs a similarity graph in which the similarity between nodes is expressed using both surface information and latent information. The second selects high-quality training data for GBSSL by means of the PageRank algorithm. Experiments on the Reuters-21578 corpus confirm that the proposed methods improve the accuracy of multi-class document categorization.
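The abstract does not specify the similarity measures, how they are combined, how PageRank scores become a training set, or which GBSSL algorithm is used. The following is a minimal sketch of the overall pipeline under stated assumptions, not the authors' implementation. It assumes cosine similarity over raw term frequencies as the "surface" signal, cosine similarity in a truncated-SVD (LSA) space as the "latent" signal, a hypothetical linear mixing weight `alpha`, a hypothetical per-class top-PageRank rule for picking seed documents, and the local-and-global-consistency label propagation of Zhou et al. as the GBSSL step. All function names are hypothetical.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    Xn = X / norms
    return Xn @ Xn.T


def build_graph(tf, k_latent=2, alpha=0.5):
    """Mix surface and latent similarity into one weight matrix.

    ASSUMPTION: surface = cosine over raw term frequencies; latent = cosine
    in a k-dimensional LSA space (truncated SVD); `alpha` is a hypothetical
    linear mixing weight.
    """
    surface = cosine_matrix(tf)
    U, s, _ = np.linalg.svd(tf, full_matrices=False)
    latent = cosine_matrix(U[:, :k_latent] * s[:k_latent])
    W = alpha * surface + (1.0 - alpha) * latent
    W = np.clip(W, 0.0, None)  # keep edge weights non-negative
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W


def pagerank(W, d=0.85, tol=1e-10, max_iter=1000):
    """Weighted PageRank by power iteration on the similarity graph."""
    n = W.shape[0]
    out = W.sum(axis=1, keepdims=True)
    # Row-stochastic transitions; a node with no edges jumps uniformly.
    P = np.where(out > 0, W / np.where(out > 0, out, 1.0), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1.0 - d) / n + d * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def select_seeds(scores, pool_labels, per_class=1):
    """ASSUMPTION: keep the top-`per_class` PageRank documents of each class."""
    seeds = {}
    for c in sorted(set(pool_labels.values())):
        cands = [i for i, y in pool_labels.items() if y == c]
        cands.sort(key=lambda i: scores[i], reverse=True)
        for i in cands[:per_class]:
            seeds[i] = c
    return seeds


def propagate(W, seeds, n_classes, alpha=0.99):
    """Label propagation in the style of Zhou et al.: F = (I - alpha*S)^-1 Y."""
    n = W.shape[0]
    Y = np.zeros((n, n_classes))
    for i, c in seeds.items():
        Y[i, c] = 1.0
    deg = W.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)


# Toy term-frequency matrix: docs 0-2 use terms 0-2, docs 3-5 use terms 3-5.
tf = np.array([
    [3, 1, 0, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [1, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 0],
    [0, 0, 0, 1, 3, 1],
    [0, 0, 0, 0, 1, 2],
], dtype=float)
# Hypothetical labelled candidate pool; only the selected seeds are used for
# training, the other labels serve to check the predictions.
pool = {i: (0 if i < 3 else 1) for i in range(6)}

W = build_graph(tf)
ranks = pagerank(W)
seeds = select_seeds(ranks, pool, per_class=1)
pred = propagate(W, seeds, n_classes=2)
```

In this toy corpus, documents 0 and 2 share only one term, so their surface similarity is modest, while the latent (LSA) similarity places them on the same axis; mixing the two strengthens within-class edges. Selecting seeds by PageRank favours documents that are central in the graph, on the intuition that they are more representative training examples than peripheral ones.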