Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Improving SVM Active Learning: An Empirical Study in Japanese Word Segmentation
MANABU SASSANO

2006 Volume 13 Issue 2 Pages 27-41

Abstract

We explore how well active learning with Support Vector Machines works for a nontrivial task in natural language processing, using Japanese word segmentation as a test case. In particular, we discuss how the size of the pool affects the learning curve. We find that, in the early stage of training, a larger pool requires more labeled examples to reach a given level of accuracy than a smaller pool does. In addition, we propose a novel technique that makes effective use of a large number of unlabeled examples by adding them gradually to the pool. The experimental results show that our technique requires fewer labeled examples than the technique used in previous research. To achieve 97.0% accuracy, the proposed technique needs 59.3% of the labeled examples required by the previous technique and only 17.4% of those required by random sampling.
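The pool-based procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a linear classifier trained by Pegasos-style subgradient descent on the hinge loss as a stand-in for the SVM, selection of the pool examples closest to the separating hyperplane, and a fixed schedule for enlarging the pool (the abstract does not state the actual schedule). The names `train_linear_svm`, `active_learn`, `oracle`, `pool_start`, and `pool_step` are hypothetical and chosen only for this illustration.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(examples, epochs=30, lam=0.01, seed=0):
    """Stand-in linear SVM: Pegasos-style subgradient descent on the hinge loss.

    `examples` is a list of (features, label) pairs with label in {-1, +1}.
    No bias term is learned; append a constant feature if one is needed.
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(examples)))
        rng.shuffle(order)
        for i in order:
            x, y = examples[i]
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            violated = y * dot(w, x) < 1.0     # margin check before the update
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def active_learn(unlabeled, oracle, initial=10, batch=5, rounds=10,
                 pool_start=50, pool_step=50, seed=0):
    """Pool-based active learning with a gradually growing pool (assumed schedule).

    `unlabeled` is a list of feature tuples; `oracle(i)` returns the label
    (-1 or +1) of example i, standing in for a human annotator.
    """
    rng = random.Random(seed)
    remaining = list(range(len(unlabeled)))
    rng.shuffle(remaining)

    # Seed the labeled set with randomly chosen examples.
    labeled = {i: oracle(i) for i in remaining[:initial]}
    remaining = remaining[initial:]

    # Start with a small pool; the rest is held in reserve.
    pool, reserve = remaining[:pool_start], remaining[pool_start:]

    for _ in range(rounds):
        w = train_linear_svm([(unlabeled[i], y) for i, y in labeled.items()])
        # Query the pool examples closest to the current hyperplane.
        pool.sort(key=lambda i: abs(dot(w, unlabeled[i])))
        chosen, pool = pool[:batch], pool[batch:]
        for i in chosen:
            labeled[i] = oracle(i)
        # Add more unlabeled examples to the pool (fixed step, an assumption).
        pool.extend(reserve[:pool_step])
        reserve = reserve[pool_step:]

    w = train_linear_svm([(unlabeled[i], y) for i, y in labeled.items()])
    return w, labeled
```

Setting `pool_step=0` keeps the pool at its initial size, which gives a fixed-pool baseline for comparing learning curves under different values of `pool_start`.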

© The Association for Natural Language Processing