Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Japanese word segmentation by Adaboost using the decision list as the weak learner
HIROYUKI SHINNOU
JOURNAL FREE ACCESS

2001 Volume 8 Issue 2 Pages 3-18

Abstract
In this paper, we propose a new method of Japanese word segmentation that applies AdaBoost with decision lists as the weak learner. Word segmentation is treated as a classification problem: deciding whether a word boundary exists between two adjacent characters. Solving this problem with the decision list method yields a Japanese word segmenter. Because our decision list does not use dictionary information as an attribute, the method does not suffer from the unknown word problem. Moreover, this formulation lets us apply AdaBoost, which has recently been an active research topic in machine learning, and AdaBoost improves the precision of our decision list. In experiments, we built the decision list from the Kyoto University Corpus (about 40K sentences). The precision of this decision list was 97.52%, much higher than the 92.76% precision of a character-based tri-gram model. With AdaBoost, the precision improved to 98.49%. Furthermore, our word segmentation system was excellent at detecting unknown words.
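The formulation in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the attribute set, the rule-ranking criterion, or the smoothing. The sketch therefore assumes character n-gram attributes around each gap, a smoothed log-ratio ranking, and the standard discrete AdaBoost weight update. All function names (`gap_features`, `train_decision_list`, `adaboost`, `segment`) and the `alpha` parameter are hypothetical.

```python
import math
from collections import defaultdict

def gap_features(text, i):
    """Character n-gram attributes for the gap between text[i-1] and text[i] (assumed set)."""
    p = "^" + text + "$"            # sentinels so every gap has two characters of context
    left, right = p[i], p[i + 1]
    return [("L1", left), ("R1", right), ("L1R1", left + right),
            ("L2", p[i - 1] + left), ("R2", right + p[i + 2])]

def boundary_examples(words):
    """One (features, label) pair per character gap; +1 = word boundary, -1 = none."""
    text, cuts, pos = "".join(words), set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return [(gap_features(text, i), 1 if i in cuts else -1)
            for i in range(1, len(text))]

def train_decision_list(examples, weights, alpha=1e-3):
    """Rank single-attribute rules by a smoothed, weighted log-ratio (an assumed criterion)."""
    pos, neg, bias = defaultdict(float), defaultdict(float), 0.0
    for (feats, y), w in zip(examples, weights):
        bias += w * y
        for f in feats:
            (pos if y == 1 else neg)[f] += w
    rules = []
    for f in set(pos) | set(neg):
        score = math.log((pos[f] + alpha) / (neg[f] + alpha))
        rules.append((abs(score), f, 1 if score >= 0 else -1))
    rules.sort(key=lambda r: (-r[0], r[1]))   # strongest evidence first, ties by attribute
    return rules, (1 if bias >= 0 else -1)

def classify(dl, feats):
    """Apply the first (highest-ranked) rule whose attribute is present, else the default."""
    rules, default = dl
    present = set(feats)
    for _, f, label in rules:
        if f in present:
            return label
    return default

def adaboost(examples, rounds=5):
    """Discrete AdaBoost with decision lists as weak learners."""
    n = len(examples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        dl = train_decision_list(examples, weights)
        preds = [classify(dl, feats) for feats, _ in examples]
        err = sum(w for w, p, (_, y) in zip(weights, preds, examples) if p != y)
        if err >= 0.5:               # weak learner no better than chance: stop
            break
        a = 0.5 * math.log((1 - max(err, 1e-10)) / max(err, 1e-10))
        ensemble.append((a, dl))
        if err == 0:                 # training data already fit: nothing left to reweight
            break
        weights = [w * math.exp(-a * y * p)
                   for w, p, (_, y) in zip(weights, preds, examples)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def segment(ensemble, text):
    """Cut the text at every gap where the weighted vote favors a boundary."""
    words, start = [], 0
    for i in range(1, len(text)):
        feats = gap_features(text, i)
        if sum(a * classify(dl, feats) for a, dl in ensemble) > 0:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

# Toy corpus of pre-segmented "sentences"; letters stand in for Japanese characters.
corpus = [["ab", "cd"], ["cd", "ab"]]
examples = [ex for sent in corpus for ex in boundary_examples(sent)]
model = adaboost(examples)
print(segment(model, "abcdab"))
```

Since every attribute comes from the characters around the gap and none from a dictionary lookup, the sketch shares the property the abstract highlights: an unseen word is handled like any other character sequence. In this toy corpus each attribute occurs with only one label, so the first boosting round already fits the training data; reweighting only matters on real data where rules conflict.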
© The Association for Natural Language Processing