Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Application of unsupervised learning using EM algorithm to Japanese Translation Task
HIROYUKI SHINNOU

2003 Volume 10 Issue 3 Pages 61-73

Abstract
In this paper, we apply the unsupervised learning method based on the EM algorithm, which Nigam et al. proposed for text classification, to the disambiguation of noun meanings posed in the Japanese Translation Task of SENSEVAL2. The method treats the hidden labels of unlabeled data as missing values of the observed data, uses the Naive Bayes model as the generative model, and takes the conditional probabilities p(f|c) (where f is a feature and c is a label) as the model parameters. Running the EM algorithm under this setup improves the learned classifier. In this study, we use only simple features for classification, namely a few words surrounding the target word. In the experiments, the precision of the Naive Bayes classifier learned from labeled data alone was 58.2%. The precision of the decision list learned from the same data was 58.9%, which is the Ibaraki record in the Translation Task contest. Our unsupervised learning method improved the precision to 61.8% by using unlabeled data in addition to the labeled data. Furthermore, after a small part of the labeled data was revised, the precision of the Naive Bayes classifier and of our unsupervised learning method rose to 62.3% and 68.2%, respectively.
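The procedure described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it assumes a multinomial Naive Bayes model with add-one smoothing, class priors re-estimated at every iteration, and a fixed number of EM iterations, none of which the abstract specifies. The function names (`train_nb`, `posterior`, `em_naive_bayes`) and the toy English context words are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter

def train_nb(docs, weights, n_classes, vocab, alpha=1.0):
    """M-step: estimate log p(c) and log p(f|c) from (possibly fractional) label weights."""
    prior = [alpha] * n_classes
    counts = [Counter() for _ in range(n_classes)]
    for feats, w in zip(docs, weights):
        for c in range(n_classes):
            prior[c] += w[c]
            for f in feats:
                counts[c][f] += w[c]
    z = sum(prior)
    log_prior = [math.log(p / z) for p in prior]
    log_cond = []
    for c in range(n_classes):
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_cond.append({f: math.log((counts[c][f] + alpha) / total) for f in vocab})
    return log_prior, log_cond

def posterior(feats, log_prior, log_cond):
    """E-step for one example: p(c | features) under the Naive Bayes assumption."""
    scores = [lp + sum(lc[f] for f in feats if f in lc)
              for lp, lc in zip(log_prior, log_cond)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, unlabeled, n_classes, n_iter=10):
    """Start from labeled data only, then alternate E- and M-steps over all data."""
    vocab = {f for feats, _ in labeled for f in feats}
    vocab |= {f for feats in unlabeled for f in feats}
    l_docs = [feats for feats, _ in labeled]
    # Labeled examples keep hard (one-hot) label weights throughout.
    l_w = [[1.0 if c == y else 0.0 for c in range(n_classes)] for _, y in labeled]
    model = train_nb(l_docs, l_w, n_classes, vocab)
    for _ in range(n_iter):
        # E-step: hidden labels of unlabeled examples become soft weights.
        u_w = [posterior(feats, *model) for feats in unlabeled]
        # M-step: re-estimate the parameters from labeled plus soft-labeled data.
        model = train_nb(l_docs + unlabeled, l_w + u_w, n_classes, vocab)
    return model

# Hypothetical toy data: each example is the list of words around a target word.
labeled = [(["bank", "money"], 0), (["bank", "river"], 1)]
unlabeled = [["money", "loan"], ["loan", "interest"],
             ["river", "water"], ["water", "fish"]]
model = em_naive_bayes(labeled, unlabeled, n_classes=2)
print(posterior(["loan"], *model))  # sense 0 should be preferred
print(posterior(["fish"], *model))  # sense 1 should be preferred
```

In this toy setting the words "loan" and "fish" never occur in the labeled examples, so a classifier trained on labeled data alone cannot separate the two senses for them; the unlabeled contexts link them to "money" and "river" through co-occurrence, which is the effect the method relies on.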
© The Association for Natural Language Processing