Semi-Supervised Japanese Word Sense Disambiguation Based on Two-Stage Classification of Unlabeled Data and Ensemble Learning

井上 裁都; 斎藤 博昭

doi:10.5715/jnlp.18.247

Abstract

In this paper, we propose a bootstrapping-like method for Japanese word sense disambiguation (WSD) that eases the empirical selection of optimal parameters. Here, bootstrapping refers to semi-supervised learning methods that follow this procedure: (1) train a classifier on labeled examples, (2) use the classifier to select confident unlabeled examples, (3) add them to the labeled examples, and (4) repeat steps 1–3. Traditional bootstrapping methods require empirical selection of several parameters, including the pool size, the number of most confident examples to add, and the number of iterations. Our method applies a two-stage classification of unlabeled examples, based on heuristics and a supervised method (a Maximum Entropy classifier), and combines a series of classifiers along a sequence of varying conditions. The method requires only one parameter and enables parameter-robust word sense disambiguation. Experiments against the baseline supervised method on the Japanese WSD task of SemEval-2 show that our method obtained an accuracy improvement of between 1.8 and 1.56 points.
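
The traditional bootstrapping loop described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' method: the toy nearest-centroid classifier stands in for the Maximum Entropy classifier, and the function names and the parameter names `pool_size`, `top_k`, and `n_iter` are assumptions chosen to mirror the three parameters that conventional bootstrapping has to tune.

```python
import math
import random

def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    scores = {y: math.exp(-math.dist(x, c)) for y, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total

def bootstrap(labeled, unlabeled, pool_size, top_k, n_iter, seed=0):
    """Generic bootstrapping loop (steps 1-4 of the abstract), hypothetical API."""
    rng = random.Random(seed)
    labeled = list(labeled)      # copy so the caller's data is not mutated
    unlabeled = list(unlabeled)
    for _ in range(n_iter):      # step 4: repeat steps 1-3
        if not unlabeled:
            break
        model = train_centroids(labeled)  # step 1: train on labeled examples
        pool = rng.sample(unlabeled, min(pool_size, len(unlabeled)))
        scored = [(predict_with_confidence(model, x), x) for x in pool]
        # step 2: rank pooled examples by classifier confidence
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _conf), x in scored[:top_k]:
            labeled.append((x, label))    # step 3: add confident examples
            unlabeled.remove(x)
    return train_centroids(labeled), labeled

seed_examples = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled_examples = [(1.0, 1.0), (9.0, 9.0), (0.5, 0.2), (9.5, 9.8)]
model, grown = bootstrap(seed_examples, unlabeled_examples,
                         pool_size=4, top_k=2, n_iter=2)
```

All three of `pool_size`, `top_k`, and `n_iter` affect which examples get self-labeled, which is why conventional bootstrapping needs empirical tuning; the proposed method is reported to need only one parameter.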

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
