2018 Volume 25 Issue 3 Pages 255-293
In this paper, we investigate a problem in Japanese word sense disambiguation (WSD) through a Hiragana-Kanji conversion task. To choose the words used as features, we propose a method that employs word embeddings and pointwise mutual information, and we evaluate it. The experimental results suggest that our method is more effective than other methods using word embeddings. We also conduct an experiment on the SemEval 2010 Japanese WSD Task, in which the proposed method achieves better accuracy. We further compare accuracy while varying the amount of training data and find that the difference in accuracy between the methods becomes small when a very large amount of training data is used. Because the number of sentences required to obtain high accuracy increases exponentially, we confirm that methods that improve accuracy with less training data are important in WSD. We also experiment with the domain of the data and confirm that using datasets that match the ambiguity in each domain is important for improving accuracy.
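As a reminder of what pointwise mutual information measures, the following is a minimal sketch of the standard definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), estimated from context-level co-occurrence counts. It is not the feature-selection procedure of the paper, which the abstract does not specify; the function name `pmi_scores`, the toy contexts, and the choice of one tokenized sentence per context are all assumptions made for illustration.

```python
import math
from collections import Counter

def pmi_scores(contexts, target):
    """Return PMI between `target` and each word that co-occurs with it.

    `contexts` is a list of tokenized contexts (hypothetical input format);
    probabilities are estimated as the fraction of contexts containing a word.
    """
    n = len(contexts)
    word_count = Counter()   # number of contexts containing each word
    pair_count = Counter()   # number of contexts containing both target and word
    for ctx in contexts:
        words = set(ctx)
        word_count.update(words)
        if target in words:
            pair_count.update(w for w in words if w != target)

    p_target = word_count[target] / n
    scores = {}
    for w, c_xy in pair_count.items():
        p_xy = c_xy / n
        p_w = word_count[w] / n
        scores[w] = math.log2(p_xy / (p_target * p_w))
    return scores

# Toy data (assumed, not from the paper): contexts of an ambiguous word.
contexts = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["bank", "money", "account"],
    ["river", "water", "fish"],
]
scores = pmi_scores(contexts, "bank")
```

Words with higher PMI co-occur with the target more often than chance would predict, which is why PMI is a common criterion for ranking candidate context features.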