Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Domain Adaptation for Word Sense Disambiguation using k-Nearest Neighbor Algorithm and Topic Model
Hiroyuki ShinnouMinoru Sasaki
Author information
JOURNAL FREE ACCESS

2013 Volume 20 Issue 5 Pages 707-726

Details
Abstract

In this paper, we propose the method of domain adaptation for word sense disambiguation (WSD). This method faces the following problems for WSD. (1) The difference between sense distributions on domains. (2) The sparseness of data caused by changing the domain. In this paper, we discuss and recommend the countermeasure for each problem. We use the k-nearest neighbor algorithm (k-NN) and the topic model for the first and second problems, respectively. In particular, we append topic features developed by the topic model for target domain corpus to to training data in source domain and test data in target domain. Using the extended features of support vector machine (SVM) classifier, we solve WSD. However, when the reliability of decision of the SVM classifier for a test instance is low, we use the decision of the k-NN. In the experiment, we select 17 ambiguous words in both domains, PB (books) and OC (Yahoo! Chie Bukuro) in the balanced corpus of contemporary written Japanese (BCCWJ corpus), which appear 50 times or more in these domains, and conduct the experiment of domain adaptation for WSD using these words to show the effectiveness of our method. In the future, we will apply the proposed method to other domains and examine a way to use the topic model considering the universality of a corpus, and an effective ensemble learning for domain adaptation for WSD.

Content from these authors
© 2013 The Association for Natural Language Processing
Previous article Next article
feedback
Top