Journal of Natural Language Processing

Preface

[title in Japanese]

[in Japanese]

2011 Volume 18 Issue 3 Pages 215-216
Published: 2011
Released on J-STAGE: October 04, 2011

DOI: https://doi.org/10.5715/jnlp.18.215

JOURNAL FREE ACCESS


Paper

Construction of Context Models for Word Sense Disambiguation

Bernard Brosseau-Villeneuve, Noriko Kando, Jian-Yun Nie

2011 Volume 18 Issue 3 Pages 217-245
Published: 2011
Released on J-STAGE: October 04, 2011

DOI: https://doi.org/10.5715/jnlp.18.217

JOURNAL FREE ACCESS


This paper presents a study on the use of word context features for Word Sense Disambiguation (WSD). State-of-the-art WSD systems achieve high accuracy by using resources such as dictionaries, taggers, lexical analyzers, or topic modeling packages. However, these resources are either too heavy or do not have sufficient coverage for large-scale tasks such as information retrieval. The use of local context for WSD is common, but the rationale behind the formulation of features is often based on trial and error. We therefore investigate the notion of relatedness of context words to the target word (the word to be disambiguated), and propose an unsupervised method for finding the optimal weights for context words based on their distance to the target word. The key idea behind the method is that the optimal weights should maximize the similarity of two context models constructed from different context samples of the same word. Our experimental results show that the strength of the relation between two words approximately follows a power law. The resulting context models are used in Naïve Bayes classifiers for word sense disambiguation. Our evaluation on SemEval WSD tasks in both English and Japanese shows that our method can achieve state-of-the-art effectiveness even though, unlike most existing methods, it does not use external tools. Its high efficiency makes it possible to use our method in large-scale applications such as information retrieval.
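
As a rough illustration of the idea in this abstract, the sketch below weights each context word by a power law of its distance to the target word and feeds the weighted counts into a small Naïve Bayes classifier. It is not the authors' implementation: the function names, the exponent `alpha`, and the `smoothing` constant are all assumptions made for this example, and the unsupervised procedure that the paper uses to find the optimal weights is not reproduced here.

```python
import math
from collections import defaultdict


def power_law_weight(distance, alpha=1.0):
    """Weight of a context word at `distance` (>= 1) from the target.

    The exponent `alpha` is a hypothetical value, not one from the paper.
    """
    return distance ** -alpha


def context_model(tokens, target_index, alpha=1.0):
    """Weighted bag of context words around tokens[target_index]."""
    model = defaultdict(float)
    for i, word in enumerate(tokens):
        if i != target_index:
            model[word] += power_law_weight(abs(i - target_index), alpha)
    return dict(model)


def train_naive_bayes(examples, alpha=1.0):
    """examples: list of (tokens, target_index, sense) triples."""
    sense_counts = defaultdict(float)
    word_counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for tokens, idx, sense in examples:
        sense_counts[sense] += 1.0
        for word, weight in context_model(tokens, idx, alpha).items():
            word_counts[sense][word] += weight
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(tokens, target_index, trained, alpha=1.0, smoothing=0.1):
    """Return the sense with the highest weighted log-likelihood."""
    sense_counts, word_counts, vocab = trained
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        # Additive smoothing keeps unseen words from zeroing the score.
        denom = sum(word_counts[sense].values()) + smoothing * len(vocab)
        score = math.log(count / total)
        for word, weight in context_model(tokens, target_index, alpha).items():
            p = (word_counts[sense].get(word, 0.0) + smoothing) / denom
            score += weight * math.log(p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


examples = [
    ("river bank water".split(), 1, "shore"),
    ("money bank loan".split(), 1, "finance"),
]
trained = train_naive_bayes(examples)
predicted = classify("the bank loan".split(), 1, trained)
```

With these two toy training sentences, `predicted` is `"finance"`, because "loan" was seen only with that sense. A larger `alpha` makes the model rely more heavily on the words closest to the target.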

Semi-Supervised Japanese Word Sense Disambiguation Based on Two-Stage Classification of Unlabeled Data and Ensemble Learning

Tatsukuni Inoue, Hiroaki Saito

2011 Volume 18 Issue 3 Pages 247-271
Published: 2011
Released on J-STAGE: October 04, 2011

DOI: https://doi.org/10.5715/jnlp.18.247

JOURNAL FREE ACCESS


In this paper, we propose a bootstrapping-like method that eases the empirical selection of optimal parameters for Japanese word sense disambiguation. In this paper, bootstrapping refers to semi-supervised learning methods that follow this procedure: (1) train a classifier on labeled examples, (2) use the classifier to select confident unlabeled examples, (3) add them to the labeled examples, (4) repeat steps 1–3. Traditional bootstrapping methods require empirical selection of parameters such as the pool size, the number of most confident examples, and the number of iterations. Our method uses two-stage classification of unlabeled examples, based on heuristics and a supervised method (a Maximum Entropy classifier), and combines a series of classifiers along a sequence of varying conditions. This method requires only one parameter and enables parameter-robust word sense disambiguation. Experiments comparing our method with the baseline supervised method on the Japanese WSD task of SemEval-2 show that it obtained an accuracy improvement between 1.8 and 1.56 points.
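
The four-step procedure in this abstract can be sketched as a generic self-training loop. This is a minimal illustration of traditional bootstrapping only, not the two-stage method the authors propose. The function names and the toy nearest-centroid classifier are assumptions for this example (the paper uses a Maximum Entropy classifier); `pool_size`, `top_k`, and `iterations` stand for the three empirically chosen parameters the abstract mentions.

```python
def bootstrap(labeled, unlabeled, fit, pool_size=10, top_k=2, iterations=3):
    """Generic self-training loop.

    labeled: list of (x, y) pairs; unlabeled: list of x.
    fit(labeled) must return a function x -> (label, confidence).
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(iterations):                      # step 4: repeat
        if not remaining:
            break
        predict = fit(labeled)                       # step 1: train
        pool, rest = remaining[:pool_size], remaining[pool_size:]
        scored = sorted(
            ((predict(x), x) for x in pool),
            key=lambda item: item[0][1],             # sort by confidence
            reverse=True,
        )
        confident, leftover = scored[:top_k], scored[top_k:]   # step 2: select
        labeled.extend((x, label) for (label, _), x in confident)  # step 3: add
        remaining = [x for _, x in leftover] + rest
    return fit(labeled), labeled


def fit_nearest_centroid(labeled):
    """Toy 1-D classifier used only to make the sketch runnable."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        label = min(centroids, key=lambda y: abs(x - centroids[y]))
        confidence = 1.0 / (1.0 + abs(x - centroids[label]))
        return label, confidence

    return predict


seed = [(0.0, "a"), (10.0, "b")]
classifier, final_labeled = bootstrap(
    seed, [1.0, 9.0, 4.0, 6.5], fit_nearest_centroid,
    pool_size=4, top_k=2, iterations=2,
)
```

Changing `pool_size`, `top_k`, or `iterations` changes which examples are absorbed and in what order, which is the parameter sensitivity the abstract's method aims to reduce.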

Effectiveness of Automatic Expansion of Training Data for Japanese Word Sense Disambiguation

Sanae Fujita, Kevin Duh, Akinori Fujino, Hirotoshi Taira, Hiroyuki Shi ...

2011 Volume 18 Issue 3 Pages 273-291
Published: 2011
Released on J-STAGE: October 04, 2011

DOI: https://doi.org/10.5715/jnlp.18.273

JOURNAL FREE ACCESS


In this paper, we propose a method to expand training data automatically, using example sentences from a dictionary, another sense bank, and an unlabelled corpus. We tested the method on the data of the SemEval-2010 Japanese WSD task and achieved 79.5% accuracy in our experiments. When we then limited the amount of training data used, we achieved 80.0% accuracy.

On SemEval-2010 Japanese WSD Task

Manabu Okumura, Kiyoaki Shirai, Kanako Komiya, Hikaru Yokono

2011 Volume 18 Issue 3 Pages 293-307
Published: 2011
Released on J-STAGE: October 04, 2011

DOI: https://doi.org/10.5715/jnlp.18.293

JOURNAL FREE ACCESS


An overview of the SemEval-2 Japanese WSD task is presented. The new characteristics of our task are that (1) it uses the first balanced Japanese sense-tagged corpus, and (2) it takes into account not only the instances that have a sense in the given set but also the instances that have a sense that cannot be found in the set. It is a lexical sample task, and word senses are defined according to a Japanese dictionary, the Iwanami Kokugo Jiten. This dictionary and a training corpus were distributed to participants. There were 50 target words: 22 nouns, 23 verbs, and 5 adjectives. Fifty instances of each target word were provided, for a total of 2,500 instances for the evaluation. Nine systems from four organizations participated in the task.
