Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 16, Issue 5
Preface
Memorial writing
Paper
  • Shinsuke Mori, Hiroki Oda
    2009 Volume 16 Issue 5 Pages 5_7-5_21
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Language model (LM) building requires a corpus whose sentences are segmented into words. For languages in which words are not delimited by whitespace, an automatic word segmenter built from a general-domain corpus is used. Automatically segmented sentences, however, contain many segmentation errors, especially around words and expressions belonging to the target domain. To cope with segmentation errors, the concept of stochastic segmentation has been proposed. In this framework, a corpus is annotated with word boundary probabilities, i.e., for each pair of adjacent characters, the probability that a word boundary exists between them. In this paper, we first propose a method for estimating word boundary probabilities based on a maximum entropy model. We then propose a method for simulating a stochastically segmented corpus with a segmented corpus, and show that the computational cost is drastically reduced without performance degradation.
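    The following is a minimal sketch of the expected word count computation that underlies a stochastically segmented corpus, assuming boundary probabilities have already been estimated (e.g., by a maximum entropy model). The function name, the maximum word length, and the treatment of sentence edges as certain boundaries are illustrative assumptions, not the authors' implementation.

```python
# Sketch: for every candidate word span, accumulate the probability that the
# span is realized as a word in the stochastically segmented corpus:
#   P(boundary before span) * prod P(no boundary inside) * P(boundary after span)

def expected_word_counts(chars, boundary_probs, max_len=4):
    """chars: list of characters of one sentence.
    boundary_probs: boundary_probs[i] is the probability of a word boundary
    between chars[i] and chars[i+1].  Sentence edges are treated as certain
    boundaries (an assumption for this sketch)."""
    n = len(chars)
    # boundary[i] = probability of a boundary immediately before chars[i]
    boundary = [1.0] + list(boundary_probs) + [1.0]
    counts = {}
    for s in range(n):
        for e in range(s + 1, min(s + max_len, n) + 1):
            prob = boundary[s] * boundary[e]
            for k in range(s + 1, e):
                prob *= 1.0 - boundary[k]
            word = "".join(chars[s:e])
            counts[word] = counts.get(word, 0.0) + prob
    return counts

# Toy example: a three-character "sentence" with two uncertain gaps.
print(expected_word_counts(list("abc"), [0.9, 0.2]))
```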
  • Kazunari Sugiyama, Manabu Okumura
    2009 Volume 16 Issue 5 Pages 5_23-5_49
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Personal names are often submitted to search engines as query keywords. However, in response to a personal name query, search engines return a long list of results that contains Web pages about several namesakes. To address this problem, most previous work on disambiguating personal names in Web search results employs agglomerative clustering approaches. In contrast, we adopt a semi-supervised clustering approach that integrates similar documents into a seed document. Our proposed semi-supervised clustering approach is novel in that it controls the fluctuation of the centroid of a cluster.
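    A minimal sketch of the general idea of clustering around a seed document while damping centroid drift. The similarity threshold, the damping weight alpha, and the function name merge_into_seed_cluster are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def merge_into_seed_cluster(seed_vec, doc_vecs, threshold=0.5, alpha=0.7):
    """seed_vec: term vector of the seed document.
    doc_vecs: candidate document vectors from the search results.
    alpha: weight that keeps the centroid anchored to the seed vector,
           i.e., controls the fluctuation of the cluster centroid."""
    seed = np.asarray(seed_vec, dtype=float)
    centroid = seed.copy()
    members = []
    for i, d in enumerate(doc_vecs):
        d = np.asarray(d, dtype=float)
        sim = d @ centroid / (np.linalg.norm(d) * np.linalg.norm(centroid) + 1e-12)
        if sim >= threshold:
            members.append(i)
            # Damped update: the centroid stays close to the seed vector so that
            # one merged document cannot pull the cluster off-topic.
            mean_vec = np.mean([np.asarray(doc_vecs[j], float) for j in members], axis=0)
            centroid = alpha * seed + (1 - alpha) * mean_vec
    return members, centroid

# Toy example with 3-dimensional vectors: only the first document is merged.
print(merge_into_seed_cluster([1.0, 0.2, 0.0],
                              [[0.9, 0.3, 0.0], [0.0, 0.1, 1.0]]))
```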
  • Jun Hatori, Yusuke Miyao, Jun’ichi Tsujii
    2009 Volume 16 Issue 5 Pages 5_51-5_77
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Traditionally, many researchers have addressed word sense disambiguation (WSD) as an independent classification problem for each word in a sentence. The problem with these approaches, however, is that they disregard the interdependencies of word senses. Additionally, since they construct an individual sense classifier for each word, they are applicable only to word senses for which training instances are provided. In this paper, we propose a supervised WSD model based on the syntactic dependencies of word senses. In particular, we assume that strong dependencies exist between the sense of a syntactic head and the senses of its dependents. We model these dependencies with tree-structured conditional random fields (T-CRFs) and obtain the assignment of senses that is optimal over the whole sentence. Furthermore, we combine these sense dependencies with various coarse-grained sense tag sets, which are expected to alleviate the data sparseness problem and to enable our model to work even for words that do not appear in the training data. In experiments, we demonstrate the benefit of considering the syntactic dependencies of senses, as well as the improvements obtained by using coarse-grained tag sets. The performance of our model is shown to be comparable to that of state-of-the-art WSD systems. We also present an in-depth analysis of the effectiveness of the sense dependency features through intuitive examples.
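    A minimal sketch of joint sense selection over a dependency tree by max-product dynamic programming, the kind of exact inference a tree-structured CRF admits. The toy scoring dictionaries, the sense inventories, and the function names are illustrative assumptions, not the paper's trained model.

```python
# Each word has per-sense scores, and each head-dependent edge has a score for
# every (head sense, dependent sense) pair; the best joint assignment is found
# bottom-up over the tree.

def best_sense_assignment(tree, node_score, edge_score, senses):
    """tree: dict mapping a head word to the list of its dependents.
    node_score[w][s]: score of assigning sense s to word w.
    edge_score[(h, d)][(sh, sd)]: score of head sense sh with dependent sense sd.
    senses[w]: candidate senses of word w."""

    def solve(word):
        table, back = {}, {}
        for s in senses[word]:
            score, choices = node_score[word][s], {}
            for child in tree.get(word, []):
                child_table, child_back = solve(child)
                best_cs = max(
                    senses[child],
                    key=lambda cs: child_table[cs] + edge_score[(word, child)][(s, cs)],
                )
                score += child_table[best_cs] + edge_score[(word, child)][(s, best_cs)]
                choices[child] = (best_cs, child_back[best_cs])
            table[s], back[s] = score, choices
        return table, back

    root = next(w for w in tree if all(w not in deps for deps in tree.values()))
    table, back = solve(root)
    best_root = max(senses[root], key=lambda s: table[s])

    assignment = {}
    def collect(word, sense, choices):
        assignment[word] = sense
        for child, (cs, cback) in choices.items():
            collect(child, cs, cback)
    collect(root, best_root, back[best_root])
    return assignment, table[best_root]

# Toy example: the edge score pulls "bank" toward its financial sense.
tree = {"deposit": ["bank"]}
senses = {"deposit": ["put_money"], "bank": ["finance", "river"]}
node_score = {"deposit": {"put_money": 0.0},
              "bank": {"finance": 0.5, "river": 0.6}}
edge_score = {("deposit", "bank"): {("put_money", "finance"): 1.0,
                                    ("put_money", "river"): 0.0}}
print(best_sense_assignment(tree, node_score, edge_score, senses))
```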
  • Shuya Abe, Kentaro Inui, Yuji Matsumoto
    2009 Volume 16 Issue 5 Pages 5_79-5_100
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Aiming at acquiring semantic relations between events from a large corpus, this paper proposes several extensions to a state-of-the-art method originally designed for entity relation extraction. First, expressions of events are defined to specify the class of the acquisition task. Second, the templates of co-occurrence patterns are extended so that they can capture semantic relations between event mentions. Experiments on a Japanese Web corpus show that (a) there are indeed specific co-occurrence patterns useful for event relation acquisition, and (b) for the action-effect relation, at least five thousand relation instances are acquired from a 500M-sentence Web corpus with a precision of about 66%.
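    A minimal sketch of pattern-based event relation acquisition over raw sentences. The actual patterns in the paper are defined over Japanese text and syntactic structure, so the English surface regexes, the relation labels, and the function name here are illustrative assumptions only.

```python
import re
from collections import Counter

# Each toy pattern captures two event mentions and carries a relation label.
PATTERNS = [
    (re.compile(r"(.+?) in order to (.+)"), "action-means"),
    (re.compile(r"(.+?), which causes (.+)"), "action-effect"),
]

def acquire_event_relations(sentences):
    """Scan raw sentences and count (event1, relation, event2) triples."""
    triples = Counter()
    for sent in sentences:
        for pattern, relation in PATTERNS:
            for e1, e2 in pattern.findall(sent):
                triples[(e1, relation, e2)] += 1
    return triples

# Toy corpus of two sentences.
corpus = [
    "take medicine, which causes drowsiness",
    "take medicine in order to cure a cold",
]
print(acquire_event_relations(corpus).most_common())
```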