Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 16, Issue 5
Preface
Memorial writing
Paper
  • Shinsuke Mori, Hiroki Oda
    2009 Volume 16 Issue 5 Pages 5_7-5_21
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Language model (LM) building requires a corpus whose sentences are segmented into words. For languages in which words are not delimited by whitespace, an automatic word segmenter built from a general-domain corpus is used. Automatically segmented sentences, however, contain many segmentation errors, especially around words and expressions belonging to the target domain. To cope with segmentation errors, the concept of stochastic segmentation has been proposed. In this framework, a corpus is annotated with word boundary probabilities, i.e., for each pair of adjacent characters, the probability that a word boundary exists between them. In this paper, we first propose a method for estimating word boundary probabilities based on a maximum entropy model. We then propose a method for simulating a stochastically segmented corpus with a segmented corpus, and show that the computational cost is drastically reduced without performance degradation.
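    The following is a minimal sketch of the expected word count computation that underlies a stochastically segmented corpus, assuming boundary probabilities have already been estimated (e.g., by a maximum entropy model). The function name, the maximum word length, and the treatment of sentence edges as certain boundaries are illustrative assumptions, not the authors' implementation.

```python
# Sketch: for every candidate word span, accumulate the probability that the
# span is realized as a word in the stochastically segmented corpus:
#   P(boundary before span) * prod P(no boundary inside) * P(boundary after span)

def expected_word_counts(chars, boundary_probs, max_len=4):
    """chars: list of characters of one sentence.
    boundary_probs: boundary_probs[i] is the probability of a word boundary
    between chars[i] and chars[i+1].  Sentence edges are treated as certain
    boundaries (an assumption for this sketch)."""
    n = len(chars)
    # boundary[i] = probability of a boundary immediately before chars[i]
    boundary = [1.0] + list(boundary_probs) + [1.0]
    counts = {}
    for s in range(n):
        for e in range(s + 1, min(s + max_len, n) + 1):
            prob = boundary[s] * boundary[e]
            for k in range(s + 1, e):
                prob *= 1.0 - boundary[k]
            word = "".join(chars[s:e])
            counts[word] = counts.get(word, 0.0) + prob
    return counts

# Toy example: a three-character "sentence" with two uncertain gaps.
print(expected_word_counts(list("abc"), [0.9, 0.2]))
```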
  • Kazunari Sugiyama, Manabu Okumura
    2009 Volume 16 Issue 5 Pages 5_23-5_49
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Personal names are often submitted to search engines as query keywords. However, in response to a personal name query, search engines return a long list of results that contains Web pages about several namesakes. To address this problem, most previous work on disambiguating personal names in Web search results employs agglomerative clustering approaches. In contrast, we adopt a semi-supervised clustering approach that integrates similar documents into a seed document. Our proposed semi-supervised clustering approach is novel in that it controls the fluctuation of the centroid of a cluster.
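    A minimal sketch of the general idea of clustering around a seed document while damping centroid drift. The similarity threshold, the damping weight alpha, and the function name merge_into_seed_cluster are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def merge_into_seed_cluster(seed_vec, doc_vecs, threshold=0.5, alpha=0.7):
    """seed_vec: term vector of the seed document.
    doc_vecs: candidate document vectors from the search results.
    alpha: weight that keeps the centroid anchored to the seed vector,
           i.e., controls the fluctuation of the cluster centroid."""
    seed = np.asarray(seed_vec, dtype=float)
    centroid = seed.copy()
    members = []
    for i, d in enumerate(doc_vecs):
        d = np.asarray(d, dtype=float)
        sim = d @ centroid / (np.linalg.norm(d) * np.linalg.norm(centroid) + 1e-12)
        if sim >= threshold:
            members.append(i)
            # Damped update: the centroid stays close to the seed vector so that
            # one merged document cannot pull the cluster off-topic.
            mean_vec = np.mean([np.asarray(doc_vecs[j], float) for j in members], axis=0)
            centroid = alpha * seed + (1 - alpha) * mean_vec
    return members, centroid

# Toy example with 3-dimensional vectors: only the first document is merged.
print(merge_into_seed_cluster([1.0, 0.2, 0.0],
                              [[0.9, 0.3, 0.0], [0.0, 0.1, 1.0]]))
```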
  • Jun Hatori, Yusuke Miyao, Jun’ichi Tsujii
    2009 Volume 16 Issue 5 Pages 5_51-5_77
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Traditionally, many researchers have addressed word sense disambiguation (WSD) as an independent classification problem for each word in a sentence. The problem with these approaches, however, is that they disregard the interdependencies of word senses. Additionally, since they construct an individual sense classifier for each word, they are applicable only to word senses for which training instances are provided. In this paper, we propose a supervised WSD model based on the syntactic dependencies of word senses. In particular, we assume that strong dependencies exist between the sense of a syntactic head and the senses of its dependents. We model these dependencies with tree-structured conditional random fields (T-CRFs) and obtain the assignment of senses that is optimal over the whole sentence. Furthermore, we combine these sense dependencies with various coarse-grained sense tag sets, which are expected to alleviate the data sparseness problem and to enable our model to work even for words that do not appear in the training data. In experiments, we demonstrate the benefit of considering the syntactic dependencies of senses, as well as the improvements obtained by using coarse-grained tag sets. The performance of our model is shown to be comparable to that of state-of-the-art WSD systems. We also present an in-depth analysis of the effectiveness of the sense dependency features through intuitive examples.
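    A minimal sketch of joint sense selection over a dependency tree by max-product dynamic programming, the kind of exact inference a tree-structured CRF admits. The toy scoring dictionaries, the sense inventories, and the function names are illustrative assumptions, not the paper's trained model.

```python
# Each word has per-sense scores, and each head-dependent edge has a score for
# every (head sense, dependent sense) pair; the best joint assignment is found
# bottom-up over the tree.

def best_sense_assignment(tree, node_score, edge_score, senses):
    """tree: dict mapping a head word to the list of its dependents.
    node_score[w][s]: score of assigning sense s to word w.
    edge_score[(h, d)][(sh, sd)]: score of head sense sh with dependent sense sd.
    senses[w]: candidate senses of word w."""

    def solve(word):
        table, back = {}, {}
        for s in senses[word]:
            score, choices = node_score[word][s], {}
            for child in tree.get(word, []):
                child_table, child_back = solve(child)
                best_cs = max(
                    senses[child],
                    key=lambda cs: child_table[cs] + edge_score[(word, child)][(s, cs)],
                )
                score += child_table[best_cs] + edge_score[(word, child)][(s, best_cs)]
                choices[child] = (best_cs, child_back[best_cs])
            table[s], back[s] = score, choices
        return table, back

    root = next(w for w in tree if all(w not in deps for deps in tree.values()))
    table, back = solve(root)
    best_root = max(senses[root], key=lambda s: table[s])

    assignment = {}
    def collect(word, sense, choices):
        assignment[word] = sense
        for child, (cs, cback) in choices.items():
            collect(child, cs, cback)
    collect(root, best_root, back[best_root])
    return assignment, table[best_root]

# Toy example: the edge score pulls "bank" toward its financial sense.
tree = {"deposit": ["bank"]}
senses = {"deposit": ["put_money"], "bank": ["finance", "river"]}
node_score = {"deposit": {"put_money": 0.0},
              "bank": {"finance": 0.5, "river": 0.6}}
edge_score = {("deposit", "bank"): {("put_money", "finance"): 1.0,
                                    ("put_money", "river"): 0.0}}
print(best_sense_assignment(tree, node_score, edge_score, senses))
```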
  • Shuya Abe, Kentaro Inui, Yuji Matsumoto
    2009 Volume 16 Issue 5 Pages 5_79-5_100
    Published: 2009
    Released on J-STAGE: July 28, 2011
    JOURNAL FREE ACCESS
    Aiming at acquiring semantic relations between events from a large corpus, this paper proposes several extensions to a state-of-the-art method originally designed for entity relation extraction. First, expressions of events are defined to specify the class of the acquisition task. Second, the templates of co-occurrence patterns are extended so that they can capture semantic relations between event mentions. Experiments on a Japanese Web corpus show that (a) there are indeed specific co-occurrence patterns useful for event relation acquisition, and (b) for the action-effect relation, at least five thousand relation instances are acquired from a 500M-sentence Web corpus with a precision of about 66%.
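    A minimal sketch of pattern-based event relation acquisition over raw sentences. The actual patterns in the paper are defined over Japanese text and syntactic structure, so the English surface regexes, the relation labels, and the function name here are illustrative assumptions only.

```python
import re
from collections import Counter

# Each toy pattern captures two event mentions and carries a relation label.
PATTERNS = [
    (re.compile(r"(.+?) in order to (.+)"), "action-means"),
    (re.compile(r"(.+?), which causes (.+)"), "action-effect"),
]

def acquire_event_relations(sentences):
    """Scan raw sentences and count (event1, relation, event2) triples."""
    triples = Counter()
    for sent in sentences:
        for pattern, relation in PATTERNS:
            for e1, e2 in pattern.findall(sent):
                triples[(e1, relation, e2)] += 1
    return triples

# Toy corpus of two sentences.
corpus = [
    "take medicine, which causes drowsiness",
    "take medicine in order to cure a cold",
]
print(acquire_event_relations(corpus).most_common())
```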