Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 13, Issue 4
Displaying 1-4 of 4 articles from this issue
  • [in Japanese]
    2006 Volume 13 Issue 4 Pages 1-2
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (198K)
  • Hideki Hirakawa
    2006 Volume 13 Issue 4 Pages 3-31
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Preference Dependency Grammar (PDG) is a framework for the morphological, syntactic, and semantic analysis of natural language sentences. PDG provides packed shared data structures that encompass the various ambiguities at each level of sentence analysis with preference scores, together with a method for calculating the most plausible interpretation of a sentence. This paper proposes the Graph Branch Algorithm for computing the optimum dependency tree (the most plausible interpretation of a sentence) from a scored dependency forest, a packed shared data structure encompassing all possible dependency trees (interpretations) of a sentence. The graph branch algorithm adopts the branch-and-bound principle to manage arbitrary arc co-occurrence constraints, including the single valence occupation constraint, a basic semantic constraint in PDG. This paper also reports an experiment on English texts showing the computational complexity and behavior of the graph branch algorithm.
    Download PDF (4500K)
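The branch-and-bound idea behind the abstract above can be illustrated with a small sketch. This is not PDG's actual graph branch algorithm over a packed dependency forest; it is a toy in which each dependent word chooses one scored head arc, certain arc pairs are forbidden (standing in for constraints such as single valence occupation), and an optimistic score bound prunes branches that cannot beat the best tree found so far. All names and the data layout are assumptions for illustration, and well-formedness checks such as acyclicity are omitted.

```python
def best_tree(arcs, forbidden):
    """Toy branch and bound: pick one head arc per dependent word,
    maximizing total score under arc co-occurrence constraints.

    arcs: {dependent: [(head, score), ...]}
    forbidden: set of frozensets of two (dependent, head) arcs
               that may not co-occur in one tree
    """
    deps = list(arcs)
    best = {"score": float("-inf"), "tree": None}

    def bound(i, score):
        # Optimistic bound: current score plus the best possible
        # score of every word not yet assigned a head.
        return score + sum(max(s for _, s in arcs[d]) for d in deps[i:])

    def search(i, chosen, score):
        if bound(i, score) <= best["score"]:
            return  # prune: this branch cannot beat the incumbent
        if i == len(deps):
            best["score"], best["tree"] = score, dict(chosen)
            return
        d = deps[i]
        for head, s in sorted(arcs[d], key=lambda x: -x[1]):
            arc = (d, head)
            if any(frozenset({arc, c}) in forbidden for c in chosen.items()):
                continue  # violates a co-occurrence constraint
            chosen[d] = head
            search(i + 1, chosen, score + s)
            del chosen[d]

    search(0, {}, 0.0)
    return best["tree"], best["score"]
```

In this toy, forbidding the pair {("a", "b"), ("b", "root")} forces the search away from the locally best arcs, just as a semantic constraint would in the full framework.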
  • Shinsuke Mori
    2006 Volume 13 Issue 4 Pages 33-47
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this paper, we discuss stochastic language model adaptation methods given a word list and a raw corpus. In this situation, the general method is to segment the raw corpus with a word segmenter equipped with the word list, correct the output sentences annotated with word boundary information by hand, and build a model from the segmented corpus. In this sentence-by-sentence error correction method, however, the annotator encounters difficult points, which decreases productivity. In addition, it is not clear that sentence-by-sentence correction from the beginning of the corpus is the best way to allocate a limited workforce. In this paper, we propose taking the word as the correction unit and concentrating correction on the positions where words in the list appear. This method allows us to avoid the above difficulty and directly capture the statistical behavior of specific words in the application field. In the experiments, we compared language models built by several methods from the corpora in terms of predictive power and kana-kanji conversion accuracy. The results showed that by concentrating error correction around the words in the list, we can build a better language model with less effort.
    Download PDF (1564K)
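The word-unit correction strategy in the abstract above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: given raw text, its automatic segmentation, and a word list, it flags only those occurrences of list words whose boundaries disagree with the segmenter's output, so the annotator corrects those points instead of whole sentences. All function names and the data layout are invented for the sketch.

```python
def boundary_offsets(tokens):
    # Character offsets where the segmenter placed word boundaries.
    offs, pos = {0}, 0
    for t in tokens:
        pos += len(t)
        offs.add(pos)
    return offs

def correction_points(raw, tokens, word_list):
    """Flag occurrences of list words whose boundaries disagree with
    the automatic segmentation; only these go to the annotator."""
    assert "".join(tokens) == raw  # tokens must spell out the raw text
    offs = boundary_offsets(tokens)
    points = []
    for w in word_list:
        start = raw.find(w)
        while start != -1:
            end = start + len(w)
            # Consistent iff both edges are boundaries and no boundary
            # cuts through the interior of the word.
            ok = (start in offs and end in offs and
                  not any(o in offs for o in range(start + 1, end)))
            if not ok:
                points.append((start, w))
            start = raw.find(w, start + 1)
    return points
```

For example, if the segmenter split "abcd" as ["ab", "cd"] but "bc" is a list word, position 1 is flagged for manual correction while a consistently segmented word like "ab" is left alone.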
  • Kosuke Tokunaga, Jun'ichi Kazama, Kentaro Torisawa
    2006 Volume 13 Issue 4 Pages 49-67
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We propose a method for acquiring attribute words for a wide range of object classes from Japanese Web documents. The method is a simple unsupervised one that ranks candidate words according to a score using statistics of lexico-syntactic patterns, HTML tags, and word occurrences as clues. To evaluate the attribute words, we also establish an evaluation procedure based on the idea of question-answerability. Using this procedure, we conducted experiments on 22 word classes with four human evaluators. The results revealed that our method can obtain attribute words with high precision and that the clues used in the ranking actually contribute to performance.
    Download PDF (2072K)
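The clue-combining ranking described in the abstract above can be sketched in a few lines. The combination used here (a sum of log-dampened counts over clue types) is an assumption for illustration, not the paper's actual scoring function, and the clue names are invented; the point is only that per-candidate statistics from several sources are merged into one score and the candidates sorted by it.

```python
import math

def rank_attributes(stats):
    """Toy unsupervised ranking of attribute-word candidates.

    stats: {candidate: {clue_type: count, ...}} where clue types might
    be counts from lexico-syntactic patterns, HTML tag contexts, and
    raw word occurrences. Returns candidates, best first.
    """
    scored = []
    for cand, clues in stats.items():
        # log(1 + count) dampens large counts so no single clue dominates
        score = sum(math.log(1 + c) for c in clues.values())
        scored.append((score, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]
```

A candidate supported by all clue types (e.g. frequent in "X の Y" patterns and in table-like HTML contexts) outranks one that is merely frequent overall.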