Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 13, Issue 4
Displaying 1-4 of 4 articles from this issue
  • [in Japanese]
    2006 Volume 13 Issue 4 Pages 1-2
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (198K)
  • Hideki Hirakawa
    2006 Volume 13 Issue 4 Pages 3-31
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Preference Dependency Grammar (PDG) is a framework for the morphological, syntactic, and semantic analysis of natural language sentences. PDG provides packed shared data structures that encompass the various ambiguities at each level of sentence analysis with preference scores, together with a method for calculating the most plausible interpretation of a sentence. This paper proposes the Graph Branch Algorithm for computing the optimum dependency tree (the most plausible interpretation of a sentence) from a scored dependency forest, a packed shared data structure encompassing all possible dependency trees (interpretations) of a sentence. The graph branch algorithm adopts the branch-and-bound principle to manage arbitrary arc co-occurrence constraints, including the single valence occupation constraint, a basic semantic constraint in PDG. This paper also reports an experiment on English texts showing the computational complexity and behavior of the graph branch algorithm.
    Download PDF (4500K)
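The branch-and-bound idea behind the abstract above can be illustrated with a small sketch. This is not PDG's actual graph branch algorithm over a packed dependency forest; it is a toy in which each dependent word chooses one scored head arc, certain arc pairs are forbidden (standing in for constraints such as single valence occupation), and an optimistic score bound prunes branches that cannot beat the best tree found so far. All names and the data layout are assumptions for illustration, and well-formedness checks such as acyclicity are omitted.

```python
def best_tree(arcs, forbidden):
    """Toy branch and bound: pick one head arc per dependent word,
    maximizing total score under arc co-occurrence constraints.

    arcs: {dependent: [(head, score), ...]}
    forbidden: set of frozensets of two (dependent, head) arcs
               that may not co-occur in one tree
    """
    deps = list(arcs)
    best = {"score": float("-inf"), "tree": None}

    def bound(i, score):
        # Optimistic bound: current score plus the best possible
        # score of every word not yet assigned a head.
        return score + sum(max(s for _, s in arcs[d]) for d in deps[i:])

    def search(i, chosen, score):
        if bound(i, score) <= best["score"]:
            return  # prune: this branch cannot beat the incumbent
        if i == len(deps):
            best["score"], best["tree"] = score, dict(chosen)
            return
        d = deps[i]
        for head, s in sorted(arcs[d], key=lambda x: -x[1]):
            arc = (d, head)
            if any(frozenset({arc, c}) in forbidden for c in chosen.items()):
                continue  # violates a co-occurrence constraint
            chosen[d] = head
            search(i + 1, chosen, score + s)
            del chosen[d]

    search(0, {}, 0.0)
    return best["tree"], best["score"]
```

In this toy, forbidding the pair {("a", "b"), ("b", "root")} forces the search away from the locally best arcs, just as a semantic constraint would in the full framework.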
  • Shinsuke Mori
    2006 Volume 13 Issue 4 Pages 33-47
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In this paper, we discuss stochastic language model adaptation methods given a word list and a raw corpus. In this situation, the general method is to segment the raw corpus with a word segmenter equipped with the word list, correct the output sentences annotated with word boundary information by hand, and build a model from the segmented corpus. In this sentence-by-sentence error correction method, however, the annotator encounters difficult points, which decreases productivity. In addition, it is not clear that sentence-by-sentence correction from the beginning of the corpus is the best way to allocate a limited workforce. In this paper, we propose taking the word as the correction unit and concentrating correction on the positions where words in the list appear. This method allows us to avoid the above difficulty and directly capture the statistical behavior of specific words in the application field. In the experiments, we compared language models built by several methods from the corpora in terms of predictive power and kana-kanji conversion accuracy. The results showed that by concentrating error correction around the words in the list, we can build a better language model with less effort.
    Download PDF (1564K)
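The word-unit correction strategy in the abstract above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: given raw text, its automatic segmentation, and a word list, it flags only those occurrences of list words whose boundaries disagree with the segmenter's output, so the annotator corrects those points instead of whole sentences. All function names and the data layout are invented for the sketch.

```python
def boundary_offsets(tokens):
    # Character offsets where the segmenter placed word boundaries.
    offs, pos = {0}, 0
    for t in tokens:
        pos += len(t)
        offs.add(pos)
    return offs

def correction_points(raw, tokens, word_list):
    """Flag occurrences of list words whose boundaries disagree with
    the automatic segmentation; only these go to the annotator."""
    assert "".join(tokens) == raw  # tokens must spell out the raw text
    offs = boundary_offsets(tokens)
    points = []
    for w in word_list:
        start = raw.find(w)
        while start != -1:
            end = start + len(w)
            # Consistent iff both edges are boundaries and no boundary
            # cuts through the interior of the word.
            ok = (start in offs and end in offs and
                  not any(o in offs for o in range(start + 1, end)))
            if not ok:
                points.append((start, w))
            start = raw.find(w, start + 1)
    return points
```

For example, if the segmenter split "abcd" as ["ab", "cd"] but "bc" is a list word, position 1 is flagged for manual correction while a consistently segmented word like "ab" is left alone.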
  • Kosuke Tokunaga, Jun'ichi Kazama, Kentaro Torisawa
    2006 Volume 13 Issue 4 Pages 49-67
    Published: October 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We propose a method for acquiring attribute words for a wide range of object classes from Japanese Web documents. The method is a simple unsupervised one that ranks candidate words according to a score using statistics of lexico-syntactic patterns, HTML tags, and word occurrences as clues. To evaluate the attribute words, we also establish an evaluation procedure based on the idea of question-answerability. Using this procedure, we conducted experiments on 22 word classes with four human evaluators. The results revealed that our method can obtain attribute words with high precision and that the clues used in the ranking actually contribute to performance.
    Download PDF (2072K)
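The clue-combining ranking described in the abstract above can be sketched in a few lines. The combination used here (a sum of log-dampened counts over clue types) is an assumption for illustration, not the paper's actual scoring function, and the clue names are invented; the point is only that per-candidate statistics from several sources are merged into one score and the candidates sorted by it.

```python
import math

def rank_attributes(stats):
    """Toy unsupervised ranking of attribute-word candidates.

    stats: {candidate: {clue_type: count, ...}} where clue types might
    be counts from lexico-syntactic patterns, HTML tag contexts, and
    raw word occurrences. Returns candidates, best first.
    """
    scored = []
    for cand, clues in stats.items():
        # log(1 + count) dampens large counts so no single clue dominates
        score = sum(math.log(1 + c) for c in clues.values())
        scored.append((score, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]
```

A candidate supported by all clue types (e.g. frequent in "X の Y" patterns and in table-like HTML contexts) outranks one that is merely frequent overall.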