Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 7, Issue 5
Displaying 1-6 of 6 articles from this issue
  • [in Japanese]
    2000 Volume 7 Issue 5 Pages 1-2
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (227K)
  • KIYOTAKA UCHIMOTO, MASAKI MURATA, SATOSHI SEKINE, HITOSHI ISAHARA
    2000 Volume 7 Issue 5 Pages 3-17
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Dependency structure analysis is one of the basic techniques in Japanese sentence analysis, and the Japanese dependency structure is usually represented by relationships between phrasal units called ‘bunsetsu.’ The analysis is a two-step procedure: the first step prepares a dependency matrix in which each element represents how likely it is that one bunsetsu depends on another, and the second step finds an optimal set of dependencies for the entire sentence. In this paper we discuss a model used in the first step, a model for estimating dependency likelihood. There are two approaches to estimating dependency likelihood: rule-based and statistical. We take the statistical approach because electronically available corpora are growing larger while maintaining hand-crafted rules is costly. In our approach the value of each element in the dependency matrix is an estimated probability. A statistical model proposed earlier (here called the “old model”) considers only the relationship between two bunsetsus when estimating these probabilities. In this paper we propose a new model that estimates dependency likelihood by considering not only the relationship between two bunsetsus but also the relationship between the left bunsetsu and all of the bunsetsus to its right. Our implementation of this model is based on the ME (maximum entropy) model. When tested on the Kyoto University corpus, the dependency accuracy obtained with this model was 88%, about 1% higher than that obtained with the old model using exactly the same features. (A schematic sketch of the two-step procedure appears after this entry.)
    Download PDF (3112K)
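    A minimal Python sketch of the two-step procedure described above. The scorer dep_prob is a hypothetical stand-in for the paper's maximum-entropy estimator, and the decoder simply takes the most probable modifiee for each bunsetsu, so it illustrates the data flow rather than the authors' actual search.

      from typing import Callable, List

      def build_dependency_matrix(bunsetsus: List[str],
                                  dep_prob: Callable[[int, int, List[str]], float]) -> List[List[float]]:
          """Step 1: fill matrix[i][j] with P(bunsetsu i depends on bunsetsu j), j > i."""
          n = len(bunsetsus)
          matrix = [[0.0] * n for _ in range(n)]
          for i in range(n - 1):
              for j in range(i + 1, n):          # Japanese dependencies point rightward
                  matrix[i][j] = dep_prob(i, j, bunsetsus)
          return matrix

      def decode_heads(matrix: List[List[float]]) -> List[int]:
          """Step 2: choose a head for every bunsetsu except the last (greedy argmax here)."""
          n = len(matrix)
          heads = [max(range(i + 1, n), key=lambda j: matrix[i][j]) for i in range(n - 1)]
          heads.append(-1)                        # the final bunsetsu has no head
          return heads

      # Toy scorer that prefers the nearest bunsetsu to the right.
      toy_prob = lambda i, j, s: 1.0 / (j - i)
      print(decode_heads(build_dependency_matrix(["kanojo-wa", "hon-o", "yonda"], toy_prob)))  # [1, 2, -1]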
  • Refinement and Extension for Implementation
    AKIRA OHTANI, TAKASHI MIYATA, YUJI MATSUMOTO
    2000 Volume 7 Issue 5 Pages 19-49
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    A parser based on a declarative grammar that deals with various aspects of language is indispensable for natural language processing. To construct a practical grammar system, we develop a unification-based Japanese phrase structure grammar, NAIST JPSG, which implements ideas from recent developments in Head-driven Phrase Structure Grammar. The principles, schemata, and features are designed by considering various aspects of Japanese and describing regularities among them as a set of local constraints. We then devote our discussion to the analysis of language-specific phenomena, namely the distribution of case particles, the thematic locality of sa-hen dô-si (verbal noun plus light verb) constructions, and the modification of case-marked adnominal phrases in rentai syû-syoku (adnominal modification) clauses, with a main focus on their specific lexical information. (i) Whether case particles can appear or not is accounted for by the type-hierarchical case feature, which is part of the feature system for describing linguistic objects. (ii) Sa-hen dô-si constructions involve simple thematic relations despite their morphologically complex status; lexical description and a general mechanism such as unification can reconcile this mismatch. (iii) Through consultation of a corpus, some classes of ambiguity in the modifier-modifiee relation in rentai syû-syoku clauses can be reduced by introducing a predicative morpheme. (An illustrative feature-unification sketch appears after this entry.)
    Download PDF (2908K)
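    An illustrative Python sketch of feature-structure unification, the general mechanism a unification-based grammar such as NAIST JPSG relies on. Feature structures are represented as plain nested dicts, and the feature names in the example are hypothetical; this is not the grammar's actual implementation.

      def unify(fs1, fs2):
          """Unify two feature structures; return the merged structure or None on a clash."""
          if isinstance(fs1, dict) and isinstance(fs2, dict):
              result = dict(fs1)
              for key, value in fs2.items():
                  if key in result:
                      sub = unify(result[key], value)
                      if sub is None:
                          return None            # feature clash: unification fails
                      result[key] = sub
                  else:
                      result[key] = value
              return result
          return fs1 if fs1 == fs2 else None     # atomic values unify only if equal

      entry = {"head": {"pos": "verb"}, "subj": {"case": "ga"}}
      constraint = {"subj": {"case": "ga", "sem": "agent"}}
      print(unify(entry, constraint))              # merged structure
      print(unify({"case": "ga"}, {"case": "o"}))  # None: the case values clash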
  • YOSHIYUKI UMEMURA, YOSHIHISA HARATA, TSUKASA SHIMIZU, GUNJI SUGIMOTO
    2000 Volume 7 Issue 5 Pages 51-70
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We developed a method that uses decision lists for partial dependency analysis, which is important for pause control in speech synthesis. Evaluated with the F measure as the accuracy of pause insertion, the method achieved 90.04%. A strength of the decision-list approach is its flexibility in trading off memory usage against processing speed: with a small memory footprint of 12 KB and a high-speed setting, the processing time for setting the pauses of a sentence is 7 msec (Pentium III, 450 MHz) and the F measure is 85%. These results indicate that the method is applicable to actual speech synthesis systems. We also made rules for pause insertion and confirmed their performance by subjective evaluation: the rate judged appropriate was about 85% when the control was derived only from dependency distances, and about 91% when additional factors such as punctuation marks and successions of pauses were used. (An illustrative decision-list sketch appears after this entry.)
    Download PDF (1712K)
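    An illustrative decision list for the pause-insertion decision: an ordered sequence of (condition, decision) rules in which the first matching rule decides whether a pause follows a bunsetsu. The features and rules below are hypothetical, not the list learned in the paper.

      from typing import Callable, Dict, List, Tuple

      Rule = Tuple[Callable[[Dict], bool], bool]     # (condition on features, insert pause?)

      DECISION_LIST: List[Rule] = [
          (lambda f: f["punctuation"] == "、",      True),   # comma after the bunsetsu
          (lambda f: f["previous_pause"],           False),  # avoid consecutive pauses
          (lambda f: f["dependency_distance"] >= 3, True),   # long dependency distance
      ]

      def insert_pause(features: Dict, default: bool = False) -> bool:
          """Return the decision of the first matching rule, or the default."""
          for condition, decision in DECISION_LIST:
              if condition(features):
                  return decision
          return default

      print(insert_pause({"punctuation": "", "previous_pause": False, "dependency_distance": 4}))  # True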
  • HIROSHI KANAYAMA, KENTARO TORISAWA, YUTAKA MITSUISHI, JUN'ICHI TSUJII
    2000 Volume 7 Issue 5 Pages 71-91
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes a statistical method for Japanese dependency analysis. The method differs from conventional statistical models in how it calculates statistical values. Conventional models calculate the probability of a correct dependency between two bunsetsus (phrasal units of Japanese) for each pair of bunsetsus. We instead propose the triplet/quadruplet model, in which the conditional part of the probability consists of information on a modifier bunsetsu and all of its modification candidates, and the probability that a candidate is chosen as the modifiee is calculated. The number of candidates is restricted to three or fewer by an HPSG-based grammar and heuristics. With maximum entropy estimation, our parser achieves high accuracy: 88.6% on the EDR annotated corpus. (An illustrative sketch of the candidate-set scoring appears after this entry.)
    Download PDF (2016K)
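    An illustrative Python sketch of the candidate-set scoring behind the triplet/quadruplet model: the probability that a candidate is chosen as the modifiee is normalised over the modifier together with all of its (at most three) candidates. The scorer is a hypothetical stand-in for the paper's maximum-entropy estimate.

      import math
      from typing import Callable, List

      def choose_modifiee(modifier: str, candidates: List[str],
                          score: Callable[[str, List[str], str], float]) -> str:
          """Pick the candidate with the highest P(candidate | modifier, all candidates)."""
          logits = [score(modifier, candidates, c) for c in candidates]
          z = sum(math.exp(s) for s in logits)
          probs = [math.exp(s) / z for s in logits]   # normalised over the candidate set
          return candidates[max(range(len(candidates)), key=probs.__getitem__)]

      # Toy scorer that simply prefers the rightmost candidate.
      toy_score = lambda m, cs, c: float(cs.index(c))
      print(choose_modifiee("hon-o", ["katta", "yonda"], toy_score))  # 'yonda'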
  • KIYOAKI SHIRAI, MASAHIRO UEKI, TAIICHI HASHIMOTO, TAKENOBU TOKUNAGA, H ...
    2000 Volume 7 Issue 5 Pages 93-112
    Published: November 10, 2000
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a tool kit for natural language analysis, the MSLR parser tool kit. The ‘MSLR parser’ is based on the generalized LR parsing algorithm and integrates morphological and syntactic analysis of unsegmented sentences. The ‘LR table generator’ constructs an LR table from a context-free grammar and a connection matrix describing adjacency constraints between part-of-speech pairs. By incorporating the connection-matrix constraints into the LR table, it is possible both to reject locally implausible parsing results and to reduce the size of the LR table. Using the generated LR table and a lexicon, the MSLR parser then outputs parse trees based on morphological and syntactic analysis of input sentences. In addition, the MSLR parser accepts sentence inputs including partial syntactic constraints denoted by pairs of brackets, and suppresses the generation of any parse trees not satisfying those constraints. Furthermore, it can be trained according to the probabilistic generalized LR (PGLR) model, a mildly context-sensitive language model, and can rank parse trees by the overall probability returned by the trained PGLR model. (An illustrative sketch of the connection-matrix constraint appears after this entry.)
    Download PDF (1922K)
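    An illustrative view of the connection-matrix constraint: a boolean table over part-of-speech pairs stating which categories may be adjacent. The tool kit compiles this constraint into the LR table itself; the sketch below merely applies it as a filter over a candidate segmentation, with hypothetical POS names.

      from typing import List, Tuple

      # True means the two parts of speech may be adjacent.
      CONNECTION_MATRIX = {
          ("noun", "case-particle"): True,
          ("case-particle", "verb"): True,
          ("noun", "verb"): True,
          ("verb", "case-particle"): False,
      }

      def locally_plausible(segmentation: List[Tuple[str, str]]) -> bool:
          """Reject a (word, POS) sequence that violates the adjacency constraints."""
          for (_, left_pos), (_, right_pos) in zip(segmentation, segmentation[1:]):
              if not CONNECTION_MATRIX.get((left_pos, right_pos), False):
                  return False
          return True

      print(locally_plausible([("hon", "noun"), ("o", "case-particle"), ("yomu", "verb")]))  # True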