Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 6, Issue 7
Displaying 1-7 of 7 articles from this issue
  • [in Japanese]
    1999 Volume 6 Issue 7 Pages 1-2
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (156K)
  • MASAO UTIYAMA
    1999 Volume 6 Issue 7 Pages 3-28
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a statistical measure for detecting over-segmentations in the results of Japanese morphological analysis. An over-segmentation is an error in which a morphological analyzer segments a place that should not be segmented. Such a measure is useful because detected over-segmentations can be used to create error correction rules or to remove errors remaining in manually debugged corpora. The proposed measure is based on the ratio of the probability of a whole string to the probability of that string being segmented into two parts. The value of the measure is therefore high when a given string is rarely segmented into two parts. Consequently, a string rated high by the measure is likely to contain over-segmentations. In the experiments, the measure detected over-segmentations in the results of rule-based morphological analyzers very precisely, and it also detected remaining over-segmentations in manually debugged corpora. These results show that the proposed measure is useful for developing high-quality Japanese morphological analyzers and for developing and debugging corpora.
    Download PDF (2602K)
  • TAKEHITO UTSURO, SHIGEYUKI NISHIOKAYAMA, MASAKAZU FUJIO, YUJI MATSUMOT ...
    1999 Volume 6 Issue 7 Pages 29-60
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Dependency analysis of Japanese subordinate clauses is one of the most difficult phases in the syntactic analysis of long Japanese sentences. This paper proposes a corpus-based method of learning preference rules for deciding the dependency relations of Japanese subordinate clauses. We utilize morphological cues included in the subordinate clauses and statistically estimate the correlation between those cues and the dependency relations of the clauses. In particular, we exploit the scope embedding preference of subordinate clauses as a useful information source for disambiguating dependencies between subordinate clauses. In the experimental evaluation on the EDR Japanese parsed corpus, we find that several morphological cues are quite effective in deciding the dependency relations of Japanese subordinate clauses. We also show that the estimated dependencies of subordinate clauses increase the accuracy of an existing statistical dependency analyzer.
    Download PDF (8720K)
  • MASAKI MURATA, KIYOTAKA UCHIMOTO, QING MA, HITOSHI ISAHARA
    1999 Volume 6 Issue 7 Pages 61-71
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    George A. Miller argued that human short-term memory holds only seven chunks, plus or minus two. Using the Kyoto University corpus, we examined dependencies from the beginning of Japanese sentences and counted, at each step, the number of bunsetsus whose modifiees had not yet been recognized. We report that this number was generally below nine, the upper bound of seven plus or minus two. We also investigated English sentences and obtained a result similar to that for Japanese when we assumed that human beings recognize a series of words such as a noun phrase (NP) as a unit. This indicates that, if we assume that the human cognitive units in Japanese and English are the bunsetsu and the NP respectively, Miller's theory is consistent with the data.
    Download PDF (1125K)
  • NAOTO KATOH, NORIYOSHI URATANI
    1999 Volume 6 Issue 7 Pages 73-92
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a new approach to acquiring linguistic knowledge for local context-based summarization. Our summarization method transforms characters, words, and bunsetsu phrases into shorter ones by using linguistic information on the words to be summarized and on the words located before and after them. Our linguistic knowledge for summarization, which is composed of transformation rules and transformation conditions, is automatically acquired from a Japanese news corpus. In our corpus, the original articles and the human-summarized ones are collected from NHK news text and NHK teletext respectively. The proposed method analyzes the original news sentences and the summarized ones with a Japanese morphological analyzer, and aligns the original words with the summarized words by DP matching based on the distances between the words. Transformation rules are acquired from the differences found by this alignment. Transformation conditions are extracted as word n-grams located near a transformation rule. We acquired linguistic knowledge from the NHK news corpus and obtained a high accuracy rate in a series of experiments evaluating that knowledge.
    Download PDF (4175K)
  • HIROKI ODA, SHINSUKE MORI, KENJI KITA
    1999 Volume 6 Issue 7 Pages 93-108
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Word segmentation, which segments an input sentence into words, is the most fundamental process of Japanese language processing. In this paper, we present a new method for Japanese word segmentation based on a character class model. The character class model is more robust than a character-based model because it has fewer parameters. The criterion for Japanese character clustering is the entropy on a corpus different from the corpus used for model estimation, and the search method is based on the greedy algorithm. For this reason, the clustering method gives us an optimum character classification without the number of classes being specified in advance. In experiments on the ADD (ATR Dialogue Database) corpus, the proposed Japanese word segmenter using the character class model achieved higher accuracy than a character-based model. In particular, the proposed method using a variable-length n-gram class model achieved 96.38% recall and 96.23% precision for open text.
    Download PDF (6054K)
  • KOZO KIKUCHI, YUKIHIRO ITOH
    1999 Volume 6 Issue 7 Pages 109-123
    Published: October 10, 1999
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Improving the accuracy of dependency analysis in long Japanese sentences is a major problem in natural language analysis. For the form “‹noun-1›‹adjective›+‹noun-2›”, our previous paper identified seven effective dependency rules, which are classified into the following three patterns.
    · determined only from the preceding or following nouns.
    · determined from the relation between the preceding or following nouns and the adjective.
    · determined from the characteristics of the adjective itself.
    In this paper, we investigate more systematically the dependency of adnominals that contain an I-adjective or a NA-adjective, based on a classification of adjectives. We show that (1) with some extension, these seven rules apply effectively to new adjectives, and (2) some other rules are effective in improving dependency accuracy. Finally, we obtained an accuracy of about 95% by applying our method to determine the dependency of adnominals that include an adjective.
    Download PDF (1403K)
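
The count described in the abstract by Murata et al. (pages 61-71) can be sketched roughly as follows. This is only an illustration of one reading of that abstract, not the authors' procedure: the function name `open_dependency_counts`, the `heads` encoding (each unit stores the index of its modifiee, with -1 for the sentence-final unit), and the sample sentence are all assumptions made here.

```python
from typing import List


def open_dependency_counts(heads: List[int]) -> List[int]:
    """For each position i, count the units j <= i whose modifiee lies after i.

    heads[j] is the index of the unit that unit j modifies; the final unit
    (the root) uses -1. This assumes every modifiee lies to the right of its
    modifier, which is the usual convention for bunsetsu dependencies.
    """
    counts = []
    for i in range(len(heads)):
        # Units read so far whose modifiee has not yet been reached.
        pending = sum(1 for j in range(i + 1) if heads[j] > i)
        counts.append(pending)
    return counts


# Hypothetical five-unit sentence: 0 -> 1, 1 -> 4, 2 -> 3, 3 -> 4, 4 is the root.
heads = [1, 4, 3, 4, -1]
counts = open_dependency_counts(heads)
print(counts)       # pending units after reading each position
print(max(counts))  # peak number of pending units in this sentence
```

Under this reading, the quantity compared against Miller's bound would be the peak value per sentence, collected over a whole corpus.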