Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 4, Issue 2
  • [in Japanese]
    1997 Volume 4 Issue 2 Pages 1-2
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • YUJIE ZHANG, KAZUHIKO OZEKI
    1997 Volume 4 Issue 2 Pages 3-19
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    It is well known that, in Japanese, the frequency of phrase pairs in a modifier-modified relation depends on the distance between the two phrases. That is, a phrase in a sentence modifies its immediate successor most frequently, and the frequency of modification decreases as the distance between the modifier phrase and the modified phrase increases, unless the modified phrase is the last one in the sentence. This paper discusses a method of exploiting this statistical knowledge for dependency analysis of Japanese sentences. The minimum total penalty method was used for dependency analysis in this work. The method requires a penalty function, which specifies the association strength between phrases. Several penalty functions were defined based on the frequency distribution of dependency distance extracted from the ATR 503-sentence corpus, and their analysis performance was compared. For comparison, another experiment was conducted using a deterministic dependency analysis method. It is concluded that knowledge of the dependency distance distribution is effective, and that more detailed knowledge, namely the distribution extracted separately for each modifier phrase group, is still more effective for improving analysis performance.
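The abstract above does not give the form of its penalty functions. As a minimal sketch, assuming a penalty defined as the negative log of the smoothed relative frequency of each dependency distance, the idea can be illustrated as follows. The function name `distance_penalties`, the add-one smoothing, and the toy distance list are hypothetical and are not taken from the paper; the sketch also ignores the special case of the sentence-final phrase mentioned in the abstract.

```python
import math
from collections import Counter


def distance_penalties(distances, smoothing=1.0):
    """Map each dependency distance to a penalty, -log(smoothed relative frequency).

    Assumption: frequent distances should be cheap and rare ones expensive.
    This is an illustrative definition, not the paper's penalty function.
    """
    counts = Counter(distances)
    max_d = max(counts)  # largest distance observed in the data
    total = len(distances) + smoothing * max_d
    return {
        d: -math.log((counts.get(d, 0) + smoothing) / total)
        for d in range(1, max_d + 1)
    }


# Toy data (made up): distance 1 is most frequent, as the abstract describes.
toy_distances = [1, 1, 1, 1, 2, 2, 3]
penalties = distance_penalties(toy_distances)
for d, p in sorted(penalties.items()):
    print(f"distance {d}: penalty {p:.3f}")
```

A minimum total penalty parser would then prefer the dependency structure whose summed penalties are smallest. Estimating one such table per modifier phrase group would correspond to the more detailed knowledge the abstract reports as most effective.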
  • FUMIYO FUKUMOTO, JUN'ICHI TSUJII
    1997 Volume 4 Issue 2 Pages 21-39
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we focus on a definition of polysemy in terms of the distributional behaviour of words in monolingual texts and propose a method for disambiguating word-senses in sentences containing occurrences of polysemous verbs. We first discuss existing work on corpus-related approaches to word-sense disambiguation and show the significance of our approach by comparing it with other related work. Then we give a definition of polysemy from the viewpoint of clustering and propose a clustering method which automatically recognises polysemous words. Finally, the information extracted by the clustering method is shown to contribute to disambiguating word-senses in sentences containing occurrences of polysemous verbs. We report the results of two experiments. The first, the Disambiguation Experiment, is conducted in order to see how the extracted polysemy information can be used to disambiguate word-senses in actual texts. The second, the Comparative Experiment, is conducted in order to see how effective our disambiguation technique is compared with another related approach, Niwa's technique. The results of the experiments demonstrate the applicability of our proposed method.
  • MASAKI MURATA, MAKOTO NAGAO
    1997 Volume 4 Issue 2 Pages 41-56
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    A definite noun phrase can indirectly refer to an entity that has already been mentioned. For example, “There is a house. The roof is white.” indicates that “the roof” is associated with “a house”, which was mentioned in the previous sentence. This kind of reference (indirect anaphora) has not been studied well in natural language processing, but it is important for coherence resolution, language understanding, and machine translation. To analyze indirect anaphora, we need a case frame dictionary for nouns that contains knowledge about relations between two nouns. No such noun case frame dictionary exists at present, so we are forced to use examples of “A of B” and a verb case frame dictionary instead. We conducted experiments on estimating indirect anaphora using this information, and obtained a recall rate of 63% and a precision rate of 68% on held-out test sentences. This indicates that the information in “A of B” examples is useful to a certain extent when a noun case frame dictionary is not available. We also estimated the performance for the case where a good noun case frame dictionary is available, and obtained recall and precision rates of 71% and 82%, respectively. Finally, we propose a way to construct a noun case frame dictionary from examples of “A of B”.
  • Masato Shiraishi, Masao Yokota
    1997 Volume 4 Issue 2 Pages 57-70
    Published: April 10, 1997
    Released on J-STAGE: June 07, 2011
    JOURNAL FREE ACCESS
    Toward the realization of a natural language understanding system for clinical records, the authors have analyzed a large number of discharge summaries (a kind of clinical record). Many Japanese compound nouns appear in these records as a result of ellipsis, so it is essential for the understanding system to cope with them. This paper describes a system that paraphrases compound nouns by restoring their elliptical constructions using their semantic categories (Yokota, Nishimura, Shiraishi and Ryu 1994) according to the Mental-image directed semantic theory (Yokota 1988; Yokota, Shiraishi, Ryu, and Oda 1991b). The system consists of four major processors, “Word segmentation processor,” “Restoration processor,” “Hierarchical relation detector” and “Sentence generator”, and possesses two types of dictionary, “Word dictionary” and “Hierarchy dictionary”. The former assigns a semantic category, etc. to each noun, and the latter contains the hierarchical relations among the concepts of objects (one of the semantic categories of nouns). The experimental results show the system to be fairly successful.
  • Hang Li, Naoki Abe
    1997 Volume 4 Issue 2 Pages 71-88
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We address the problem of automatically constructing a thesaurus (hierarchically clustering words) based on corpus data. We view the problem of clustering words as that of estimating a joint distribution over the Cartesian product of a partition of a set of nouns and a partition of a set of verbs, and propose an estimation algorithm using simulated annealing with an energy function based on the Minimum Description Length (MDL) Principle. We empirically compared the performance of our method based on the MDL Principle against a method based on the Maximum Likelihood Estimator, and found that the former outperforms the latter. We also evaluated the method by conducting pp-attachment disambiguation experiments using an automatically constructed thesaurus. Our experimental results indicate that we can improve accuracy in disambiguation by using such a thesaurus.
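The MDL Principle named in the abstract above trades off how well a model fits the data against how many parameters it needs. The following is a minimal sketch of that trade-off only, not the authors' algorithm: it scores a partition of a single word set rather than a joint noun-verb partition, it does no simulated annealing, and it assumes a simplified model in which each word's probability is its class probability divided by the class size. The function name `description_length`, the partitions, and the toy counts are all hypothetical.

```python
import math


def description_length(counts, partition):
    """Two-part code length in bits for word counts under a class-based model.

    Assumed model (a simplification): P(word) = P(class) / |class|.
    Total length = data code length + (free parameters / 2) * log2(sample size).
    """
    n = sum(counts.values())
    data_bits = 0.0
    for cls in partition:
        cls_count = sum(counts[w] for w in cls)
        if cls_count == 0:
            continue  # an unobserved class contributes no data bits
        p_word = (cls_count / n) / len(cls)
        data_bits += -cls_count * math.log2(p_word)
    free_params = len(partition) - 1  # class probabilities sum to one
    model_bits = 0.5 * free_params * math.log2(n)
    return data_bits + model_bits


# Toy counts (made up) and three candidate partitions of the same words.
toy_counts = {"dog": 10, "cat": 10, "car": 1, "bus": 1}
fine = [["dog"], ["cat"], ["car"], ["bus"]]
coarse = [["dog", "cat"], ["car", "bus"]]
single = [["dog", "cat", "car", "bus"]]

dl_fine = description_length(toy_counts, fine)
dl_coarse = description_length(toy_counts, coarse)
dl_single = description_length(toy_counts, single)
```

On this toy data the two-class partition fits as well as the four-class one but needs fewer parameters, so it has the shortest description length. A maximum likelihood criterion alone could not distinguish the two, since it ignores the model cost.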
  • FUMIYO FUKUMOTO, JUN'ICHI FUKUMOTO, YOSHIMI SUZUKI
    1997 Volume 4 Issue 2 Pages 89-109
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a method for extracting key paragraphs in articles based on the degree of context dependency. Like Luhn's technique, our method assumes that the words related to the theme of an article appear throughout its paragraphs. Our technique for extracting keywords is based on the degree of context dependency, that is, how strongly each word is related to a given context. The results of experiments demonstrate the applicability of our proposed method.
  • Atsushi Fujii, Kentaro Inui, Takenobu Tokunaga, Hozumi Tanaka
    1997 Volume 4 Issue 2 Pages 111-123
    Published: April 10, 1997
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Word sense disambiguation has recently been utilized in corpus-based approaches, reflecting the growth in the number of machine readable texts. One category of approaches disambiguates an input verb sense based on the similarity between its governing case fillers and those in given examples. In this paper, we introduce the degree of case contribution to verb sense disambiguation into this existing method. Under this notion, the greater the semantic diversity of the filler examples for a case, the more that case contributes to verb sense disambiguation. We also report the result of a comparative experiment, in which the performance of disambiguation is improved by considering this notion of semantic contribution.