Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 5, Issue 1
Displaying 1-8 of 8 articles from this issue
  • [in Japanese]
    1998 Volume 5 Issue 1 Pages 1-2
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (204K)
  • Ezra Black, Stephen Eubank, Hideki Kashioka, David Magerman, Jared Sai ...
    1998 Volume 5 Issue 1 Pages 3-23
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Part-of-speech tagging methodology has succeeded, but on problems that may lack real-world application. A redirection of the field is indicated, toward potentially more useful but harder and more sophisticated tagging tasks: (1) using much more detailed tagsets (semantically and syntactically); (2) testing performance on treebanks reflecting the huge gamut of domains, etc., that characterizes real-world applications; (3) understanding the magnitude of the unknown-word and unknown-tag problems, then overcoming them. Tagging results are presented on two versions of a new, highly variegated treebank, featuring tagsets of 2720 and 443 tags, respectively, and utilizing a dictionaryless, decision-tree tagger (an illustrative sketch follows this entry).
    Download PDF (2059K)
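Below is a minimal, hedged sketch of what a dictionaryless decision-tree tagger of the kind described in the abstract above might look like: tags are predicted from surface context features alone, with no dictionary lookup. The feature set, the toy training sentences, and the use of scikit-learn's DecisionTreeClassifier are assumptions for illustration only, not the authors' implementation.

```python
# Illustrative sketch only; feature set, toy data, and scikit-learn usage are assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

def features(words, i):
    """Surface-only features for the word at position i (no dictionary)."""
    w = words[i]
    return {
        "suffix3": w[-3:],                                   # crude morphology
        "is_capitalized": w[0].isupper(),
        "prev_word": words[i - 1] if i > 0 else "<S>",
        "next_word": words[i + 1] if i + 1 < len(words) else "</S>",
    }

# Toy tagged sentences; a real system would train on a large treebank.
train_sents = [[("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
               [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")]]

X, y = [], []
for sent in train_sents:
    words = [w for w, _ in sent]
    for i, (_, tag) in enumerate(sent):
        X.append(features(words, i))
        y.append(tag)

tagger = make_pipeline(DictVectorizer(), DecisionTreeClassifier(random_state=0))
tagger.fit(X, y)

test = ["the", "dog", "sleeps"]
print(list(zip(test, tagger.predict([features(test, i) for i in range(len(test))]))))
```

A real system of this kind would grow the tree on millions of tagged tokens and a far larger tagset; the point of the sketch is only that no lexicon is consulted at prediction time.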
  • Changes of politeness with the addition of word endings
    TAMOTSU SHIRADO, HITOSHI ISAHARA
    1998 Volume 5 Issue 1 Pages 25-36
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    A computational model of how the degree of politeness changes when word endings are added to polite expressions is proposed. The model makes two stochastic assumptions: (1) for each polite expression, the situations in which the expression is likely to be used can be expressed as a probability distribution over politeness values, and (2) for each word ending, the situations corresponding to the polite expressions to which the ending is most suitably added can likewise be expressed as a probability distribution over politeness values. The degree of politeness change caused by adding a word ending is then defined as the amount of information obtained from the addition (a hedged sketch of this definition follows this entry). The results of psychological experiments support the validity of the proposed model.
    Download PDF (1058K)
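The following is a hedged sketch of the information-theoretic reading of the model above: a polite expression and a word ending are each treated as probability distributions over discrete politeness values, and the politeness change contributed by the ending is scored as the information obtained when the two are combined. The toy distributions, the product-rule combination, and the use of KL divergence as the "amount of information" are illustrative assumptions, not the authors' exact formulation.

```python
# Illustrative sketch only; distributions and the KL-based score are assumptions.
import math

def normalize(weights):
    """Scale a list of non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def combine(p_expr, p_ending):
    """Distribution over politeness values after adding the ending (product rule)."""
    return normalize([a * b for a, b in zip(p_expr, p_ending)])

def politeness_change(p_expr, p_ending):
    """Information gained (KL divergence, in bits) by taking the ending into account."""
    q = combine(p_expr, p_ending)
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p_expr) if qi > 0)

# Politeness values 1..5; toy distributions for one expression and one ending.
p_expression = normalize([0.1, 0.2, 0.4, 0.2, 0.1])
p_ending = normalize([0.05, 0.1, 0.2, 0.35, 0.3])   # this ending shifts usage toward politer situations

print(f"degree of politeness change: {politeness_change(p_expression, p_ending):.3f} bits")
```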
  • Jiri Stetina, Makoto Nagao
    1998 Volume 5 Issue 1 Pages 37-57
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper deals with two important ambiguities of natural language: prepositional phrase attachment and word sense ambiguity. We propose a new supervised learning method for PP-attachment based on a semantically tagged corpus. Because no sufficiently large sense-tagged corpus exists, we also propose a new unsupervised, context-based word sense disambiguation algorithm that augments the training corpus for PP attachment with word sense tags. We present the results of our approach, which not only surpasses existing methods but also approaches human performance (an illustrative sketch of sense-based attachment follows this entry).
    Download PDF (2063K)
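A minimal sketch, under stated assumptions, of sense-based supervised PP-attachment over (verb, noun1, preposition, noun2) quadruples whose nouns carry coarse sense tags. The toy sense inventory, the training tuples, and the relative-frequency decision rule are illustrative only; the paper's actual model and its unsupervised sense tagger are not reproduced here.

```python
# Illustrative sketch only; sense tags, data, and the decision rule are assumptions.
from collections import Counter

# Each training example: (verb, sense of noun1, preposition, sense of noun2, attachment site)
train = [
    ("eat", "food", "with", "instrument", "V"),    # "eat pizza with a fork"
    ("eat", "food", "with", "food", "N"),          # "eat pizza with anchovies"
    ("see", "person", "with", "instrument", "V"),  # "see the man with a telescope"
]

counts = Counter((v, s1, p, s2, a) for v, s1, p, s2, a in train)

def attach(verb, n1_sense, prep, n2_sense):
    """Pick verb (V) or noun (N) attachment by relative frequency over sense-tagged tuples."""
    v = counts[(verb, n1_sense, prep, n2_sense, "V")]
    n = counts[(verb, n1_sense, prep, n2_sense, "N")]
    if v == n == 0:
        return "V"   # crude default; a real model would back off through coarser tuples
    return "V" if v >= n else "N"

print(attach("eat", "food", "with", "food"))          # -> N (noun attachment)
print(attach("see", "person", "with", "instrument"))  # -> V (verb attachment)
```

The benefit of sense tags in this setting is that unseen word quadruples can still match at the sense level, which is what motivates pairing the attachment learner with a sense disambiguator.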
  • NAOYOSHI TAMURA, KEIJI WADA
    1998 Volume 5 Issue 1 Pages 59-78
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we present a structure model for editorial texts and discuss a text analysis method based on the model. A large volume of digitized documents flows through media such as the Internet and CD-ROMs, even in personal environments. To process such documents at high speed, the analysis should be as "superficial" as possible and should require as little specialized knowledge as possible. The structuring in our method therefore relies on the analysis of modalities that appear on the surface at the end of Japanese sentences. We define a text structure model for editorials. As a top-down approach to text analysis, we apply a text segmentation method in which a text is incrementally divided according to an evaluation function. As a bottom-up approach, neighboring segments are merged into one according to the strength of the rhetorical relation between them. Our method combines the merits of the two: the leaves of a structure tree are analyzed bottom-up, whereas the nodes around the root are decomposed top-down (a sketch of the bottom-up half follows this entry). For the evaluation, we discuss our method from three points of view: (1) objective checking of agreement between formal paragraphs and the upper part of the structure trees around the root, (2) checking of agreement on the lower part of the trees around the leaves between human annotation and our method, and (3) human inspection of the structures generated by our method.
    Download PDF (1640K)
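The sketch below illustrates only the bottom-up half of the hybrid analysis described above: adjacent segments are repeatedly merged into a binary node, most strongly related pair first. The connective-based strength function and the toy sentences are assumptions for illustration; the paper's sentence-final modality features and evaluation function are not modeled.

```python
# Illustrative sketch only; the relation-strength heuristic and example text are assumptions.
def first_sentence(node):
    """Return the leftmost sentence under a node (leaves are plain strings)."""
    return node if isinstance(node, str) else first_sentence(node[0])

def relation_strength(left, right):
    """Toy rhetorical-relation score: reward a connective opening the right segment."""
    connectives = ("however", "therefore", "moreover", "for example")
    return 1.0 if first_sentence(right).lower().startswith(connectives) else 0.2

def build_tree(sentences):
    """Greedily merge the most strongly related adjacent segments into a binary tree."""
    nodes = list(sentences)   # leaves are plain sentence strings
    while len(nodes) > 1:
        i = max(range(len(nodes) - 1),
                key=lambda k: relation_strength(nodes[k], nodes[k + 1]))
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]   # merge the pair into one binary node
    return nodes[0]

editorial = ["Prices rose sharply last year.",
             "However, wages stayed flat.",
             "Therefore, consumption fell."]
print(build_tree(editorial))
```

In the paper's hybrid scheme, a tree built this way near the leaves would be complemented by top-down segmentation near the root, where the evaluation function splits the text into large segments first.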
  • HIROKI ODA, KENJI KITA
    1998 Volume 5 Issue 1 Pages 79-99
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Collocations, which are cohesive and recurrent word clusters, play an important role in many natural language application systems. In this paper, we present a set of new techniques for automatically identifying and extracting collocations from corpora. These techniques are based on word position information and produce a wide range of collocations, including both continuous and discontinuous collocations (an illustrative sketch follows this entry). Their effectiveness has been confirmed by evaluation experiments using the ADD (ATR Dialogue Database) corpus.
    Download PDF (5657K)
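A hedged sketch of position-based collocation extraction in the spirit of the abstract above: word pairs are counted together with their relative offset, so that both adjacent (continuous) and gapped (discontinuous) collocations surface. The window size, the frequency threshold, and the toy corpus are illustrative assumptions, not the paper's actual procedure.

```python
# Illustrative sketch only; window size, threshold, and corpus are assumptions.
from collections import Counter

def positional_pairs(tokens, max_offset=4):
    """Yield (left word, right word, offset) for every pair within max_offset positions."""
    for i, w in enumerate(tokens):
        for d in range(1, max_offset + 1):
            if i + d < len(tokens):
                yield (w, tokens[i + d], d)

corpus = ("thank you very much , thank you very much indeed , "
          "not only cheap but also good , not only fast but also safe").split()

counts = Counter(positional_pairs(corpus))
for (w1, w2, offset), freq in counts.most_common():
    if freq >= 2:   # keep pairs that recur at the same relative position
        kind = "continuous" if offset == 1 else f"discontinuous (gap {offset - 1})"
        print(f"{w1} ... {w2}  [{kind}]  frequency={freq}")
```

Recurrence at a fixed offset is what lets a pattern like "not only ... but also" be picked up even though other words intervene.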
  • MAKOTO IWAYAMA, TAKENOBU TOKUNAGA
    1998 Volume 5 Issue 1 Pages 101-117
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents a hierarchical clustering algorithm called HBC (Hierarchical Bayesian Clustering) for associative document search, that is, retrieving documents similar to a given query document. A major issue in realizing associative document search is the efficiency of searching for similar documents: a straightforward exhaustive search takes O(N) search time. In this paper we discuss cluster-based search, in which a document collection is automatically organized into a binary cluster tree and a query document is then compared with each cluster rather than with each document. By searching the cluster tree top-down, search time can be reduced to O(log₂ N) on average (an illustrative sketch of the top-down search follows this entry). However, since the clustering algorithms adopted in previous cluster-based search frameworks used a similarity measure different from the one used in top-down document search, the search accuracy of those frameworks was not promising. HBC, on the other hand, directly seeks the maximum search performance on the given document collection by maximizing its self recall. In an experiment using "Gendai yôgo no kisotisiki," we verified the advantage of cluster-based search using HBC over the well-known cluster-based search using Ward's method. In another experiment using the "Wall Street Journal," we confirmed that cluster-based search using HBC is more noise-tolerant than exhaustive search.
    Download PDF (1664K)
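A minimal sketch of the top-down search over a binary cluster tree discussed above: at each internal node the query is compared with the two child centroids and only the closer branch is followed, giving O(log₂ N) comparisons instead of O(N). The construction of the tree (HBC itself) is omitted, and the node layout and cosine scoring are illustrative assumptions.

```python
# Illustrative sketch only; tree layout, vectors, and cosine scoring are assumptions.
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

# A leaf is {"doc": id, "vec": {...}}; an internal node is {"vec": centroid, "left": ..., "right": ...}.
def search(node, query):
    """Follow the closer child at each level until a single document is reached."""
    while "doc" not in node:
        left, right = node["left"], node["right"]
        node = left if cosine(query, left["vec"]) >= cosine(query, right["vec"]) else right
    return node["doc"]

tree = {
    "vec": {"stock": 0.5, "game": 0.5},
    "left": {"doc": "d1", "vec": {"stock": 1.0, "market": 0.8}},
    "right": {"doc": "d2", "vec": {"game": 1.0, "soccer": 0.7}},
}

print(search(tree, {"market": 1.0, "stock": 0.6}))   # -> d1
```

The paper's point is that search quality depends on building the tree with the same notion of similarity used in this descent, which is what HBC optimizes directly.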
  • MASAKI MURATA, MAKOTO NAGAO
    1998 Volume 5 Issue 1 Pages 119-133
    Published: January 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Verb phrases are sometimes omitted in natural language (ellipsis). Resolving verb phrase ellipses is necessary for language understanding, machine translation, and dialogue processing. This paper describes a practical way to resolve verb phrase ellipses by using surface expressions and examples. To construct heuristic rules for ellipsis resolution, we classified verb phrase ellipses by checking whether the referent of an ellipsis appears in the surrounding sentences or not (a hedged sketch of this surface-cue style follows this entry). We experimented with the resolution of verb phrase ellipses on a novel and obtained a recall rate of 84% and a precision rate of 82% on test sentences, which indicates that our method is effective. When the referent of a verb phrase ellipsis appeared in the surrounding sentences, the accuracy was very high; when it did not, the accuracy was lower. Since the analysis of this phenomenon is very difficult, it is valuable to propose a way of solving the problem to some extent. As corpora grow larger and machine performance improves, corpus-based methods will become more effective.
    Download PDF (1425K)
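A hedged sketch of the surface-cue style of resolution described above: if a Japanese sentence ends with a cue that typically signals an elided verb phrase, the most recent preceding verb phrase is proposed as the referent. The cue list, the crude verb-phrase finder, and the example dialogue are illustrative assumptions, not the authors' heuristic rules.

```python
# Illustrative sketch only; cues, the VP heuristic, and the example are assumptions.
import re

ELLIPSIS_CUES = ("けど。", "が。", "も。")   # sentence-final surface cues that often signal an elided VP

def find_vp(sentence):
    """Very rough stand-in for VP detection: last verb-like chunk before the final period."""
    m = re.search(r"([^\sをへにはがで。]+(?:ます|ました|た))。$", sentence)
    return m.group(1) if m else None

def resolve_ellipsis(sentences):
    """For each sentence ending in an ellipsis cue, propose the nearest preceding VP as referent."""
    results = []
    for i, s in enumerate(sentences):
        if s.endswith(ELLIPSIS_CUES):
            referent = next((find_vp(p) for p in reversed(sentences[:i]) if find_vp(p)), None)
            results.append((s, referent))
    return results

dialogue = ["昨日は公園へ行きました。", "今日も。"]
print(resolve_ellipsis(dialogue))   # -> [('今日も。', '行きました')]
```

Cases where no referent appears in the surrounding sentences (the harder class reported in the abstract) would return None here and require example-based inference instead.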