Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

1998 Volume 5 Issue 1 Pages 1-2
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.1

JOURNAL FREE ACCESS

Reinventing Part-Of-Speech Tagging

Ezra Black, Stephen Eubank, Hideki Kashioka, David Magerman, Jared Sai ...

1998 Volume 5 Issue 1 Pages 3-23
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.3

JOURNAL FREE ACCESS

Part-of-speech tagging methodology has succeeded, but on problems that may lack real-world application. Redirection of the field is indicated, toward potentially more useful, but harder and more sophisticated tagging tasks: (1) using much more detailed tagsets (semantically and syntactically); (2) testing performance on treebanks reflecting the huge gamut of domains, etc., characterizing real-world applications; (3) understanding the magnitude of the unknown-word and unknown-tag problems, then overcoming them. Tagging results are presented on two versions of a new, highly variegated treebank, featuring tagsets of 2720 and 443 tags, respectively, and utilizing a dictionaryless, decision-tree tagger.

A computational model for politeness of expressions

changes of politeness with word endings addition

TAMOTSU SHIRADO, HITOSHI ISAHARA

1998 Volume 5 Issue 1 Pages 25-36
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.25

JOURNAL FREE ACCESS

A computational model is proposed for how the degree of politeness changes when word endings are added to polite expressions. The model makes two stochastic assumptions: (1) for each polite expression, the situations in which the expression is likely to be used can be expressed as a probability distribution over politeness values, and (2) for each word ending, the situations corresponding to the polite expressions to which the ending is most suitably added can be expressed as a probability distribution over politeness values. The change in the degree of politeness caused by adding a word ending is defined as the amount of information obtained from the addition. The results of psychological experiments support the validity of the proposed model.

PP Attachment Ambiguity Resolution through Supervised Learning

Jiri Stetina, Makoto Nagao

1998 Volume 5 Issue 1 Pages 37-57
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.37

JOURNAL FREE ACCESS

This paper deals with two important ambiguities of natural language: prepositional phrase (PP) attachment and word sense ambiguity. We propose a new supervised learning method for PP attachment based on a semantically tagged corpus. Because no sufficiently large sense-tagged corpus exists, we also propose a new unsupervised, context-based word sense disambiguation algorithm that annotates the PP-attachment training corpus with word sense tags. We present the results of our approach, which not only surpasses any existing method but also approaches human performance.

Text Structuring by Composition and Decomposition of Segments

NAOYOSHI TAMURA, KEIJI WADA

1998 Volume 5 Issue 1 Pages 59-78
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.59

JOURNAL FREE ACCESS

In this paper, we present a structure model for editorial texts and discuss a text analysis method based on the model. A large volume of digitized documents now flows through media such as the Internet and CD-ROMs, even in personal settings. To process such documents at high speed, the processing should be as “superficial” as possible and should require as little specialized knowledge as possible. The structuring in our method relies on the analysis of modality expressions that appear on the surface at the end of Japanese sentences. We define a text structure model for editorials. As a top-down approach to text analysis, we apply a text segmentation method in which a text is incrementally divided according to an evaluation function. As a bottom-up approach, two neighboring segments are merged into one according to the strength of the rhetorical relation between them. Our approach combines the merits of the two: the leaves of a structure tree are analyzed in a bottom-up manner, whereas nodes around the root are decomposed in a top-down manner. For the evaluation, we examine our method from three points of view: (1) objectively checking the agreement between formal paragraphs and the upper part of the structure trees around the root, (2) checking the agreement between human judgments and our method on the lower part of the trees around the leaves, and (3) human checking of the structures generated by our method.

Automatically Extracting Collocations Based on Words Position Information in Corpora

HIROKI ODA, KENJI KITA

1998 Volume 5 Issue 1 Pages 79-99
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.79

JOURNAL FREE ACCESS

Collocations, which are cohesive and recurrent word clusters, play an important role in many natural language application systems. In this paper, we present a set of new techniques for automatically identifying and extracting collocations from corpora. These techniques are based on word position information and produce a wide range of collocations, both continuous and discontinuous. Their effectiveness has been confirmed by evaluation experiments using the ADD (ATR Dialogue Database) corpus.

Associative Document Search using a Probabilistic Document Clustering

MAKOTO IWAYAMA, TAKENOBU TOKUNAGA

1998 Volume 5 Issue 1 Pages 101-117
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.101

JOURNAL FREE ACCESS

This paper presents a hierarchical clustering algorithm called HBC (Hierarchical Bayesian Clustering) for associative document search, that is, retrieving documents similar to a given query document. A major issue in realizing associative document search is the efficiency of searching for similar documents. A straightforward exhaustive search takes O(N) search time. In this paper we discuss the use of cluster-based search, in which a document collection is automatically organized into a binary cluster tree and a query document is then compared with each cluster rather than each document. By searching a cluster tree in the top-down direction, search time can be reduced to O(log₂ N) on average. However, since the clustering algorithms adopted in previous cluster-based search frameworks used a similarity measure different from that used in top-down document searching, the search accuracy of these frameworks was not promising. HBC, on the other hand, directly seeks the maximum search performance on the given document collection by maximizing the self recall for it. In an experiment using “Gendai yôgo no kisotisiki,” we verified the advantage of our cluster-based search using HBC over the well-known cluster-based search using Ward's method. Also, in an experiment using the “Wall Street Journal,” we confirmed that cluster-based search using HBC is more noise tolerant than the exhaustive search.
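
The top-down search over a binary cluster tree described in this abstract can be illustrated with a minimal sketch. This is not the HBC algorithm from the paper: the tree is built by hand, cosine similarity over bag-of-words vectors stands in for the paper's probabilistic similarity, and the names `Node`, `leaf`, `merge`, and `search` are hypothetical. It only shows why a query needs about two comparisons per tree level instead of one per document.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, Optional, Tuple

Vector = Dict[str, float]  # bag-of-words vector: term -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (an assumed stand-in for the paper's measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    centroid: Vector
    doc_id: Optional[str] = None   # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(doc_id: str, vec: Vector) -> Node:
    return Node(centroid=vec, doc_id=doc_id)


def merge(left: Node, right: Node) -> Node:
    """Internal node whose centroid is the average of its two children."""
    terms = set(left.centroid) | set(right.centroid)
    centroid = {t: (left.centroid.get(t, 0.0) + right.centroid.get(t, 0.0)) / 2
                for t in terms}
    return Node(centroid=centroid, left=left, right=right)


def search(root: Node, query: Vector) -> Tuple[str, int]:
    """Descend from the root, following the more similar child each time.

    Returns the document reached and the number of similarity comparisons.
    """
    node, comparisons = root, 0
    while node.doc_id is None:
        comparisons += 2
        if cosine(query, node.left.centroid) >= cosine(query, node.right.centroid):
            node = node.left
        else:
            node = node.right
    return node.doc_id, comparisons


# Hand-built tree over four toy documents (hypothetical data).
d1 = leaf("d1", {"tagging": 1.0, "corpus": 1.0})
d2 = leaf("d2", {"tagging": 1.0, "tree": 1.0})
d3 = leaf("d3", {"search": 1.0, "cluster": 1.0})
d4 = leaf("d4", {"search": 1.0, "query": 1.0})
root = merge(merge(d1, d2), merge(d3, d4))

result = search(root, {"cluster": 1.0, "search": 1.0})
```

With a balanced tree over N documents the loop runs about log₂ N times, whereas an exhaustive search would call `cosine` once per document. A greedy descent like this can miss the best match when the clustering and the search use mismatched similarity measures, which is the weakness the abstract attributes to earlier frameworks.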

Resolution of Verb Phrase Ellipsis in Japanese Sentences using Surface Expressions and Examples

MASAKI MURATA, MAKOTO NAGAO

1998 Volume 5 Issue 1 Pages 119-133
Published: January 10, 1998
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.5.119

JOURNAL FREE ACCESS

Verb phrases are sometimes omitted in natural language (ellipsis). Verb phrase ellipses must be resolved in language understanding, machine translation, and dialogue processing. This paper describes a practical way to resolve verb phrase ellipses by using surface expressions and examples. To make heuristic rules for ellipsis resolution, we classified verb phrase ellipses by checking whether the referent of a verb phrase ellipsis appears in the surrounding sentences or not. We experimented with the resolution of verb phrase ellipses on a novel and obtained a recall rate of 84% and a precision rate of 82% on test sentences. This indicates that our method is effective. When the referent of a verb phrase ellipsis appeared in the surrounding sentences, the accuracy rate was very high. When the referent did not appear in the surrounding sentences, however, the accuracy rate was not as high. Since the analysis of this phenomenon is very difficult, it is valuable to propose a way of solving the problem to a certain extent. As corpora become larger and machine performance improves, the corpus-based method will become more effective.
