Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

1999 Volume 6 Issue 5 Pages 1-2
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_1

JOURNAL FREE ACCESS

A Method to Detect Self-repair Syllable Strings in Spontaneous Speech using Markov Model

TETSUO ARAKI, SATORU IKEHARA, MASATO HASHIMOTO

1999 Volume 6 Issue 5 Pages 3-26
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_3

JOURNAL FREE ACCESS

This paper proposes a method that uses Markov models of syllables to detect self-repair strings in spontaneous speech. These strings are assumed to be represented as syllable strings obtained correctly by acoustic processing. The method comprises two steps. The first step determines the provisional bunsetsu boundaries of a non-segmented syllable sentence that contains self-repair strings. For this step, we extended a previously proposed method, which uses Markov models to find the provisional bunsetsu boundaries of correct sentences, so that it also applies to sentences with self-repair. The second step detects the self-repair strings, which are inserted at bunsetsu boundaries. For this step, we propose three pattern-matching methods. We applied the method to the ATR dialogue corpus and confirmed that it is effective at detecting self-repair strings inserted at bunsetsu boundaries.
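
The first step can be illustrated with a minimal sketch. It assumes a syllable bigram model trained on already segmented bunsetsu and proposes a provisional boundary wherever the transition probability between adjacent syllables falls below a threshold. The function names, the bigram order, the threshold rule, and the toy romanized syllables are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import Counter


def train_bigram(segments):
    """Count syllable transitions that occur inside training segments.

    `segments` is a list of syllable lists, one per bunsetsu (hypothetical
    training format; the paper's actual training data layout is not given).
    """
    pair_counts = Counter()
    first_counts = Counter()
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts


def transition_prob(model, a, b):
    """Maximum-likelihood estimate of P(b | a); 0.0 if `a` was never seen."""
    pair_counts, first_counts = model
    if first_counts[a] == 0:
        return 0.0
    return pair_counts[(a, b)] / first_counts[a]


def provisional_boundaries(model, syllables, threshold):
    """Return indices i such that a boundary is proposed before syllables[i].

    Assumption: a weak within-bunsetsu transition suggests a boundary.
    """
    return [
        i + 1
        for i, (a, b) in enumerate(zip(syllables, syllables[1:]))
        if transition_prob(model, a, b) < threshold
    ]
```

In this sketch, a transition that never occurs inside a training bunsetsu gets probability 0.0 and is always proposed as a boundary; a real system would smooth the estimates and tune the threshold on held-out data.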

A Method for Finding Salient Features in Metaphor Understanding

YUTAKA IMAI, SHUN ISHIZAKI

1999 Volume 6 Issue 5 Pages 27-42
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_27

JOURNAL FREE ACCESS

In this paper, we propose a method that automatically finds salient features in a metaphorical expression consisting of two noun concepts. First, we prepared a bundle of features through human association experiments on the concepts and, using this bundle, conducted SD (Semantic Differential) Method experiments to evaluate the features. Then, we extracted common salient features using a new neural network mechanism in which the results of the SD Method experiments were used as the parameters. Since this mechanism can be applied to any pair of concepts that forms a sentence “T is V”, the saliency of features common to T and V is evaluated quantitatively. We show examples calculated by the system to verify its effectiveness.

Towards Multi-paper Summarization Using Reference Information

HIDETSUGU NANBA, MANABU OKUMURA

1999 Volume 6 Issue 5 Pages 43-62
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_43

JOURNAL FREE ACCESS

In this paper, we present a system that supports writing a survey of a specific domain. The system uses reference information, which includes the reference relationships between papers and the information that can be derived from the description around a citation and that helps in understanding the differences between the referring and referred papers. Writing a survey requires at least two processes: collecting the papers of a domain, and clarifying the differences between those papers. We consider reference information useful for both processes. First, we extract the fragment of text in which the author describes a referring paper. We call this fragment the “Reference Area”. Second, we analyze the purpose of the reference. We divide the purposes into three categories, which we call “Reference Types”, and develop a method that determines the type using cue words. As a result, we obtained a recall of 79.6% and a precision of 76.3% in reference area extraction, and an accuracy of 83% in reference type decision. Using these reference types, we can collect a set of papers in the same domain. Finally, we build a system that displays the reference graph of the papers. With this system, the abstracts and reference areas of the papers can be viewed. Users of the system can easily collect the papers of a specific domain and understand the differences between related papers.
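
For readers unfamiliar with the recall and precision figures quoted above, the following minimal sketch shows how the two measures are computed in general. It assumes that an extracted reference area and a hand-labelled gold reference area are each given as a set of sentence indices; that unit of evaluation and the function name are assumptions, since the abstract does not state how the scores were computed.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted items.

    `extracted` and `gold` are collections of sentence indices
    (hypothetical evaluation unit, chosen only for illustration).
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, extracting sentences {2, 3, 4, 5} when the gold area is {3, 4, 5, 6, 7} gives three correct sentences out of four extracted and out of five expected.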

Using Constituent Boundary Parsing for Multi-lingual Spoken-language Translation

OSAMU FURUSE, KAZUHIDE YAMAMOTO, SETSUO YAMADA

1999 Volume 6 Issue 5 Pages 63-91
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_63

JOURNAL FREE ACCESS

We propose a method called constituent boundary parsing, which uses pattern matching on the surface form. The new version of Transfer-Driven Machine Translation (TDMT), which combines constituent boundary parsing and example-based processing, is effective for multi-lingual spoken-language translation. Constituent boundary parsing consistently describes the syntactic structures of various expressions with surface patterns consisting of variables and constituent boundaries. In constituent boundary parsing, input words are read in a left-to-right fashion, and the best syntactic structure is efficiently built up based on a chart-parsing algorithm while disambiguating local structures. Introducing constituent boundary parsing solves the problems of the earlier version of TDMT, such as the limited descriptive power of syntactic structures and the explosion of structural ambiguity. Also, because constituent boundary parsing and example-based processing are simple and language-independent, TDMT's applicability to multi-lingual spoken-language translation has been enhanced. We have evaluated the TDMT system, which translates bilingually between Japanese and English, and Japanese and Korean, in the domain of travel conversations. Experimental results show that a wide range of sentences in the domain can be translated into understandable output in real time by the proposed TDMT.

An Efficient Way of Gauging Similarity between Long Japanese News Expressions

HIDEKI TANAKA, TADASHI KUMANO, NORIYOSHI URATANI, TERUMASA EHARA

1999 Volume 6 Issue 5 Pages 93-116
Published: July 10, 1999
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.6.5_93

JOURNAL FREE ACCESS

We are developing a Japanese-to-English Translation Aid system for news translators. The system consists of a voluminous bilingual news database, whose sentences are aligned across languages beforehand, and a similar expression search engine. With the system, a user can find past translation examples for an input Japanese expression. Similar expression search engines like the one in this paper have usually employed an AND retrieval technique that uses keywords in the input expression and measures the similarity between the input and the target by the number of shared keywords. When such search engines are applied to our database, however, they often produce spurious search results, because the sentences are quite long (88.9 Japanese characters on average) and a single sentence often contains identical keywords many times. In this paper, we propose adding two constraints to the AND retrieval technique: the order and the positions (deviations) of keywords. This inexpensive modification allows AND retrieval to reflect some syntactic similarity. We show, through a set of experiments, that the proposed method yields a statistically significant improvement in user satisfaction with the search results, with only a 1.3-fold increase in search time.
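
The idea of constraining AND retrieval by keyword order and position can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: keywords are matched greedily from left to right, and the positional constraint is approximated by comparing the character gaps between consecutive keywords in the query and in the candidate sentence. The function names, the gap-based deviation measure, and the `max_deviation` parameter are all hypothetical.

```python
def find_in_order(keywords, sentence):
    """Return start positions of all keywords matched left to right.

    Returns None if some keyword is missing (plain AND retrieval fails)
    or if the keywords do not occur in the given order.
    """
    positions = []
    start = 0
    for kw in keywords:
        idx = sentence.find(kw, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + len(kw)
    return positions


def gap_deviation(query_positions, target_positions):
    """Sum of absolute differences between consecutive keyword gaps."""
    q_gaps = [b - a for a, b in zip(query_positions, query_positions[1:])]
    t_gaps = [b - a for a, b in zip(target_positions, target_positions[1:])]
    return sum(abs(q - t) for q, t in zip(q_gaps, t_gaps))


def search(query, keywords, sentences, max_deviation):
    """Return (deviation, sentence) pairs, most similar first.

    `keywords` must be listed in the order they appear in `query`.
    """
    query_positions = find_in_order(keywords, query)
    if query_positions is None:
        raise ValueError("keywords must occur in the query in the given order")
    hits = []
    for sentence in sentences:
        positions = find_in_order(keywords, sentence)
        if positions is None:
            continue  # a keyword is missing or out of order
        deviation = gap_deviation(query_positions, positions)
        if deviation <= max_deviation:
            hits.append((deviation, sentence))
    hits.sort(key=lambda hit: hit[0])
    return hits
```

Greedy leftmost matching is enough to decide whether an in-order match exists, but it does not necessarily find the match with the smallest deviation when a keyword is repeated; a fuller implementation would search over all occurrences.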
