Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 5, Issue 4
Displaying 1-10 of 10 articles from this issue
  • [in Japanese]
    1998 Volume 5 Issue 4 Pages 1-2
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (226K)
  • Haodong Wu, Teiji Furugori
    1998 Volume 5 Issue 4 Pages 3-16
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper describes a method for determining the syntactic structure of coordinate constructions. It is based on information drawn from semantic similarities, selectional restrictions, and other linguistic cues. We discuss the role this information plays in resolving ambiguities that arise in coordinate constructions, describe a means of acquiring the necessary information automatically from two on-line corpora and a lexical database, and devise two algorithms for disambiguating coordinate constructions. An experiment shows the effectiveness of our method and its applicability to resolving ambiguities in some other syntactic structures.
    Download PDF (1206K)
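The abstract above resolves coordination ambiguity with semantic similarity between candidate conjuncts. A minimal sketch of that idea, with an invented similarity function: the `SEM_CLASSES` table and Jaccard scoring here are stand-ins for the corpus- and lexical-database-derived similarities the paper actually uses.

```python
# Hypothetical semantic classes for a few words (the paper derives
# similarity from corpora and a lexical database instead).
SEM_CLASSES = {
    "cats": {"animal", "pet"},
    "dogs": {"animal", "pet"},
    "food": {"substance"},
    "toys": {"artifact"},
}

def similarity(w1, w2):
    """Jaccard overlap of semantic classes (toy stand-in measure)."""
    a, b = SEM_CLASSES.get(w1, set()), SEM_CLASSES.get(w2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def choose_left_conjunct(candidates, right_conjunct):
    """Pick the candidate left-conjunct head most similar to the right one."""
    return max(candidates, key=lambda w: similarity(w, right_conjunct))

# "food for cats and dogs": is "dogs" coordinated with "cats" or "food"?
print(choose_left_conjunct(["food", "cats"], "dogs"))  # cats
```

Semantically similar heads ("cats"/"dogs") attach to each other, which is the intuition the paper operationalizes with corpus statistics.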
  • Yujie Zhang, Kazuhiko Ozeki
    1998 Volume 5 Issue 4 Pages 17-33
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In conventional bunsetsu segmentation methods for Japanese sentences, segmentation rules have been written manually. This makes it difficult to maintain the consistency of the rules and to decide an efficient order of rule application. This paper proposes a method of automatic bunsetsu segmentation using a classification tree, by which knowledge about bunsetsu boundaries is acquired automatically from a corpus and an efficient order of rule application is realized automatically. The method adapts quickly to a new system of parts of speech, and to a new task domain, without changing the algorithm. Classification trees for bunsetsu segmentation were generated, and evaluation experiments carried out, on an ATR corpus and an EDR corpus. A segmentation accuracy of 98.9% was achieved on the ATR corpus, and 96.2% on the EDR corpus. The method was compared with a simple rule-based method and the Bayes decision rule on the ATR corpus. The proposed method outperformed the rule-based method when the training set was larger than about 20 sentences, and outperformed the Bayes decision rule over the whole range of training data sizes. Its superiority over the former was more evident with larger training sets, and over the latter with smaller ones.
    Download PDF (1521K)
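A toy sketch of the classification-tree idea in the abstract above: learn boundary/no-boundary decisions from labeled examples over categorical features. This is a compact ID3-style tree over just the two POS tags adjacent to a candidate boundary; the training rows and labels are invented, and the real system uses richer attributes.

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """rows: list of dicts feature->value; labels: 'B' (boundary) / 'I' (inside)."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def gain(f):  # information gain of splitting on feature f
        total = entropy(labels)
        for v in set(r[f] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[f] == v]
            total -= len(sub) / len(labels) * entropy(sub)
        return total

    best = max(features, key=gain)
    node = {"feature": best, "children": {},
            "default": Counter(labels).most_common(1)[0][0]}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        node["children"][v] = build_tree(sub_rows, sub_labels,
                                         [f for f in features if f != best])
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Invented training data: POS tags left/right of each candidate boundary.
rows = [
    {"left": "noun", "right": "particle"}, {"left": "particle", "right": "noun"},
    {"left": "verb", "right": "noun"}, {"left": "noun", "right": "noun"},
]
labels = ["I", "B", "B", "I"]
tree = build_tree(rows, labels, ["left", "right"])
print(classify(tree, {"left": "particle", "right": "noun"}))  # B
```

Because the tree orders its own splits by information gain, the "efficient order of rule application" the abstract mentions falls out of training rather than being hand-specified.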
  • TORU HISAMITSU, YOSHIHIKO NITTA
    1998 Volume 5 Issue 4 Pages 35-60
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Compound nouns tend to be important words because a compound noun conveys a great deal of information and can even summarize a document. The analysis of compound nouns can therefore contribute to machine translation, information extraction, and information retrieval. Since compound nouns lack syntactic clues, existing methods have relied on manually written rules and thesauri to analyze the word dependency structure of compound nouns. Consequently, these methods lack robustness when treating open corpora, such as newspaper articles, which contain many unregistered words. This paper presents a thesaurus-free, corpus-based approach that scans a corpus with a set of templates and extracts co-occurrence data for the nouns that make up a compound noun. Unregistered words such as abbreviations and short compound nouns are detected during template matching, and co-occurrence data for the newly found words are extracted as well, which makes the analysis robust and highly accurate. The accuracy of the method was evaluated on 400 compound nouns of length 5, 6, 7, and 8. The numbers of correct analyses were 90, 86, 84, and 84 out of 100 compound nouns of length 5, 6, 7, and 8, respectively.
    Download PDF (2610K)
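A minimal sketch of scoring bracketings of a compound noun with pairwise co-occurrence statistics, loosely in the spirit of the corpus-based approach in the abstract above. The `COOC` table is invented; the paper extracts such counts from a corpus via template matching. A head-final convention (the left part modifies the right head) is assumed here.

```python
from functools import lru_cache

# Hypothetical modifier–head co-occurrence scores.
COOC = {
    ("data", "base"): 5.0, ("base", "system"): 1.0,
    ("data", "system"): 0.5, ("database", "system"): 4.0,
}

def score(mod, head):
    return COOC.get((mod, head), 0.1)  # small smoothing for unseen pairs

def best_bracketing(words):
    """Return the highest-scoring binary bracketing of a noun sequence."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # returns (score, bracketed string, head word) for words[i:j]
        if j - i == 1:
            return (1.0, words[i], words[i])
        best = None
        for k in range(i + 1, j):
            ls, lb, lh = solve(i, k)
            rs, rb, rh = solve(k, j)
            s = ls * rs * score(lh, rh)  # head-final: left modifies right
            if best is None or s > best[0]:
                best = (s, f"({lb} {rb})", rh)
        return best

    return solve(0, len(words))[1]

print(best_bracketing(["data", "base", "system"]))  # ((data base) system)
```

The strong "data"/"base" association beats the alternative "(data (base system))" analysis, mirroring how corpus co-occurrence can stand in for a thesaurus.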
  • Akitoshi Okumura, Kazunori Muraki
    1998 Volume 5 Issue 4 Pages 61-76
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The authors propose a model for analyzing English sentences that include coordinate conjunctions such as “and”, “or”, “but”, and equivalent words. The syntactic analysis of English coordinate sentences is one of the most difficult problems for machine translation (MT) systems. The problem is selecting, from all possible candidates, the correct syntactic structure formed by an individual coordinate conjunction, i.e., determining which constituents are coordinated by the conjunction. Typically, so many possible structures are produced that MT systems cannot select the correct one, even if the grammars allow the rules to be written in simple notations. This paper presents an English coordinate structure analysis model that provides top-down scope information on the correct syntactic structure by taking advantage of the symmetric patterns of parallelism. The model is based on a balance-matching operation over two lists of feature sets. It has four effects: a reduction in analysis cost, a decrease in word ambiguity, the interpretation of ellipses, and robust analysis. The model was implemented and incorporated into an English-Japanese MT system, achieving about 70% accuracy on 3,215 Wall Street Journal sentences.
    Download PDF (1439K)
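A toy sketch of the balance-matching intuition from the abstract above: represent each candidate conjunct span as a list of per-word feature sets and prefer the pairing that is most position-by-position symmetric. The feature sets and the scoring function here are invented for illustration; the paper's operation is considerably richer.

```python
def balance_score(left, right):
    """Score symmetry between two lists of per-word feature sets."""
    matched = sum(len(a & b) / len(a | b) for a, b in zip(left, right) if a | b)
    length_penalty = abs(len(left) - len(right))
    return matched - 0.5 * length_penalty

def best_scope(candidates, right):
    """Choose the left-conjunct span most parallel to the right conjunct."""
    return max(candidates, key=lambda left: balance_score(left, right))

# Candidate left scopes for "... old cars and new trucks":
adj_noun = [{"adj"}, {"noun"}]   # "old cars"
noun_only = [{"noun"}]           # "cars"
right = [{"adj"}, {"noun"}]      # "new trucks"
print(best_scope([adj_noun, noun_only], right))  # [{'adj'}, {'noun'}]
```

The adjective-noun span wins because it mirrors the right conjunct's shape, which is the top-down scope cue the model exploits.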
  • AKITOSHI OKUMURA, KAI ISHIKAWA, KENJI SATOH
    1998 Volume 5 Issue 4 Pages 77-93
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method for translating query terms for cross-language information retrieval (CLIR). CLIR is generally performed by query translation followed by information retrieval (IR). CLIR is less precise than IR because of query-term translation ambiguities, especially in Japanese-English CLIR. We developed the Double MAXimize criterion based on comparable corpora (DMAX), an equivalent-translation selection method for machine translation (MT) that uses term co-occurrence frequencies in comparable corpora. Whereas a term should be translated into one word for MT, a query term should be translated into several appropriate terms for CLIR. This paper describes a generalized query-term selection model, GDMAX, for CLIR. In this model, a source query is represented as a vector of term co-occurrence frequencies in source corpora. Translation queries are found by computing vector similarity between the source query and target queries represented by co-occurrence frequencies in comparable target corpora. GDMAX was evaluated using TREC6 (Text REtrieval Conference) English data and 15 Japanese queries. GDMAX queries achieved approximately 62% of the accuracy of human queries, 6% higher accuracy than machine translation queries, and 12% higher than bilingual dictionary-based queries.
    Download PDF (3303K)
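A heavily simplified sketch of the co-occurrence-maximizing idea behind DMAX/GDMAX as described above: among the candidate translations of two query terms, pick the pair that co-occurs most in the target-language corpus. The counts below are invented, and the full model compares co-occurrence vectors rather than a single pair frequency.

```python
# Hypothetical target-corpus co-occurrence counts.
TARGET_COOC = {
    ("bank", "river"): 2, ("bank", "loan"): 5,
    ("shore", "river"): 30, ("shore", "loan"): 1,
}

def best_translation_pair(cands1, cands2):
    """Pick the translation pair with the highest target-corpus co-occurrence."""
    return max(((t1, t2) for t1 in cands1 for t2 in cands2),
               key=lambda p: TARGET_COOC.get(p, 0))

# Two ambiguous query terms, each with candidate English translations:
print(best_translation_pair(["bank", "shore"], ["river", "loan"]))  # ('shore', 'river')
```

Mutual disambiguation between query terms is the key point: neither term can be translated reliably in isolation, but the pair that is coherent in the target corpus stands out.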
  • EMIKO SUZUKI, SATOSHI ONO, HITOSHI KANOH
    1998 Volume 5 Issue 4 Pages 95-110
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Many Japanese sentence segmentation algorithms have been proposed for translating Japanese into English or for querying databases. Those methods use a huge dictionary containing word forms, readings, and grammatical information, which requires considerable time and work. Since Braille needs only blanks and phonetic information, we do not have to check the grammatical combination of words. We propose a new system for segmenting Japanese sentences in order to translate Japanese into Braille. Our method uses a knowledge base that categorizes Japanese sentence segmentation rules. Segmentation rules for translation into Braille are heuristic, ambiguous, and complicated. Software is available, but its user interface is not very good and volunteers rarely use it. We therefore provide a user interface for checking the positions of ambiguous segmentations. In this way, the users' workload is reduced, since it is no longer necessary to check every part of the sentences. In our method, only a few small tables containing words with their segmentation patterns are necessary. Our knowledge base needs no grammatical information; instead it uses surface information such as the character types Kanji, Hiragana, and Katakana. The segmentation accuracy is 98.0%, a higher rate than that of the usual methods.
    Download PDF (4921K)
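A toy sketch of the surface-information idea in the abstract above: segment using character types (Kanji/Hiragana/Katakana) rather than grammar. The single rule below (break where a hiragana run ends) is only an illustration of the style of rule; the paper's knowledge base holds many such heuristic rules plus exception tables.

```python
def char_type(ch):
    """Classify a character by Unicode block (toy version)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"

def segment(text):
    """Insert a break where a hiragana run is followed by another character type."""
    segments, current = [], ""
    for ch in text:
        if current and char_type(current[-1]) == "hiragana" and char_type(ch) != "hiragana":
            segments.append(current)
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments

print(segment("私は学校へ行く"))  # ['私は', '学校へ', '行く']
```

Even this one character-type transition rule recovers the spaced form "私は 学校へ 行く" needed for Braille, which is why no dictionary of grammatical combinations is required.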
  • YUMI WAKITA, [in Japanese], HITOSHI IIDA
    1998 Volume 5 Issue 4 Pages 111-125
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method for extracting the correct parts of speech recognition results by using an example-based approach to parse results that include several recognition errors. Correct parts are extracted using two factors: (1) the semantic distance between the input expression and an example expression, and (2) the structure selected by the shortest semantic distance. Experimental results showed that the proposed method can efficiently extract the correct parts of speech recognition results: about 96% of the extracted parts are correct. The results also showed that the method is effective in understanding misrecognized speech sentences and in improving speech translation results. The misunderstanding rate for erroneous sentences is reduced by about half, and 69% of speech translation results are improved for misrecognized sentences.
    Download PDF (2845K)
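A toy sketch of the example-based extraction described above: each candidate phrase from a recognition result is matched against example expressions and kept only when its distance to the nearest example is small. The word-overlap distance and the threshold here are invented stand-ins for the paper's thesaurus-based semantic distance.

```python
def distance(phrase, example):
    """Crude word-overlap distance (stand-in for a semantic distance)."""
    a, b = set(phrase), set(example)
    return 1.0 - len(a & b) / len(a | b)

def correct_parts(phrases, examples, threshold=0.5):
    """Keep the phrases whose nearest example is within the threshold."""
    return [p for p in phrases
            if min(distance(p, e) for e in examples) <= threshold]

examples = [["reserve", "a", "room"], ["what", "time", "is", "it"]]
phrases = [["reserve", "a", "room"],          # plausible part of the input
           ["flying", "purple", "elephant"]]  # likely misrecognition
print(correct_parts(phrases, examples))  # [['reserve', 'a', 'room']]
```

Parts that resemble no stored example are treated as misrecognized and dropped, so downstream translation only sees the trusted fragments.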
  • TAE WAN KIM, KEY SUN CHOI
    1998 Volume 5 Issue 4 Pages 127-149
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This report describes the status and performance of current Japanese-to-Korean machine translation systems, and makes some suggestions for developing better systems. The results were obtained by analyzing the latest versions, as of February 1997, of four commercial Japanese-to-Korean machine translation systems in Korea. A declarative evaluation was performed from the user's point of view to measure translation quality. A typological evaluation probed the linguistic coverage of the current commercial systems. An operational evaluation examined the user interfaces. Finally, a progress evaluation compared the results with those reported in (Choi and Kim, 1996). This report does not intend to rank the relative standing of the systems; the evaluations were conducted within the range of interest of this report.
    Download PDF (3516K)
  • 1998 Volume 5 Issue 4 Pages 151
    Published: 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (34K)