Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 5, Issue 4
Displaying 1-10 of 10 articles from this issue
  • [in Japanese]
    1998 Volume 5 Issue 4 Pages 1-2
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (226K)
  • Haodong Wu, Teiji Furugori
    1998 Volume 5 Issue 4 Pages 3-16
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper describes a method for determining the syntactic structure of coordinate constructions. It is based on information drawn from semantic similarities, selectional restrictions, and other linguistic cues. We discuss the role this information plays in resolving ambiguities that arise in coordinate constructions, describe a means of acquiring the necessary information automatically from two on-line corpora and a lexical database, and devise two algorithms for disambiguating coordinate constructions. An experiment shows the effectiveness of our method and its applicability to resolving ambiguities in some other syntactic structures.
    Download PDF (1206K)
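The abstract above resolves coordination ambiguity with semantic similarity between candidate conjuncts. A minimal sketch of that idea, with an invented similarity function: the `SEM_CLASSES` table and Jaccard scoring here are stand-ins for the corpus- and lexical-database-derived similarities the paper actually uses.

```python
# Hypothetical semantic classes for a few words (the paper derives
# similarity from corpora and a lexical database instead).
SEM_CLASSES = {
    "cats": {"animal", "pet"},
    "dogs": {"animal", "pet"},
    "food": {"substance"},
    "toys": {"artifact"},
}

def similarity(w1, w2):
    """Jaccard overlap of semantic classes (toy stand-in measure)."""
    a, b = SEM_CLASSES.get(w1, set()), SEM_CLASSES.get(w2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def choose_left_conjunct(candidates, right_conjunct):
    """Pick the candidate left-conjunct head most similar to the right one."""
    return max(candidates, key=lambda w: similarity(w, right_conjunct))

# "food for cats and dogs": is "dogs" coordinated with "cats" or "food"?
print(choose_left_conjunct(["food", "cats"], "dogs"))  # cats
```

Semantically similar heads ("cats"/"dogs") attach to each other, which is the intuition the paper operationalizes with corpus statistics.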
  • Yujie Zhang, Kazuhiko Ozeki
    1998 Volume 5 Issue 4 Pages 17-33
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
In conventional bunsetsu segmentation methods for Japanese sentences, segmentation rules have been written manually. This makes it difficult to maintain the consistency of the rules and to decide an efficient order of rule application. This paper proposes a method of automatic bunsetsu segmentation using a classification tree, by which knowledge about bunsetsu boundaries is acquired automatically from a corpus and an efficient order of rule application is realized automatically. The method adapts quickly to a new system of parts of speech, and to a new task domain, without changing the algorithm. Classification trees for bunsetsu segmentation were generated, and evaluation experiments carried out, on an ATR corpus and an EDR corpus. A segmentation accuracy of 98.9% was achieved on the ATR corpus, and 96.2% on the EDR corpus. The method was compared with a simple rule-based method and the Bayes decision rule on the ATR corpus. The proposed method outperformed the rule-based method when the training set was larger than about 20 sentences, and outperformed the Bayes decision rule over the whole range of training data sizes. Its superiority over the former was more evident with larger training sets, and over the latter with smaller ones.
    Download PDF (1521K)
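A toy sketch of the classification-tree idea in the abstract above: learn boundary/no-boundary decisions from labeled examples over categorical features. This is a compact ID3-style tree over just the two POS tags adjacent to a candidate boundary; the training rows and labels are invented, and the real system uses richer attributes.

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """rows: list of dicts feature->value; labels: 'B' (boundary) / 'I' (inside)."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def gain(f):  # information gain of splitting on feature f
        total = entropy(labels)
        for v in set(r[f] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[f] == v]
            total -= len(sub) / len(labels) * entropy(sub)
        return total

    best = max(features, key=gain)
    node = {"feature": best, "children": {},
            "default": Counter(labels).most_common(1)[0][0]}
    for v in set(r[best] for r in rows):
        sub_rows = [r for r in rows if r[best] == v]
        sub_labels = [l for r, l in zip(rows, labels) if r[best] == v]
        node["children"][v] = build_tree(sub_rows, sub_labels,
                                         [f for f in features if f != best])
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree

# Invented training data: POS tags left/right of each candidate boundary.
rows = [
    {"left": "noun", "right": "particle"}, {"left": "particle", "right": "noun"},
    {"left": "verb", "right": "noun"}, {"left": "noun", "right": "noun"},
]
labels = ["I", "B", "B", "I"]
tree = build_tree(rows, labels, ["left", "right"])
print(classify(tree, {"left": "particle", "right": "noun"}))  # B
```

Because the tree orders its own splits by information gain, the "efficient order of rule application" the abstract mentions falls out of training rather than being hand-specified.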
  • TORU HISAMITSU, YOSHIHIKO NITTA
    1998 Volume 5 Issue 4 Pages 35-60
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Compound nouns tend to be important words because a compound noun conveys a great deal of information and can even summarize a document. The analysis of compound nouns can therefore contribute to machine translation, information extraction, and information retrieval. Since compound nouns lack syntactic clues, existing methods have relied on manually written rules and thesauri to analyze the word dependency structure of compound nouns. Consequently, these methods lack robustness when treating open corpora, such as newspaper articles, which contain many unregistered words. This paper presents a thesaurus-free, corpus-based approach that scans a corpus with a set of templates and extracts co-occurrence data for the nouns that make up a compound noun. Unregistered words such as abbreviations and short compound nouns are detected during template matching, and co-occurrence data for the newly found words are extracted as well, which makes the analysis robust and highly accurate. The accuracy of the method was evaluated on 400 compound nouns of length 5, 6, 7, and 8. The numbers of correct analyses were 90, 86, 84, and 84 out of 100 compound nouns of length 5, 6, 7, and 8, respectively.
    Download PDF (2610K)
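A minimal sketch of scoring bracketings of a compound noun with pairwise co-occurrence statistics, loosely in the spirit of the corpus-based approach in the abstract above. The `COOC` table is invented; the paper extracts such counts from a corpus via template matching. A head-final convention (the left part modifies the right head) is assumed here.

```python
from functools import lru_cache

# Hypothetical modifier–head co-occurrence scores.
COOC = {
    ("data", "base"): 5.0, ("base", "system"): 1.0,
    ("data", "system"): 0.5, ("database", "system"): 4.0,
}

def score(mod, head):
    return COOC.get((mod, head), 0.1)  # small smoothing for unseen pairs

def best_bracketing(words):
    """Return the highest-scoring binary bracketing of a noun sequence."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # returns (score, bracketed string, head word) for words[i:j]
        if j - i == 1:
            return (1.0, words[i], words[i])
        best = None
        for k in range(i + 1, j):
            ls, lb, lh = solve(i, k)
            rs, rb, rh = solve(k, j)
            s = ls * rs * score(lh, rh)  # head-final: left modifies right
            if best is None or s > best[0]:
                best = (s, f"({lb} {rb})", rh)
        return best

    return solve(0, len(words))[1]

print(best_bracketing(["data", "base", "system"]))  # ((data base) system)
```

The strong "data"/"base" association beats the alternative "(data (base system))" analysis, mirroring how corpus co-occurrence can stand in for a thesaurus.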
  • Akitoshi Okumura, Kazunori Muraki
    1998 Volume 5 Issue 4 Pages 61-76
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The authors propose a model for analyzing English sentences that include coordinate conjunctions such as “and”, “or”, “but”, and equivalent words. The syntactic analysis of English coordinate sentences is one of the most difficult problems for machine translation (MT) systems. The problem is selecting, from all possible candidates, the correct syntactic structure formed by an individual coordinate conjunction, i.e., determining which constituents are coordinated by the conjunction. Typically, so many possible structures are produced that MT systems cannot select the correct one, even if the grammars allow the rules to be written in simple notations. This paper presents an English coordinate structure analysis model that provides top-down scope information on the correct syntactic structure by taking advantage of the symmetric patterns of parallelism. The model is based on a balance-matching operation over two lists of feature sets. It has four effects: a reduction in analysis cost, a decrease in word ambiguity, the interpretation of ellipses, and robust analysis. The model was implemented and incorporated into an English-Japanese MT system, achieving about 70% accuracy on 3,215 Wall Street Journal sentences.
    Download PDF (1439K)
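A toy sketch of the balance-matching intuition from the abstract above: represent each candidate conjunct span as a list of per-word feature sets and prefer the pairing that is most position-by-position symmetric. The feature sets and the scoring function here are invented for illustration; the paper's operation is considerably richer.

```python
def balance_score(left, right):
    """Score symmetry between two lists of per-word feature sets."""
    matched = sum(len(a & b) / len(a | b) for a, b in zip(left, right) if a | b)
    length_penalty = abs(len(left) - len(right))
    return matched - 0.5 * length_penalty

def best_scope(candidates, right):
    """Choose the left-conjunct span most parallel to the right conjunct."""
    return max(candidates, key=lambda left: balance_score(left, right))

# Candidate left scopes for "... old cars and new trucks":
adj_noun = [{"adj"}, {"noun"}]   # "old cars"
noun_only = [{"noun"}]           # "cars"
right = [{"adj"}, {"noun"}]      # "new trucks"
print(best_scope([adj_noun, noun_only], right))  # [{'adj'}, {'noun'}]
```

The adjective-noun span wins because it mirrors the right conjunct's shape, which is the top-down scope cue the model exploits.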
  • AKITOSHI OKUMURA, KAI ISHIKAWA, KENJI SATOH
    1998 Volume 5 Issue 4 Pages 77-93
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method for translating query terms for cross-language information retrieval (CLIR). CLIR is generally performed by query translation followed by information retrieval (IR). CLIR is less precise than IR because of query-term translation ambiguities, especially in Japanese-English CLIR. We developed the Double MAXimize criterion based on comparable corpora (DMAX), an equivalent-translation selection method for machine translation (MT) that uses term co-occurrence frequencies in comparable corpora. Whereas a term should be translated into one word for MT, a query term should be translated into several appropriate terms for CLIR. This paper describes a generalized query-term selection model, GDMAX, for CLIR. In this model, a source query is represented as a vector of term co-occurrence frequencies in source corpora. Translation queries are found by computing vector similarity between the source query and target queries represented by co-occurrence frequencies in comparable target corpora. GDMAX was evaluated using TREC6 (Text REtrieval Conference) English data and 15 Japanese queries. GDMAX queries achieved approximately 62% of the accuracy of human queries, 6% higher accuracy than machine translation queries, and 12% higher than bilingual dictionary-based queries.
    Download PDF (3303K)
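A heavily simplified sketch of the co-occurrence-maximizing idea behind DMAX/GDMAX as described above: among the candidate translations of two query terms, pick the pair that co-occurs most in the target-language corpus. The counts below are invented, and the full model compares co-occurrence vectors rather than a single pair frequency.

```python
# Hypothetical target-corpus co-occurrence counts.
TARGET_COOC = {
    ("bank", "river"): 2, ("bank", "loan"): 5,
    ("shore", "river"): 30, ("shore", "loan"): 1,
}

def best_translation_pair(cands1, cands2):
    """Pick the translation pair with the highest target-corpus co-occurrence."""
    return max(((t1, t2) for t1 in cands1 for t2 in cands2),
               key=lambda p: TARGET_COOC.get(p, 0))

# Two ambiguous query terms, each with candidate English translations:
print(best_translation_pair(["bank", "shore"], ["river", "loan"]))  # ('shore', 'river')
```

Mutual disambiguation between query terms is the key point: neither term can be translated reliably in isolation, but the pair that is coherent in the target corpus stands out.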
  • EMIKO SUZUKI, SATOSHI ONO, HITOSHI KANOH
    1998 Volume 5 Issue 4 Pages 95-110
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Many Japanese sentence segmentation algorithms have been proposed for translating Japanese into English or for querying databases. Those methods use a huge dictionary containing word forms, readings, and grammatical information, which requires considerable time and work. Since Braille needs only blanks and phonetic information, we do not have to check the grammatical combination of words. We propose a new system for segmenting Japanese sentences in order to translate Japanese into Braille. Our method uses a knowledge base that categorizes Japanese sentence segmentation rules. Segmentation rules for translation into Braille are heuristic, ambiguous, and complicated. Software is available, but its user interface is not very good and volunteers rarely use it. We therefore provide a user interface for checking the positions of ambiguous segmentations. In this way, the users' workload is reduced, since it is no longer necessary to check every part of the sentences. In our method, only a few small tables containing words with their segmentation patterns are necessary. Our knowledge base needs no grammatical information; instead it uses surface information such as the character types Kanji, Hiragana, and Katakana. The segmentation accuracy is 98.0%, a higher rate than that of the usual methods.
    Download PDF (4921K)
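A toy sketch of the surface-information idea in the abstract above: segment using character types (Kanji/Hiragana/Katakana) rather than grammar. The single rule below (break where a hiragana run ends) is only an illustration of the style of rule; the paper's knowledge base holds many such heuristic rules plus exception tables.

```python
def char_type(ch):
    """Classify a character by Unicode block (toy version)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"

def segment(text):
    """Insert a break where a hiragana run is followed by another character type."""
    segments, current = [], ""
    for ch in text:
        if current and char_type(current[-1]) == "hiragana" and char_type(ch) != "hiragana":
            segments.append(current)
            current = ""
        current += ch
    if current:
        segments.append(current)
    return segments

print(segment("私は学校へ行く"))  # ['私は', '学校へ', '行く']
```

Even this one character-type transition rule recovers the spaced form "私は 学校へ 行く" needed for Braille, which is why no dictionary of grammatical combinations is required.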
  • YUMI WAKITA, [in Japanese], HITOSHI IIDA
    1998 Volume 5 Issue 4 Pages 111-125
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This paper proposes a method for extracting the correct parts of speech recognition results by using an example-based approach to parse results that include several recognition errors. Correct parts are extracted using two factors: (1) the semantic distance between the input expression and an example expression, and (2) the structure selected by the shortest semantic distance. Experimental results showed that the proposed method can efficiently extract the correct parts of speech recognition results: about 96% of the extracted parts are correct. The results also showed that the method is effective in understanding misrecognized speech sentences and in improving speech translation results. The misunderstanding rate for erroneous sentences is reduced by about half, and 69% of speech translation results are improved for misrecognized sentences.
    Download PDF (2845K)
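A toy sketch of the example-based extraction described above: each candidate phrase from a recognition result is matched against example expressions and kept only when its distance to the nearest example is small. The word-overlap distance and the threshold here are invented stand-ins for the paper's thesaurus-based semantic distance.

```python
def distance(phrase, example):
    """Crude word-overlap distance (stand-in for a semantic distance)."""
    a, b = set(phrase), set(example)
    return 1.0 - len(a & b) / len(a | b)

def correct_parts(phrases, examples, threshold=0.5):
    """Keep the phrases whose nearest example is within the threshold."""
    return [p for p in phrases
            if min(distance(p, e) for e in examples) <= threshold]

examples = [["reserve", "a", "room"], ["what", "time", "is", "it"]]
phrases = [["reserve", "a", "room"],          # plausible part of the input
           ["flying", "purple", "elephant"]]  # likely misrecognition
print(correct_parts(phrases, examples))  # [['reserve', 'a', 'room']]
```

Parts that resemble no stored example are treated as misrecognized and dropped, so downstream translation only sees the trusted fragments.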
  • TAE WAN KIM, KEY SUN CHOI
    1998 Volume 5 Issue 4 Pages 127-149
    Published: October 10, 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
This report describes the status and performance of current Japanese-to-Korean machine translation systems, and makes some suggestions for developing better systems. The results were obtained by analyzing the latest versions, as of February 1997, of four commercial Japanese-to-Korean machine translation systems in Korea. A declarative evaluation was performed from the user's point of view to measure translation quality. A typological evaluation probed the linguistic coverage of the current commercial systems. An operational evaluation examined the user interfaces. Finally, a progress evaluation compared the results with those reported in (Choi and Kim, 1996). This report does not intend to rank the relative standing of the systems; the evaluations were conducted within the range of interest of this report.
    Download PDF (3516K)
  • 1998 Volume 5 Issue 4 Pages 151
    Published: 1998
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (34K)