Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 11, Issue 5
  • [in Japanese]
    2004 Volume 11 Issue 5 Pages 1-2
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • HIDEKI KASHIOKA
    2004 Volume 11 Issue 5 Pages 3-18
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Recently, natural language processing research has paid attention to paraphrase data and paraphrase processing techniques. Unfortunately, little paraphrase data is available. Some studies have collected synonymous expressions from parallel corpora; however, no corpus suitable for collecting sets of paraphrases is available, and when we tried this approach with a parallel corpus, the resulting paraphrase sets contained only a few variations of expression. In this paper, we propose a grouping method whose basic idea is to recursively group synonymous sentences that are related through their translations, and to decompose incorrect groups using the DM decomposition algorithm. Incorrect groups contain expressions that are not paraphrases of one another, because some words or expressions have different meanings in different situations. We discuss our method and report experimental results on BTEC, a multilingual parallel corpus.
  • NOBUHIRO KAJI, MASASHI OKAMOTO, SADAO KUROHASHI
    2004 Volume 11 Issue 5 Pages 19-37
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Expressions used in written language differ in many ways from those used in spoken language. This paper presents a method for paraphrasing vocabulary specific to written language into spoken-language vocabulary. The two kinds of vocabulary can be distinguished by their occurrence probabilities in written and spoken language corpora, which are collected automatically from the Web. Experimental results indicated the effectiveness of our method: the precision of the collected corpora was 94%, and the accuracy of learning paraphrases was 79%.
  • YASUHIRO OGAWA, SATOSHI KAMATANI, MUHTAR MAHSUT, YASUYOSHI INAGAKI
    2004 Volume 11 Issue 5 Pages 39-61
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In machine translation, the number of words in a bilingual dictionary has an important influence on translation. However, developing such a dictionary is very expensive. In this paper, we address this problem by paraphrasing a word that has no dictionary entry (a non-entry word) into entry words. We divide the paraphrasing process into two steps: collecting and screening. In the collecting step, we generate paraphrasing expressions for the original word from its lexical descriptions in a Japanese monolingual dictionary. In the subsequent screening step, we calculate the similarity between the original word and each of its paraphrasing expressions, and choose the best one. We applied this method to our Japanese-Uighur bilingual dictionary and obtained appropriate Uighur words for 68.3% of non-entry words.
  • Kazuhide Yamamoto
    2004 Volume 11 Issue 5 Pages 63-86
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    One of the problems in spoken language translation is the enormous variety of expressions not found in text translation. This variety can lead to sparse translation coverage. To tackle this problem, we propose a machine translation model in which an input is translated through both source-language and target-language paraphrasing processes. In this paper, we discuss the source-language paraphrasing and language transfer processes, and the design of our translation model. In source-language paraphrasing, we take the practical approach of untangling slight variations in the source language before transferring a source expression to its target. We discuss how effectively our paraphrasing process reduces variation in spoken language, focusing on how many source-language patterns are reduced by paraphrasing. For the translation model, we propose an interaction model between the source-language paraphraser and the transfer process, unlike the conventional assembly-line process flow. Our evaluation shows that over 70% of input utterances are expected to be changed in some way. Accordingly, one-fifth of all skeleton expressions can be merged into other skeletons, which increases the chances of obtaining correct translations. Furthermore, we observe that the interaction model with the paraphraser increases translation capability by 20-40 percentage points, regardless of the size of the transfer knowledge.
  • Andrew Finch, Taro Watanabe, Yasuhiro Akiba, Eiichiro Sumita
    2004 Volume 11 Issue 5 Pages 87-111
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This article presents two statistically based methods for automatically generating paraphrases of sentences: one based on direct statistical machine translation, the other based on data-oriented techniques. These paraphrasers are evaluated by human judges and compared both to human paraphrases and to paraphrases generated by a simple baseline model. The data-oriented approach proved the most successful in this evaluation, and a second experiment was conducted to determine the usefulness of machine-generated paraphrases for expanding the reference set used in machine translation evaluation. Varying numbers of synthetic paraphrases were mixed with varying numbers of real references to determine the circumstances under which adding synthetic paraphrases might be useful. Nine different machine translation systems were evaluated in this study using scores from nine human judges. Three machine translation evaluation schemes were used: BLEU, NIST, and mWER. The results show that the usefulness of the synthetic paraphrases depends on which machine translation evaluation method is used: the paraphrases degraded the performance of NIST but improved the evaluation performance of both BLEU and mWER.
  • Using Transformation Based on a Defined Criteria
    MASAKI MURATA, HITOSHI ISAHARA
    2004 Volume 11 Issue 5 Pages 113-133
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Studies on paraphrasing are important in various research topics such as sentence generation, summarization, and question answering. We describe a universal paraphrasing model that performs transformations according to defined criteria. We show that by using different criteria, we can construct different kinds of paraphrasing systems, including one for compressing sentences, one for polishing sentences, one for transforming written language into spoken language, one for replacing English words with synonyms of the same meaning that contain fewer “l” and “r” letters, and one for answering questions. Our model allows such systems to be constructed efficiently and yields dynamic paraphrasing systems. It should prompt the creation of new paraphrasing systems in the future.
  • MASAKI MURATA, TOSHIYUKI KANAMARU, HITOSHI ISAHARA
    2004 Volume 11 Issue 5 Pages 135-149
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Studies on paraphrasing are important in various research topics such as sentence generation, summarization, and question answering. We describe the automatic extraction of paraphrases by matching the definitions of the same word in two dictionaries, and present a new method for extracting these paraphrases. The method obtained higher precision than the conventional frequency-based method and can be applied to other studies on paraphrase extraction. When only synonyms were judged correct, the method obtained a precision of 0.748 on the top 500 extracted items and 0.222 on 500 randomly extracted items. When hypernyms and similar expressions were also judged correct, it obtained a precision of 0.954 on the top 500 items and 0.722 on the 500 randomly extracted items.
  • KENTARO INUI, ATSUSHI FUJITA
    2004 Volume 11 Issue 5 Pages 151-198
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Paraphrases are alternative ways of conveying the same content. Language technology for processing paraphrases, namely paraphrase generation and paraphrase recognition, has drawn the attention of an increasing number of researchers because of its potential contribution to a wide variety of natural language applications. This survey reviews recent research trends in paraphrase generation and recognition and discusses future prospects, addressing the definition of paraphrases, transformation-based paraphrase generation, paraphrase recognition in question answering and multi-document summarization, and, finally, corpus-based knowledge acquisition.
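Kashioka's abstract (pages 3-18) describes recursively grouping sentences that are related through their translations in a parallel corpus. One plausible reading of that grouping step, sketched below under stated assumptions, treats the corpus as a bipartite graph of (source, target) sentence pairs and collects the source sentences in each connected component with union-find. This is a hypothetical illustration, not the paper's implementation: the function name and the example sentences are invented, and the DM decomposition step that splits incorrect groups is not shown.

```python
from collections import defaultdict


def group_by_shared_translations(pairs):
    """Group source sentences linked, directly or transitively, through
    shared translations in a list of (source, target) pairs.

    Hypothetical helper: a union-find sketch of recursive grouping only;
    splitting incorrect groups (DM decomposition) is omitted.
    """
    parent = {}

    def find(x):
        # Create nodes lazily, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag each node with its side so identical strings in the two
    # languages remain distinct nodes.
    for src, tgt in pairs:
        union(("src", src), ("tgt", tgt))

    # Collect the source-side sentences of each connected component.
    groups = defaultdict(set)
    for node in list(parent):
        side, text = node
        if side == "src":
            groups[find(node)].add(text)

    # Sort groups and their members for deterministic output.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Invented English-Japanese pairs for illustration only.
    pairs = [
        ("Where is the station?", "駅はどこですか。"),
        ("Where's the station?", "駅はどこですか。"),
        ("Where's the station?", "駅はどちらですか。"),
        ("Which way is the station?", "駅はどちらですか。"),
        ("I'd like a coffee.", "コーヒーをください。"),
    ]
    for group in group_by_shared_translations(pairs):
        print(group)
```

In this toy input, "Where is the station?" and "Which way is the station?" share no translation directly, but they end up in the same group because "Where's the station?" links both Japanese sentences. That transitive chaining is also how a single ambiguous sentence could merge unrelated groups, which is the kind of error the abstract says the decomposition step is meant to repair.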