One of the difficulties in spoken language translation is the enormous variety of expressions, far beyond what is found in written-text translation. This variety can lead to sparse translation coverage. To tackle this problem, we propose a machine translation model in which an input is translated through both source-language and target-language paraphrasing processes. In this paper, we discuss the source-language paraphrasing process, the language transfer process, and the design of our translation model. In source-language paraphrasing, we take the practical approach of untangling slight variations in the source language before transferring a source expression to its target. We examine how effective our paraphrasing process is at reducing variety in spoken language, focusing on how many source-language patterns are merged by paraphrasing. In the translation model, we propose an interaction model between the source-language paraphraser and the transfer module, in contrast to the conventional assembly-line process flow. Our evaluation shows that over 70% of input utterances can be expected to be changed in some way by paraphrasing, and that one-fifth of all skeleton expressions can be merged into other skeletons, which increases the chances of obtaining a correct translation. Furthermore, we observe that our interaction model with the paraphraser improves translation capability by 20-40 percentage points, regardless of the size of the transfer knowledge.
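The interaction model described above can be illustrated with a minimal sketch. All pattern strings, the rule tables, and the retry-on-failure logic below are hypothetical simplifications, not the paper's actual knowledge sources: the paraphraser merges variant skeletons into canonical ones, and in the interaction model it is consulted again whenever direct transfer fails, rather than running once up front as in an assembly-line pipeline.

```python
from typing import Optional

# Hypothetical paraphrase rules: variant skeleton -> canonical skeleton.
PARAPHRASES = {
    "could you tell me X": "please tell me X",
    "i would like to know X": "please tell me X",
}

# Hypothetical transfer knowledge: canonical source skeleton -> target skeleton.
TRANSFER = {
    "please tell me X": "X wo oshiete kudasai",
}

def translate(skeleton: str) -> Optional[str]:
    """Try direct transfer; on failure, ask the paraphraser for an
    alternative skeleton and retry (the interaction model)."""
    if skeleton in TRANSFER:
        return TRANSFER[skeleton]
    alt = PARAPHRASES.get(skeleton)
    if alt is not None and alt in TRANSFER:
        return TRANSFER[alt]
    return None  # no translation found

# A variant with no transfer entry of its own is still translatable,
# because the paraphraser merges it into a covered skeleton.
print(translate("could you tell me X"))  # -> "X wo oshiete kudasai"
```

In an assembly-line flow, a paraphrase that happened to map onto an uncovered skeleton would simply fail; letting transfer failure trigger further paraphrasing is what raises coverage without enlarging the transfer knowledge.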
This article presents two statistically-based methods of automatically generating paraphrases for sentences: one based on direct statistical machine translation, the other on data-oriented techniques. The resulting paraphrases are evaluated by human judges and compared with both human-written paraphrases and those generated by a simple baseline model. The data-oriented approach proved the most successful in this evaluation, and a second experiment was conducted to determine the usefulness of machine-generated paraphrases for expanding the reference set used in machine translation evaluation. Varying numbers of synthetic paraphrases were mixed with varying numbers of real references to determine the circumstances under which adding synthetic paraphrases might be useful. Nine machine translation systems were evaluated in this study using scores from nine human judges, and three automatic evaluation schemes were applied: BLEU, NIST, and mWER. The results show that the usefulness of the synthetic paraphrases depends on which evaluation method is used: the paraphrases degraded NIST performance but improved the evaluation performance of both BLEU and mWER.
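The mechanics of the reference-expansion experiment can be sketched with toy data. The sentences below are invented, and a clipped unigram precision stands in for the full BLEU/NIST/mWER metrics, just to show how mixing synthetic paraphrases into the reference set can change a candidate's score:

```python
from collections import Counter

def clipped_unigram_precision(candidate, references):
    """BLEU-style clipped unigram precision: each candidate word is
    credited at most the maximum count it has in any single reference."""
    cand = Counter(candidate.split())
    max_ref = Counter()
    for ref in references:
        for w, c in Counter(ref.split()).items():
            max_ref[w] = max(max_ref[w], c)
    matched = sum(min(c, max_ref[w]) for w, c in cand.items())
    return matched / max(1, sum(cand.values()))

real_refs = ["the cat sat on the mat"]          # human reference (toy example)
synthetic = ["the cat was sitting on the mat"]  # machine-generated paraphrase
candidate = "the cat was on the mat"            # system output to score

score_real = clipped_unigram_precision(candidate, real_refs)
score_mixed = clipped_unigram_precision(candidate, real_refs + synthetic)
print(score_real, score_mixed)
```

Here the synthetic paraphrase supplies "was", which the real reference lacks, so the mixed reference set scores the candidate higher. Whether such added matchable material helps or hurts in practice depends on the metric, which is what the experiment above probes.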