Similar Sentence Retrieval for Spoken Utterances and Its Application to Machine Translation

下畑 光夫; 隅田 英一郎; 松本 裕治

doi:10.5715/jnlp.11.4_105

Abstract

When spoken-language input sentences are given to a machine translation system, we sometimes cannot get proper translations because of the characteristics of spoken language. In this paper, we propose a method for recovering proper translations by combining similar sentence retrieval with machine translation when it is difficult to get a proper translation of the input sentence. If a given input sentence is found to be difficult to translate properly, a sentence similar to the input sentence is retrieved from a corpus of translatable sentences. The similarity between a candidate and the input sentence is determined from the ratio of their N-gram overlap. In addition, we use two further conditions to improve retrieval performance: excluding candidate sentences that contain a content word that does not exist in the input sentence, and decreasing the weight of functional words. In a retrieval experiment in Japanese, our method output retrieved sentences for 87.2% of all input sentences, and 60.4% of the retrieved sentences were similar to the input. In an experiment combining our method with machine translation, in which untranslatable input sentences are replaced with similar sentences from a translatable corpus, our method recovered proper translations for 25.9% of the untranslatable input sentences.
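The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact similarity formula, so the choice of unigrams plus bigrams, the normalization by the heavier of the two sentences, the function-word list `FUNCTION_WORDS`, the weight `FUNCTION_WEIGHT`, and the `threshold` value are all assumptions made for illustration. English tokens are used for readability, whereas the paper's experiment is on Japanese.

```python
from collections import Counter

# Hypothetical function-word list and weight; the abstract gives neither.
FUNCTION_WORDS = {"i", "to", "in", "the", "a", "is", "uh", "please"}
FUNCTION_WEIGHT = 0.5


def ngram_counts(tokens, max_n=2):
    """Count all n-grams of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def gram_weight(gram):
    """Average token weight; function words count less than content words."""
    weights = [FUNCTION_WEIGHT if t in FUNCTION_WORDS else 1.0 for t in gram]
    return sum(weights) / len(weights)


def weighted_total(counts):
    return sum(gram_weight(g) * c for g, c in counts.items())


def similarity(input_tokens, cand_tokens, max_n=2):
    """Weighted n-gram overlap ratio in [0, 1] (assumed normalization)."""
    inp = ngram_counts(input_tokens, max_n)
    cand = ngram_counts(cand_tokens, max_n)
    overlap = weighted_total(inp & cand)  # multiset intersection
    total = max(weighted_total(inp), weighted_total(cand))
    return overlap / total if total else 0.0


def has_extra_content_word(input_tokens, cand_tokens):
    """True if the candidate has a content word absent from the input."""
    seen = set(input_tokens)
    return any(t not in FUNCTION_WORDS and t not in seen for t in cand_tokens)


def retrieve(input_tokens, corpus, threshold=0.4):
    """Return the most similar admissible candidate, or None."""
    best, best_score = None, threshold
    for cand in corpus:
        if has_extra_content_word(input_tokens, cand):
            continue  # exclusion condition from the abstract
        score = similarity(input_tokens, cand)
        if score >= best_score:
            best, best_score = cand, score
    return best


corpus = [
    "i would like to check in".split(),
    "i would like to check out".split(),
    "where is the station".split(),
]
query = "uh i would like to to check in please".split()
result = retrieve(query, corpus)
```

Here the disfluent query keeps only the first corpus sentence as a candidate: the second is excluded because it contains the content word "out", and the third because of "where" and "station". In the combined setting the abstract describes, the retrieved sentence would then be passed to the machine translation system in place of the untranslatable input.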
