Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Neural Machine Translation Using Multiple Back-translation Generated by Sampling
Kenji Imamura, Atsushi Fujita, Eiichiro Sumita

2020 Volume 35 Issue 3 Pages A-JA9_1-9

Abstract

A large-scale parallel corpus is indispensable for training encoder-decoder neural machine translation. The method of using synthetic parallel texts, called back-translation, in which target monolingual sentences are automatically translated into the source language, has proven effective for improving the decoder. However, it does not necessarily help enhance the encoder. In this paper, we propose a method that enhances not only the decoder but also the encoder using target monolingual corpora, by generating multiple source sentences via sampling-based sequence generation. Source sentences generated in this way are more diverse and thus help make the encoder robust. Our experimental results show that translation quality improved as the number of synthetic source sentences per target sentence increased. Although the quality did not reach that achieved with a genuine parallel corpus comprising single human translations, our proposed method obtained over 50% of the improvement brought by the parallel corpus while using only its target side, i.e., monolingual data. Moreover, the proposed sampling method resulted in final translations of higher quality than n-best back-translation. These results indicate that not only the quality of back-translation but also the diversity of synthetic source sentences is crucial.
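The contrast between deterministic and sampling-based back-translation described above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a hypothetical toy model whose next-token distributions are hard-coded, purely to show why sampling yields multiple diverse synthetic source sentences where greedy (1-best) decoding yields only one.

```python
import random

# Toy vocabulary for a hypothetical back-translation model
# (assumption: real systems decode over subword vocabularies with a neural model).
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "</s>"]

def next_token_probs(prefix):
    """Hypothetical fixed next-token distributions, keyed on prefix length."""
    table = [
        [0.6, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0],  # step 0: "the" vs "a"
        [0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 0.0],  # step 1: "cat" vs "dog"
        [0.0, 0.0, 0.0, 0.0, 0.7, 0.3, 0.0],  # step 2: "sat" vs "ran"
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # step 3: end of sentence
    ]
    return table[len(prefix)]

def greedy_decode():
    """Deterministic 1-best decoding: always pick the argmax token."""
    prefix = []
    while True:
        probs = next_token_probs(prefix)
        tok = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
        if tok == "</s>":
            return " ".join(prefix)
        prefix.append(tok)

def sample_decode(rng):
    """Sampling-based decoding: draw each token from the full distribution."""
    prefix = []
    while True:
        probs = next_token_probs(prefix)
        tok = rng.choices(VOCAB, weights=probs)[0]
        if tok == "</s>":
            return " ".join(prefix)
        prefix.append(tok)

rng = random.Random(0)
# Greedy decoding produces one synthetic source sentence per target sentence,
# while repeated sampling produces a diverse set for the same target sentence.
one_best = greedy_decode()
sampled = {sample_decode(rng) for _ in range(20)}
```

Repeating `sample_decode` for each target sentence is the paper's key idea: the resulting variety of synthetic source sentences exposes the encoder to more diverse inputs than a single 1-best (or n-best) back-translation would.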

© The Japanese Society for Artificial Intelligence 2020