2020, Vol. 35, No. 3, pp. A-JA9_1-9
A large-scale parallel corpus is indispensable for training encoder-decoder neural machine translation. The method of using synthetic parallel texts, called back-translation, in which target monolingual sentences are automatically translated into the source language, has proven effective in improving the decoder. However, it does not necessarily help enhance the encoder. In this paper, we propose a method that enhances not only the decoder but also the encoder using target monolingual corpora, by generating multiple source sentences via sampling-based sequence generation. Source sentences generated in this way are more diverse and thus help make the encoder robust. Our experimental results show that translation quality improved as the number of synthetic source sentences per given target sentence increased. Even though the quality did not reach that achieved with a genuine parallel corpus comprising single human translations, our proposed method obtained over 50% of the improvement brought by the parallel corpus using only its target side, i.e., monolingual data. Moreover, the proposed sampling method resulted in final translations of higher quality than n-best back-translation. These results indicate that not only the quality of back-translation but also the diversity of synthetic source sentences is crucial.
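The core idea of sampling-based back-translation, drawing several synthetic source sentences per target sentence from the backward model's output distribution rather than taking a single deterministic (argmax or beam) output, can be sketched with a toy decoder. Everything below (the vocabulary, the uniform toy distribution, and all function names) is illustrative, not taken from the paper; a real system would sample from a trained target-to-source NMT model.

```python
import random

# Toy vocabulary standing in for the source language; "</s>" ends a sentence.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "</s>"]

def next_token_probs(prefix):
    """Toy next-token distribution (uniform here). A real back-translation
    model would condition on the target sentence and the prefix decoded so far."""
    p = 1.0 / len(VOCAB)
    return {tok: p for tok in VOCAB}

def greedy_decode(max_len=5):
    """Deterministic (argmax) decoding: every call yields the same output,
    so every target sentence gets one fixed synthetic source."""
    out = []
    for _ in range(max_len):
        probs = next_token_probs(out)
        tok = max(probs, key=probs.get)
        if tok == "</s>":
            break
        out.append(tok)
    return " ".join(out)

def sample_decode(max_len=5, rng=None):
    """Sampling-based decoding: each token is drawn from the full
    distribution, so repeated calls yield diverse source sentences."""
    rng = rng or random.Random()
    out = []
    for _ in range(max_len):
        probs = next_token_probs(out)
        tok = rng.choices(list(probs), weights=list(probs.values()))[0]
        if tok == "</s>":
            break
        out.append(tok)
    return " ".join(out)

def back_translate(target_sentence, k=3, seed=0):
    """Generate k synthetic source sentences for one target sentence,
    as in multi-sample back-translation."""
    rng = random.Random(seed)
    return [sample_decode(rng=rng) for _ in range(k)]
```

Pairing each of the `k` sampled sources with the same genuine target sentence yields the enlarged synthetic parallel corpus; the diversity among the sampled sources is what exposes the encoder to varied inputs during training.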