Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Neural Machine Translation Using Multiple Back-Translations Based on Sampling Generation
Kenji Imamura, Atsushi Fujita, Eiichiro Sumita
Journal, free access

2020 Volume 35 Issue 3 Pages A-JA9_1-9

Abstract

A large-scale parallel corpus is indispensable for training encoder-decoder neural machine translation. The method of using synthetic parallel texts, called back-translation, in which target monolingual sentences are automatically translated into the source language, has been proven effective in improving the decoder. However, it does not necessarily help enhance the encoder. In this paper, we propose a method that enhances not only the decoder but also the encoder using target monolingual corpora by generating multiple source sentences via sampling-based sequence generation. The source sentences generated in this way are more diverse and thus help make the encoder robust. Our experimental results show that the translation quality was improved by increasing the number of synthetic source sentences for each given target sentence. Even though the quality did not reach that achieved with a genuine parallel corpus comprising single human translations, our proposed method obtained over 50% of the improvement brought by the parallel corpus using only its target side, i.e., monolingual data. Moreover, the proposed sampling method resulted in final translations of higher quality than n-best back-translation. These results indicate that not only the quality of back-translation but also the diversity of synthetic source sentences is crucial.
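
The contrast the abstract draws between sampling-based generation and n-best back-translation can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_MODEL`, `sample_sources`, `n_best_sources`, and `build_synthetic_pairs` are hypothetical names, and the toy model assumes an independent token distribution at each source position, whereas a real back-translation decoder conditions each token on the preceding ones.

```python
import random
from itertools import product

# Hypothetical toy back-translation model for ONE target sentence:
# each source position has its own independent token distribution.
# (Assumption for illustration only; a real NMT decoder is autoregressive.)
TOY_MODEL = [
    {"the": 0.6, "a": 0.3, "this": 0.1},
    {"cat": 0.5, "kitten": 0.3, "feline": 0.2},
    {"sleeps": 0.7, "naps": 0.2, "dozes": 0.1},
]


def sample_sources(model, k, seed=0):
    """Draw k source sentences by sampling every token from its distribution."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(k):
        tokens = [
            rng.choices(list(dist), weights=list(dist.values()))[0]
            for dist in model
        ]
        sentences.append(" ".join(tokens))
    return sentences


def n_best_sources(model, n):
    """Return the n most probable sentences (exhaustive search on the toy model)."""
    scored = []
    for combo in product(*(dist.items() for dist in model)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        scored.append((prob, " ".join(token for token, _ in combo)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sentence for _, sentence in scored[:n]]


def build_synthetic_pairs(target, sources):
    """Pair each synthetic source sentence with the same target sentence."""
    return [(source, target) for source in sources]


if __name__ == "__main__":
    target = "neko ga nemuru"  # placeholder target-side sentence
    for src, tgt in build_synthetic_pairs(target, sample_sources(TOY_MODEL, 4)):
        print(f"sampled: {src} -> {tgt}")
    for src, tgt in build_synthetic_pairs(target, n_best_sources(TOY_MODEL, 4)):
        print(f"n-best:  {src} -> {tgt}")
```

In this sketch, n-best output stays close to the single most probable sentence, while sampling can also reach lower-probability tokens, which is one way to read the diversity the abstract refers to.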

© 2020 The Japanese Society for Artificial Intelligence