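Japanese-to-English Multi-Level Complexity-Controllable Machine Translation: Construction of Evaluation Data and a Proposal for Multi-Reference-Based Learning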

谷 和樹; 田村 晃裕; 梶原 智之; 二宮 崇; 加藤 恒夫

doi:10.5715/jnlp.31.456

Abstract

We aim to construct Japanese (Ja)-to-English (En) multi-level complexity-controllable machine translation (MCMT), that is, a Ja-to-En MT model that controls the complexity of its output at more than two levels. No test dataset exists for Ja-to-En MCMT because existing studies have focused on the English-Spanish language pair. We therefore construct a test dataset for Ja-to-En MCMT from the Newsela corpus, a set of English news articles written at multiple complexity levels, together with manual Japanese translations of those articles. This study also proposes a multi-reference-based learning method for MCMT. Whereas conventional methods use a single reference translation as a training instance, the proposed method compares multiple target-language sentences that share the same content but differ in complexity level, and trains an MCMT model so that the loss for the sentence at the target complexity level is lower than the loss for sentences at non-target complexity levels. Evaluation on the constructed test dataset shows that the proposed method improves BLEU by 0.94 points over the conventional multitask model.
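
The abstract does not give the exact form of the multi-reference objective. As a minimal sketch only, one way to push the loss of the target-level reference below the losses of the non-target references is a hinge (margin) penalty added to the usual translation loss. The function name `multi_reference_margin_loss`, the `margin` parameter, and the hinge formulation are all assumptions made for illustration and may differ from the formulation used in the paper.

```python
from typing import Sequence


def multi_reference_margin_loss(
    target_loss: float,
    non_target_losses: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Illustrative objective (an assumption, not the paper's formula).

    target_loss: model loss (e.g. negative log-likelihood) on the reference
        written at the requested complexity level.
    non_target_losses: model losses on references with the same content
        written at other complexity levels.
    margin: hypothetical gap by which the target loss should be lower.
    """
    # Penalize every non-target reference whose loss is not at least
    # `margin` higher than the loss of the target-level reference.
    penalty = sum(
        max(0.0, margin + target_loss - other) for other in non_target_losses
    )
    # Keep the ordinary translation loss so the model still learns to
    # generate the target-level reference itself.
    return target_loss + penalty
```

Under this sketch the penalty vanishes once every non-target reference is at least `margin` worse than the target-level reference, so only violated orderings contribute to the gradient.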


Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
