2024 Volume 31 Issue 2 Pages 456-478
We aim to construct Japanese (Ja)-to-English (En) multi-level complexity-controllable machine translation (MCMT), that is, a Ja-to-En MT model that controls output complexity at more than two levels. No test dataset exists for Ja-to-En MCMT because existing studies have focused on the English-Spanish language pair. We therefore construct a test dataset for Ja-to-En MCMT from the Newsela corpus, a set of English news articles written at multiple complexity levels, and their manual translations. This study also proposes a multi-reference-based learning method for MCMT. Whereas conventional methods use a single reference translation as a training instance, the proposed method compares multiple target-language sentences that have the same content but different complexity levels, and trains an MCMT model so that the loss for the sentence at the target complexity level is lower than the loss for sentences at non-target complexity levels. Evaluations on the test dataset constructed in this study show that the proposed method improves BLEU by 0.94 points over the conventional multitask model.
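The abstract does not give the exact form of the multi-reference objective. As a minimal sketch, assuming a hinge-style ranking term (our assumption, not necessarily the authors' formulation), the idea of making the loss for the target-level reference lower than the loss for non-target-level references could look like the following. The function names, the `margin` parameter, and the level labels are hypothetical and chosen only for illustration.

```python
import math


def sentence_nll(token_log_probs):
    """Negative log-likelihood of one reference: the sum of -log p over its tokens."""
    return -sum(token_log_probs)


def multi_reference_loss(nll_by_level, target_level, margin=1.0):
    """Illustrative multi-reference loss (assumed form, not taken from the paper).

    nll_by_level maps each complexity level to the model's NLL for the
    reference sentence written at that level, all conditioned on the same
    source sentence and the same requested (target) complexity level.
    """
    target_nll = nll_by_level[target_level]
    # Hinge penalty: nonzero whenever a non-target reference is not at least
    # `margin` worse (higher NLL) than the target-level reference.
    ranking = sum(
        max(0.0, margin + target_nll - nll)
        for level, nll in nll_by_level.items()
        if level != target_level
    )
    # Standard single-reference NLL plus the ranking penalty.
    return target_nll + ranking


# Hypothetical NLL values for three references with the same content.
example = {"simple": 2.0, "medium": 2.5, "complex": 4.0}
loss = multi_reference_loss(example, target_level="simple", margin=1.0)
```

In this example only the "medium" reference violates the margin, so it alone contributes to the ranking term. Under this assumed form, the single-reference baseline is recovered when the ranking term is dropped.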