Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Japanese-to-English Multi-Level Complexity-Controllable MT: Test Data and Multi-Reference-Based Learning
Kazuki Tani, Akihiro Tamura, Tomoyuki Kajiwara, Takashi Ninomiya, Tsuneo Kato

2024 Volume 31 Issue 2 Pages 456-478

Abstract

We aim to construct a Japanese (Ja)-to-English (En) multi-level complexity-controllable machine translation (MCMT) model, that is, a Ja-to-En MT model that controls the complexity of its output at more than two levels. No test dataset exists for Ja-to-En MCMT because existing studies have focused on the English-Spanish language pair. We therefore construct a test dataset for Ja-to-En MCMT from the Newsela corpus, a set of English news articles written at multiple complexity levels, and their manual translations. This study also proposes a multi-reference-based learning method for MCMT. Whereas conventional methods use a single reference translation as a training instance, the proposed method compares multiple target-language sentences that have the same content at different complexity levels, and trains an MCMT model so that the loss for a sentence at the target complexity level is lower than the loss for a sentence at a non-target complexity level. Evaluations on the test dataset constructed in this study show that the proposed method improves BLEU by 0.94 points over the conventional multitask model.
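
The abstract does not give the exact form of the multi-reference objective, so the following is only a minimal sketch of one way such a comparison could be expressed. It assumes a hinge-style margin penalty added to the usual loss on the target-level reference. The function name `multi_reference_loss`, the `margin` parameter, and the per-level loss dictionary are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def multi_reference_loss(
    nll_by_level: Dict[int, float],
    target_level: int,
    margin: float = 1.0,
) -> float:
    """Illustrative multi-reference loss (an assumption, not the paper's formula).

    nll_by_level maps each complexity level to the model's loss (for example,
    negative log-likelihood) on the reference sentence written at that level,
    all computed for the same source sentence and the same target-level tag.
    """
    target_nll = nll_by_level[target_level]

    penalty = 0.0
    for level, nll in nll_by_level.items():
        if level == target_level:
            continue
        # Hinge term: zero once the target-level loss is at least `margin`
        # lower than the loss on this non-target-level reference.
        penalty += max(0.0, margin + target_nll - nll)

    # Standard single-reference loss plus the ranking penalty.
    return target_nll + penalty


if __name__ == "__main__":
    # Hypothetical per-level losses for one source sentence.
    losses = {1: 2.0, 2: 5.0, 3: 2.5}
    print(multi_reference_loss(losses, target_level=1))
```

In this sketch the penalty vanishes when every non-target reference already has a loss at least `margin` higher than the target reference, so training pressure is applied only to levels the model still confuses with the target level.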

© 2024 The Association for Natural Language Processing