Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Divide-and-Conquer Neural Machine Translation Using Intra-Sentence Context
Ryuta Ishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura

2025 Volume 32 Issue 1 Pages 114-133

Abstract

Although neural machine translation (NMT) usually produces high-quality translations through flexible word choice and fluency, its quality can degrade for long input sentences. Existing divide-and-conquer approaches to this problem split a long input sentence into shorter segments and merge their translations, but they have yielded only limited improvement in NMT. In this study, we propose a novel divide-and-conquer method for NMT that improves the translation of long sentences by exploiting intra-sentence context. The proposed method (1) splits a sentence around coordinating conjunctions that connect clauses labeled S by syntactic parsing, (2) translates these clauses using a clause-level translation model that utilizes intra-sentence context, and (3) merges the clause-level translations using another sequence-to-sequence model to obtain a sentence-level translation. In our English-to-Japanese translation experiments on ASPEC using a pre-trained multilingual BART model, the proposed method outperformed a baseline multilingual BART-based NMT model for input sentences with over 40 words.
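The splitting step (1) can be illustrated with a minimal sketch. The code below assumes a constituency parse represented as nested `(label, children)` tuples and splits at top-level coordinating conjunctions (CC) that join S-labeled clauses; the function names and tree encoding are hypothetical illustrations, not the authors' actual implementation.

```python
def leaves(node):
    """Collect the words under a parse node."""
    label, children = node
    if isinstance(children, str):        # leaf node: (POS tag, word)
        return [children]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def split_on_cc(root):
    """Split a parse at top-level CCs that coordinate S clauses."""
    label, children = root
    child_labels = [c[0] for c in children]
    # Split only when a CC actually coordinates S-labeled clauses;
    # otherwise keep the sentence whole.
    if "CC" not in child_labels or "S" not in child_labels:
        return [" ".join(leaves(root))]
    segments, current = [], []
    for child in children:
        if child[0] == "CC" and current:
            # Start a new segment at the conjunction (the CC word
            # itself is dropped here for simplicity).
            segments.append(" ".join(current))
            current = []
        else:
            current.extend(leaves(child))
    if current:
        segments.append(" ".join(current))
    return segments

# Toy parse of "(S (S it rained) (CC but) (S we played))"
tree = ("S", [
    ("S", [("PRP", "it"), ("VBD", "rained")]),
    ("CC", "but"),
    ("S", [("PRP", "we"), ("VBD", "played")]),
])
print(split_on_cc(tree))   # → ['it rained', 'we played']
```

Each resulting segment would then be translated by the clause-level model in step (2) before the merge in step (3).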

© 2025 The Association for Natural Language Processing