自然言語処理 (Journal of Natural Language Processing)
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Regular Paper (Peer-Reviewed)
Revisiting Pre-training of Embedding Layers in Transformer-based Neural Machine Translation
Masato Neishi, Naoki Yoshinaga
Journal: Free access

2024, Volume 31, Issue 2, pp. 534-567

Abstract

Recent advances in the pre-training and fine-tuning paradigm have yielded significant gains in several natural language processing tasks, including machine translation (MT), particularly in low-resource settings. However, leveraging out-of-domain data has been reported to be less effective, and sometimes even harmful, for MT in high-resource settings, where further improvement is still needed. In this study, we focus on dedicated domain-specific neural machine translation (NMT) models, which retain an advantage in high-resource settings in terms of translation quality and inference cost. Considering the large impact of the domain discrepancy between pre-training and fine-tuning (or training) in MT, we revisit in-domain pre-training of the embedding layers of Transformer-based NMT models, in which the embeddings are pre-trained on the same training data as the target translation task. Experiments on two translation tasks, ASPEC English-to-Japanese and WMT2017 English-to-German, demonstrate that in-domain pre-training of the embedding layers of a Transformer-based NMT model improves performance without any negative impact and contributes to earlier convergence in training. Additional experiments confirm that pre-training the encoder's embedding layer is more important than pre-training the decoder's, and that the benefit does not vanish as the training data size increases. An analysis of the embeddings reveals that pre-training the embedding layers has a particularly large impact on low-frequency tokens.
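To make the setting concrete, the following is a minimal Python sketch (using gensim and PyTorch) of in-domain pre-training of an encoder embedding layer: embeddings are pre-trained with word2vec on the same source-side training data as the translation task and then used to initialize the Transformer's encoder embedding layer. This is only an illustration under assumed choices; the corpus file name, vocabulary handling, and hyperparameters are hypothetical and do not reproduce the authors' exact procedure.

import torch
import torch.nn as nn
from gensim.models import Word2Vec

EMB_DIM = 512  # must match the Transformer's d_model

# 1. Pre-train embeddings on the tokenized in-domain source-side corpus
#    (same data as the target translation task). File name is hypothetical.
with open("train.src.tok", encoding="utf-8") as f:
    sentences = [line.split() for line in f]
w2v = Word2Vec(sentences, vector_size=EMB_DIM, window=5, min_count=1, epochs=5)

# 2. Build a vocabulary and copy the pre-trained vectors into a weight matrix.
vocab = {tok: i for i, tok in enumerate(["<pad>", "<unk>"] + list(w2v.wv.index_to_key))}
weights = torch.zeros(len(vocab), EMB_DIM)
for tok, idx in vocab.items():
    if tok in w2v.wv:
        weights[idx] = torch.tensor(w2v.wv[tok])

# 3. Initialize the encoder embedding layer with the pre-trained weights;
#    it is then fine-tuned jointly with the rest of the NMT model.
encoder_embedding = nn.Embedding(len(vocab), EMB_DIM, padding_idx=vocab["<pad>"])
encoder_embedding.weight.data.copy_(weights)
# encoder_embedding would feed a standard Transformer encoder stack.

The encoder side is shown because the abstract reports that pre-training the encoder's embedding layer matters more than the decoder's; the decoder embedding layer could be initialized analogously from target-side training data.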

© 2024 The Association for Natural Language Processing