Journal of Natural Language Processing
Online ISSN: 2185-8314
Print ISSN: 1340-7619
ISSN-L: 1340-7619
General Paper
Encoder-Decoder Attention ≠ Word Alignment: Axiomatic Method of Learning Word Alignments for Neural Machine Translation
Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, Eiichiro Sumita

2020 Volume 27 Issue 3 Pages 531-552

Abstract

The encoder-decoder attention matrix has been regarded as the (soft) alignment model for conventional neural machine translation (NMT) models such as RNN-based models. However, we show empirically that this is not true for the Transformer. By comparing the Transformer with the RNN-based NMT model, we identify two inherent differences and accordingly present two methods of capturing word alignments in the Transformer. Furthermore, rather than focusing on the Transformer alone, we present three axioms for an attention mechanism that captures word alignments, and propose a new attention mechanism based on these axioms, termed the axiomatic attention mechanism (AAM), which is applicable to any NMT model. The AAM depends on a perturbation function, and we apply several perturbation functions to the AAM, including a novel function based on the masked language model (Devlin, Chang, Lee, and Toutanova 2019). Using the AAM to guide the training of an NMT model improved both the model's translation performance and its learning of word alignments. Our research sheds light on the interpretation of sequence-to-sequence models for neural machine translation.
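For readers unfamiliar with the convention challenged above, word alignments are typically read off an encoder-decoder attention matrix by linking each target word to the source word that receives its highest attention weight. The following minimal sketch illustrates this conventional reading in Python with NumPy; the attention values and variable names are illustrative assumptions, not taken from the paper.

import numpy as np

# Hypothetical attention matrix: one row per target word, one column per
# source word; each row is a softmax distribution over source positions.
attention = np.array([
    [0.70, 0.20, 0.10],  # target word 0 attends mostly to source word 0
    [0.15, 0.75, 0.10],  # target word 1 attends mostly to source word 1
    [0.05, 0.25, 0.70],  # target word 2 attends mostly to source word 2
])

# Conventional "attention = alignment" reading: align each target word to
# the source word with the largest attention weight.
hard_alignment = attention.argmax(axis=1)
print(hard_alignment)  # -> [0 1 2], i.e., a diagonal word alignment

The paper's empirical finding is that, for the Transformer, this straightforward reading does not yield reliable word alignments.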

© 2020 The Association for Natural Language Processing