Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Supervised Visual Attention for Multimodal Neural Machine Translation
Tetsuro Nishihara, Akihiro Tamura, Takashi Ninomiya, Yutaro Omote, Hideki Nakayama

2021 Volume 28 Issue 2 Pages 554-572

Abstract

This paper proposes a supervised visual attention mechanism for multimodal neural machine translation (MNMT), trained with constraints based on manual alignments between words in a sentence and their corresponding regions in an image. The proposed visual attention mechanism captures the relationship between a word and an image region more precisely than a conventional visual attention mechanism trained through MNMT in an unsupervised manner. Our experiments on English-German and German-English translation tasks using the Multi30k dataset and on English-Japanese and Japanese-English translation tasks using the Flickr30k Entities JP dataset show that a Transformer-based MNMT model can be improved by incorporating our proposed supervised visual attention mechanism, and that further improvements can be achieved by combining it with a supervised cross-lingual attention mechanism (up to +1.61 BLEU, +1.7 METEOR).
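As a rough illustration of the general idea (a minimal sketch, not the authors' implementation), supervising visual attention can be framed as adding an auxiliary loss that pushes the model's word-to-region attention distribution toward a normalized gold alignment. The function name supervised_attention_loss, the tensor shapes, and the weight lambda_attn below are illustrative assumptions.

```python
# Illustrative sketch of a supervised visual attention loss (PyTorch).
# Assumes the MNMT model exposes per-target-word attention over image regions.
import torch


def supervised_attention_loss(attn_weights: torch.Tensor,
                              gold_alignments: torch.Tensor,
                              eps: float = 1e-8) -> torch.Tensor:
    """attn_weights: (batch, tgt_len, num_regions) attention distributions.
    gold_alignments: (batch, tgt_len, num_regions) binary word-region links;
    target positions with no annotated region are ignored."""
    # Turn binary gold alignments into reference distributions per target word.
    row_sums = gold_alignments.sum(dim=-1, keepdim=True)
    has_alignment = row_sums.squeeze(-1) > 0  # mask of supervised positions
    ref = gold_alignments / row_sums.clamp(min=1.0)
    # Cross-entropy between the reference and predicted attention distributions.
    ce = -(ref * (attn_weights + eps).log()).sum(dim=-1)
    # Average only over target words that actually have a gold alignment.
    return (ce * has_alignment).sum() / has_alignment.sum().clamp(min=1)


# Hypothetical training combination: the auxiliary term is added to the
# ordinary translation cross-entropy with a tunable weight lambda_attn.
# total_loss = translation_loss + lambda_attn * supervised_attention_loss(attn, gold)
```

In this sketch the supervision only constrains how attention mass is distributed over regions; the translation objective itself is unchanged, which mirrors the paper's framing of the mechanism as an addition to a standard Transformer-based MNMT model.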

© 2021 The Association for Natural Language Processing