Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Spatial Hierarchical Attention Network Based Video-guided Machine Translation
Weiqi Gu, Haiyue Song, Chenhui Chu, Sadao Kurohashi
Journal, Free access

2023, Volume 31, pp. 299-307

Abstract

Video-guided machine translation, one type of multimodal machine translation, uses video content as auxiliary information to address the word sense ambiguity problem in machine translation. Previous studies use only features from pre-trained action detection models as motion representations of the video, which addresses verb sense ambiguity but neglects noun sense ambiguity. To address this, we propose a video-guided machine translation system that uses both spatial and motion representations. For the spatial part, we propose a hierarchical attention network that models spatial information from the object level to the video level. We investigate and discuss spatial features extracted from objects with pre-trained convolutional neural network models, and spatial concept features extracted from object labels and attributes with pre-trained language models. We further investigate filtering the spatial features by referring to the corresponding source sentences. Experiments on the VATEX dataset show that our system achieves a BLEU-4 score of 35.86, which is 0.51 points higher than the single-model result of the state-of-the-art method. Experiments on the How2 dataset further verify the generalization ability of the proposed system.
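
The idea of attending hierarchically from the object level to the video level can be illustrated with a minimal sketch. It is not the authors' architecture. It assumes plain scaled dot-product attention, fixed query vectors (in a real system these would more plausibly be learned or derived from the decoder state), and made-up toy features. The names `attention_pool`, `hierarchical_pool`, `object_query`, and `frame_query` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_pool(vectors, query):
    # Weighted average of `vectors`; the weights are the softmax of
    # scaled dot products between each vector and `query`.
    scale = math.sqrt(len(query))
    weights = softmax([dot(v, query) / scale for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights

def hierarchical_pool(video, object_query, frame_query):
    # `video` is a list of frames; each frame is a list of object feature vectors.
    # Level 1 (assumed): attend over the objects in each frame -> one vector per frame.
    frame_vectors = [attention_pool(objects, object_query)[0] for objects in video]
    # Level 2 (assumed): attend over the frame vectors -> one video-level vector.
    return attention_pool(frame_vectors, frame_query)

# Toy input: 2 frames with 2 and 3 objects, 2-dimensional features (made-up numbers).
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0], [0.0, 0.0]],
]
video_vector, frame_weights = hierarchical_pool(video, [1.0, 0.0], [0.0, 1.0])
```

Here `video_vector` has the same dimensionality as a single object feature, and `frame_weights` sums to one across frames. The number of objects may differ from frame to frame because each frame is pooled independently before the second level.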

© 2023 by the Information Processing Society of Japan