Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
Why Videos Do Not Guide Translations in Video-guided Machine Translation? An Empirical Evaluation of Video-guided Machine Translation Dataset
Zhishen Yang, Tosho Hirasawa, Mamoru Komachi, Naoaki Okazaki
Journal, free access

2022, Volume 30, pp. 388-396

Abstract

Video-guided machine translation (VMT) is a type of multimodal machine translation that uses information from videos to guide translation. However, in the VMT 2020 challenge, adding videos only marginally improved the performance of VMT models compared to their text-only baselines. In this study, we systematically analyze why videos did not guide translation. Specifically, we evaluate the models in input degradation and visual sensitivity experiments and compare the results with a human evaluation using VATEX, which is the dataset used in the VMT 2020 challenge. The results indicate that short and straightforward video descriptions in VATEX are sufficient to perform the translations, which renders the videos redundant in the process. Based on our findings, we provide suggestions on the design of future VMT datasets. Code and human-evaluated data are publicly available for future research.

© 2022 by the Information Processing Society of Japan