Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 3Q5-GS-9-01

Semantic Consistency Assessment of Visual and Text Content using Multimodal Deep Neural Networks
Riko SUZUKI*, Mikito KONISHI, Junya IKEDA, Daichi HAYASHI, So FUKAI, Yu SUGAWARA, Yusuke MACHII, Yusuke YAMAURA
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Assessing the semantic consistency between an image and the text in a document is an important task, because readers refer to the image to deepen their understanding of the text content. In this study, we develop a multimodal deep neural network for assessing the semantic consistency of an image and text. We propose a novel approach that combines binary classification with an angular margin loss to acquire discriminative features. We also clarify contradictions between the image and the text by visualizing cross-attention between objects in the image and words in the text. To demonstrate the effectiveness of our approach, we evaluate the accuracy of several models on the Flickr30k dataset, which contains images and their captions. The results show that our proposed model outperforms the existing joint embedding model, with an improvement of 0.9 in F-measure.

© 2020 The Japanese Society for Artificial Intelligence