Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID: 2D1-GS-9-05

News Image Caption Generation
*Zhishen YANG, Naoaki OKAZAKI
Conference proceedings / abstracts, free access

Abstract

Vision and language is a vibrant area of multimodal machine learning research that aims to build models capable of comprehending information across the visual and linguistic modalities. In this work, we apply a multimodal Transformer model with a joint text-vision representation to one vision-and-language task: news image caption generation. The model draws on context from the news article while taking into account the scene in the associated image to generate a caption. Experimental results show that the multimodal Transformer significantly improves the quality of the generated news image captions.
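
To make the idea of a joint text-vision representation concrete, here is a minimal sketch in plain Python. It is not the authors' model: the abstract does not specify the architecture, so the function names (`joint_sequence`, `attend`), the toy vectors, and the single-head attention are all assumptions chosen for illustration. It only shows one common reading of the idea, in which article-token vectors and image-region vectors are placed in a single sequence so that one attention step can weigh both modalities together.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def joint_sequence(text_vectors, image_vectors):
    """Concatenate text and image vectors into one sequence.

    Assumption: both modalities were already projected to the same
    dimension (a real model would learn these projections).
    """
    dim = len(text_vectors[0])
    for vec in text_vectors + image_vectors:
        if len(vec) != dim:
            raise ValueError("all vectors must share one dimension")
    return text_vectors + image_vectors


def attend(query, sequence):
    """Single-head scaled dot-product attention of one query over the sequence."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in sequence])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, sequence))
        for i in range(len(query))
    ]
    return weights, context


# Hypothetical toy inputs: two article-token vectors and one image-region vector.
text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0]]

seq = joint_sequence(text, image)
# A decoder state (also hypothetical) attends over text and image jointly.
weights, context = attend([1.0, 0.0], seq)
```

In this sketch the attention weights sum to one across both modalities, so the resulting context vector mixes article and image information in a single step. Multi-head attention, positional encodings, and learned parameters are omitted for brevity.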

© 2020 The Japanese Society for Artificial Intelligence