Cross-modal Description Generation for Future Events in Daily Tasks

神原 元就; 杉浦 孔明

doi:10.11517/pjsai.JSAI2022.0_2O1GS702

36th (2022)

Session ID : 2O1-GS-7-02

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 36

Location : [in Japanese]

Date : June 14, 2022 - June 17, 2022

Cross-modal Description Generation for Future Events in Daily Tasks

*Motonari KAMBARA, Komei SUGIURA

Keywords: Video captioning, Future captioning, Cross-modal, Relational Self-Attention

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

This paper addresses the future captioning task: generating a caption that describes a future event. We propose the Relational Future Captioning Model (RFCM), a cross-modal language generation model for this task. RFCM uses a Relational Self-Attention Encoder, which extracts relationships between events more effectively than the conventional self-attention used in transformers. In comparison experiments, RFCM outperformed a baseline method on two datasets.
