Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 3Yin2-20

End-to-end training of Object Segmentation Task and Video Question-Answering Task
*Hidemoto NAKADA, Hideki ASOH
Keywords: VQA, Object detection
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

For complicated VQA tasks that involve multiple objects, training the VQA model on segmented object data as input has been shown to be effective for various downstream tasks. In this work, we trained the VQA model and the object segmentation model end-to-end instead of training them independently. We used CLEVRER as the target VQA task. We first trained MONet, an object segmentation network, on the dataset, and then trained Aloe, a VQA model, on the output of the trained MONet. Finally, we connected MONet and Aloe, fine-tuned them in an end-to-end setting, and confirmed that performance on the VQA task improved.
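
The three-stage schedule in the abstract (train the upstream model alone, train the downstream model on its frozen output, then fine-tune both jointly) can be illustrated with a toy sketch. This is not the authors' code and does not use MONet, Aloe, or CLEVRER: the "encoder" `z = a*x + c` and the "head" `b*z` are hypothetical scalar stand-ins, the data are invented, and the reconstruction proxy in stage 1 is an assumption chosen only to mimic an unsupervised upstream objective. It shows one way joint fine-tuning can lower the task loss: the upstream representation is allowed to adapt to the downstream target.

```python
# Toy illustration only: scalar stand-ins, not MONet or Aloe.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]      # invented toy inputs
ys = [2.0 * x + 1.0 for x in xs]       # invented toy "answer" targets


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def task_loss(a, c, b):
    """Mean squared error of the full pipeline head(encoder(x))."""
    return mean((b * (a * x + c) - y) ** 2 for x, y in zip(xs, ys))


def train_encoder(a, c, lr=0.1, steps=500):
    """Stage 1: fit the encoder alone on a reconstruction proxy (z close to x)."""
    for _ in range(steps):
        grad_a = mean(2 * (a * x + c - x) * x for x in xs)
        grad_c = mean(2 * (a * x + c - x) for x in xs)
        a, c = a - lr * grad_a, c - lr * grad_c
    return a, c


def train_head(a, c, b, lr=0.1, steps=500):
    """Stage 2: fit the head on the frozen encoder's outputs."""
    for _ in range(steps):
        grad_b = mean(
            2 * (b * (a * x + c) - y) * (a * x + c) for x, y in zip(xs, ys)
        )
        b -= lr * grad_b
    return b


def finetune_end_to_end(a, c, b, lr=0.05, steps=2000):
    """Stage 3: update encoder and head together on the task loss."""
    for _ in range(steps):
        errs = [b * (a * x + c) - y for x, y in zip(xs, ys)]
        grad_a = mean(2 * e * b * x for e, x in zip(errs, xs))
        grad_c = mean(2 * e * b for e in errs)
        grad_b = mean(2 * e * (a * x + c) for e, x in zip(errs, xs))
        a, c, b = a - lr * grad_a, c - lr * grad_c, b - lr * grad_b
    return a, c, b


a, c = train_encoder(a=0.5, c=0.2)
b = train_head(a, c, b=0.0)
staged_loss = task_loss(a, c, b)

a, c, b = finetune_end_to_end(a, c, b)
e2e_loss = task_loss(a, c, b)

print(f"staged loss: {staged_loss:.4f}, after end-to-end fine-tuning: {e2e_loss:.4f}")
```

In this toy, the head has no bias term, so it cannot fit the offset in the targets while the encoder is frozen at its reconstruction solution. Once the encoder is trainable, its offset `c` moves to absorb that bias. Whether a similar effect explains the improvement reported in the abstract is not something this sketch can establish.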

© 2022 The Japanese Society for Artificial Intelligence