The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
2023
Session ID : 2A1-H03

Multi-Step Ahead Image Prediction with Mixture of Experts for Embodied Question Answering
*Yuya KAMIWANO, Kanata SUZUKI, Naoya CHIBA, Hiroki MORI, Tetsuya OGATA
CONFERENCE PROCEEDINGS RESTRICTED ACCESS

Abstract

In this study, we propose a subtask that combines visual field prediction at multiple scales and investigate its effectiveness for Embodied Question Answering (EQA). In EQA, the path to the target object depends on the instruction given, so it is desirable for the agent to select a prediction scale automatically according to the situation. However, previous studies have examined subtask learning only with a limited prediction scale and target. Our model is a mixture of experts in which multiple expert networks predict future images at different time steps and a higher-level gating network estimates the distribution over the experts' outputs. By sequentially adjusting the outputs of the expert networks, the proposed method enables robot navigation that takes multiple prediction scales into account. Comparison experiments on the EQA MP3D dataset show that the proposed method improves the model's prediction accuracy regardless of the distance to the target.
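
The gating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy "experts" below are hypothetical stand-ins for image-prediction networks, the linear gate and the names `make_expert`, `gate`, and `mixture_predict` are assumptions made for illustration, and a frame is reduced to a short list of floats. The sketch only shows how a gate can weight experts that predict at different horizons.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(horizon: int) -> Callable[[Vector], Vector]:
    """Toy stand-in for an expert that predicts the frame `horizon` steps ahead.

    Assumption for illustration: every pixel drifts by 0.1 per time step.
    """
    def expert(frame: Vector) -> Vector:
        return [p + 0.1 * horizon for p in frame]
    return expert


def gate(frame: Vector, gate_weights: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical linear gate: one logit per expert, normalized by softmax."""
    logits = [sum(w * p for w, p in zip(row, frame)) for row in gate_weights]
    return softmax(logits)


def mixture_predict(
    frame: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate_weights: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector]:
    """Return the gate-weighted combination of expert outputs and the gate weights."""
    weights = gate(frame, gate_weights)
    outputs = [expert(frame) for expert in experts]
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(frame))
    ]
    return mixed, weights


# Three experts with prediction horizons of 1, 3, and 5 steps.
experts = [make_expert(h) for h in (1, 3, 5)]
# An all-zero gate gives every expert the same weight.
gate_weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mixed, weights = mixture_predict([0.0, 1.0], experts, gate_weights)
```

In a trained model the gate weights would be learned and re-evaluated at every step, so the weighting across short- and long-horizon experts could change as the agent moves; here they are fixed only to keep the example small.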

© 2023 The Japan Society of Mechanical Engineers