A Multimodal Target-Source Classifier Model for Object Fetching from Natural Language Instructions

Aly MAGASSOUBA; Komei SUGIURA; Hisashi KAWAI

doi:10.11517/pjsai.JSAI2019.0_2D3E403

Abstract

In this paper, we address the task of fetching objects from ambiguous instructions. A typical fetching task consists of picking up a target object specified by an ambiguous instruction. We propose a multimodal target-source classifier model (MTCM) that grounds the instructions in the scene. More specifically, MTCM predicts the likelihood of a target object, in addition to the source of this target, using linguistic and visual features. Our approach improves on the accuracy of the previous state-of-the-art method for target object prediction in fetching tasks.
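As a rough illustration of the target-source idea described in the abstract, the sketch below concatenates a linguistic feature vector with a visual feature vector and feeds the result to two softmax heads, one scoring candidate targets and one scoring candidate sources. This is not the authors' architecture: the fusion by concatenation, the single linear layer per head, the random weights, and all names (`predict_target_and_source`, `init_params`, `w_target`, `w_source`) are assumptions made purely for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply one linear layer: one output score per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(n_in, n_targets, n_sources, seed=0):
    """Hypothetical random weights; a real model would learn these."""
    rng = random.Random(seed)

    def matrix(rows):
        return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                for _ in range(rows)]

    return {
        "w_target": matrix(n_targets), "b_target": [0.0] * n_targets,
        "w_source": matrix(n_sources), "b_source": [0.0] * n_sources,
    }


def predict_target_and_source(linguistic, visual, params):
    """Fuse both modalities and score targets and sources separately."""
    fused = list(linguistic) + list(visual)  # assumed fusion: concatenation
    target_probs = softmax(linear(params["w_target"], params["b_target"], fused))
    source_probs = softmax(linear(params["w_source"], params["b_source"], fused))
    return target_probs, source_probs


if __name__ == "__main__":
    # Toy features: 3 linguistic dimensions and 2 visual dimensions.
    params = init_params(n_in=5, n_targets=3, n_sources=2)
    targets, sources = predict_target_and_source(
        [0.1, 0.2, 0.3], [0.4, 0.5], params)
    print("target likelihoods:", targets)
    print("source likelihoods:", sources)
```

The two heads share the fused input but are normalized independently, so each returns its own distribution over candidates. The feature dimensions and the candidate counts used here are arbitrary toy values.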
