Host: The Japanese Society for Artificial Intelligence
Name: The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 35
Location: [in Japanese]
Date: June 08, 2021 - June 11, 2021
Instruction following is the task of learning to map natural language instructions to sequences of actions in visual environments. Recently, an interactive instruction following task has been proposed to encourage research on following natural language instructions that require interactions with objects. We observe that an existing model for this task is not robust to variations in objects and instructions, which may cause serious problems in real-world applications. We hypothesize that this is due to the high sensitivity of neural feature extraction to small perturbations in vision and language. We propose a Neuro-Symbolic approach to mitigate this lack of robustness. Concretely, we introduce object detection and semantic parsing modules into this task, making reasoning over symbolic features feasible. Our experiments on the ALFRED dataset show that our approach significantly improves performance on subtasks that require object interactions.
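To make the neuro-symbolic idea concrete, the sketch below is a minimal illustration (not the authors' implementation): it assumes an object detector that emits symbolic object labels and a semantic parser that maps an instruction to a symbolic (action, target) pair, after which the interaction target is chosen by matching discrete symbols rather than comparing raw neural features. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    label: str    # symbolic object class from an object detector, e.g. "Mug"
    score: float  # detector confidence
    box: tuple    # (x1, y1, x2, y2) bounding box in the egocentric frame


@dataclass
class ParsedInstruction:
    action: str   # symbolic action predicate, e.g. "PickupObject"
    target: str   # symbolic object argument, e.g. "Mug"


def parse_instruction(instruction: str) -> ParsedInstruction:
    """Toy stand-in for a semantic parser: maps an instruction to (action, target).

    A real parser would be a learned model; this keyword lookup only
    illustrates that the output is a discrete symbolic structure.
    """
    action = "PickupObject" if "pick up" in instruction.lower() else "GotoLocation"
    # Hypothetical heuristic: take the last word as the object argument.
    target = instruction.strip(". ").split()[-1].capitalize()
    return ParsedInstruction(action=action, target=target)


def select_target(parsed: ParsedInstruction,
                  detections: List[Detection]) -> Optional[Detection]:
    """Symbolic matching: return the highest-scoring detection whose class label
    equals the parsed object argument. Because the comparison is over discrete
    symbols, small perturbations in pixels or wording do not change its outcome.
    """
    candidates = [d for d in detections if d.label == parsed.target]
    return max(candidates, key=lambda d: d.score) if candidates else None


if __name__ == "__main__":
    detections = [
        Detection("Mug", 0.92, (10, 20, 60, 80)),
        Detection("Apple", 0.88, (100, 40, 140, 90)),
    ]
    parsed = parse_instruction("Pick up the mug")
    print(parsed, select_target(parsed, detections))
```

The design point of the sketch is only that reasoning happens after both modalities have been discretized into symbols; how the detector and parser themselves are trained is outside its scope.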