Host : The Japanese Society for Artificial Intelligence
Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 36
Location : [in Japanese]
Date : June 14, 2022 - June 17, 2022
In this study, we develop a multimodal language understanding model that allows domestic service robots to understand object-fetching instructions. The proposed model, Funnel UNITER, gradually reduces the dimensions of the query, key, and value in each transformer layer, which lowers the computational cost of self-attention. We also built a new dataset, ALFRED-fetch, for the multimodal language understanding for fetching instruction (MLU-FI) task. Our model outperformed the baseline method in classification accuracy and required less training time.
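To illustrate why shrinking the query, key, and value dimensions lowers the cost of self-attention, the following minimal sketch counts multiply-accumulate operations for a stack of attention layers. It is not the Funnel UNITER implementation: the linear reduction schedule, the function names (funnel_dims, attention_macs, stack_macs), the example sizes, and the assumption that each layer projects its output back to the model width are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
def funnel_dims(d_start, d_end, num_layers):
    """Hypothetical schedule: shrink the Q/K/V width linearly across layers."""
    if num_layers == 1:
        return [d_start]
    step = (d_start - d_end) / (num_layers - 1)
    return [round(d_start - i * step) for i in range(num_layers)]


def attention_macs(seq_len, d_model, d_proj):
    """Approximate multiply-accumulates for one single-head self-attention layer.

    Assumes Q, K, and V share the same projected width d_proj and that the
    layer output is projected back to d_model (an assumption of this sketch).
    """
    projections = 3 * seq_len * d_model * d_proj  # build Q, K, V
    scores = seq_len * seq_len * d_proj           # Q @ K^T
    mixing = seq_len * seq_len * d_proj           # softmax(scores) @ V
    output = seq_len * d_proj * d_model           # project back to d_model
    return projections + scores + mixing + output


def stack_macs(seq_len, d_model, dims):
    """Total cost of a stack whose i-th layer uses Q/K/V width dims[i]."""
    return sum(attention_macs(seq_len, d_model, d) for d in dims)


if __name__ == "__main__":
    # Example sizes are illustrative, not taken from the paper.
    seq_len, d_model, num_layers = 128, 768, 12
    constant = [d_model] * num_layers
    funnel = funnel_dims(d_model, d_model // 2, num_layers)
    ratio = stack_macs(seq_len, d_model, funnel) / stack_macs(seq_len, d_model, constant)
    print("per-layer Q/K/V widths:", funnel)
    print("attention cost relative to constant width:", round(ratio, 3))
```

Under these assumptions every term in the count is linear in the projected width, so halving the width of a layer halves that layer's attention cost; the savings over the whole stack depend on how quickly the schedule shrinks.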