Host: The Japan Society of Mechanical Engineers
Name : [in Japanese]
Date : June 06, 2021 - June 08, 2021
This paper presents a target-driven visual navigation technique that exploits long-term history to navigate an agent to a given target image. In particular, we use the Transformer architecture, which was developed for natural language processing and can handle long-term temporal dependencies. Experimental results show that, by utilizing long-term history, the Transformer improves navigation performance on new target images and also improves data efficiency, especially in large-scale environments. We also conducted an ablation study to examine how the number of training frames affects navigation performance. The study shows that, as the number of training frames increases, the accuracy of the proposed method improves while that of the baseline decreases.