Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
In reinforcement learning, model-based methods are a promising approach. These methods learn a world model and then learn complex behaviors in imagination, solving long-horizon tasks from visual inputs alone. Recent transformer-based world models have improved sample efficiency on these tasks, owing to the transformer's ability to capture long-term dependencies. However, world models still struggle with compositional tasks, because predicting object interactions and accurately tracking objects remain difficult, especially in unseen configurations. Object-centric learning decomposes a scene or video into its individual objects without supervision, which leads to a more compositional understanding and better generalization to unseen objects and scenes. In this paper, we propose a world model that uses object-centric latents to predict dynamics. Our model aims to combine the compositional generalization of object-centric learning with the sample efficiency and long-horizon prediction of transformer-based world models. To validate the efficacy of our approach, we conducted experiments on the OCRL benchmark dataset.