Acquiring a World Model Using Object-Centric Representations

中野 聡大; 鈴木 雅大; 松尾 豊

doi:10.11517/pjsai.JSAI2024.0_2O6OS16a04

Abstract

In reinforcement learning, model-based methods are a promising approach. This approach learns a world model and, through imagination, learns complex behaviors that solve long-horizon tasks from visual inputs only. Recent transformer-based world models have improved sample efficiency on these tasks, owing to the transformer's ability to capture long-term dependencies. However, world models still struggle with compositional tasks, because predicting object interactions and accurately tracking objects remain difficult, especially for unseen configurations. Object-centric learning decomposes a scene or a video into individual objects without supervision, leading to more compositional understanding and better generalization to unseen objects and scenes. In this paper, we propose a world model that uses object-centric latents to predict dynamics. Our model aims to combine the compositional generalization of object-centric learning with the sample efficiency and long-horizon prediction of transformer-based world models. To validate the efficacy of our approach, we conducted experiments on the OCRL benchmark dataset.
