Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
Offline reinforcement learning (offline RL) is a promising approach when data from online interactions cannot be collected. Most offline RL algorithms rely on large datasets, and their training tends to become unstable when the dataset is small. Although model-based RL is a popular choice for improving sample efficiency in online RL, naively incorporating a dynamics model into the offline setting can lead to poor performance. We propose a novel offline model-based RL algorithm, a behavior-regularized model-ensemble method, which learns a policy from imaginary rollouts while regularizing the target policy with the KL divergence from the estimated behavior policy. We show on continuous control tasks that our method can learn policies more stably even with smaller datasets.
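A minimal sketch of the kind of KL-regularized policy objective the abstract describes is given below; it is not the paper's implementation, and the names (policy, behavior_policy, critic, alpha) and the Gaussian-policy assumption are illustrative choices for exposition only.

    # Illustrative sketch (assumed PyTorch-style Gaussian policies): actor update on
    # states imagined by a dynamics-model ensemble, penalized by KL(pi || pi_beta).
    import torch
    from torch.distributions import Normal, kl_divergence

    def actor_loss(policy, behavior_policy, critic, states, alpha=0.1):
        """Maximize the critic value under the target policy while penalizing its
        KL divergence from the estimated behavior policy. The states are assumed
        to come from imaginary rollouts of the model ensemble."""
        dist = Normal(*policy(states))            # target policy pi(a|s)
        actions = dist.rsample()                  # reparameterized action sample
        q_value = critic(states, actions)         # value of imagined state-action pairs
        with torch.no_grad():
            behavior_dist = Normal(*behavior_policy(states))  # estimated pi_beta(a|s)
        kl = kl_divergence(dist, behavior_dist).sum(-1)       # KL(pi || pi_beta) per state
        return (-q_value + alpha * kl).mean()

In this sketch, alpha trades off exploiting the learned model against staying close to the behavior policy that generated the offline data; how that trade-off is set in the actual method is described in the paper itself.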