Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 09, 2020 - June 12, 2020
Offline reinforcement learning (offline RL) is a promising approach when data from online interactions cannot be collected. Most offline RL algorithms rely on large datasets, and their training tends to become unstable when the dataset is small. Although model-based RL is a popular choice for improving sample efficiency in online RL, naively incorporating a dynamics model into the offline setting can lead to poor performance. We propose a novel offline model-based RL algorithm, a behavior-regularized model-ensemble method, which learns a policy from imaginary rollouts while regularizing the target policy with the KL divergence from the estimated behavior policy. We show on continuous control tasks that our method can learn policies more stably even with smaller datasets.
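A minimal sketch of the kind of KL-regularized policy objective the abstract describes is given below; it is not the paper's implementation, and the names (policy, behavior_policy, critic, alpha) and the Gaussian-policy assumption are illustrative choices for exposition only.

    # Illustrative sketch (assumed PyTorch-style Gaussian policies): actor update on
    # states imagined by a dynamics-model ensemble, penalized by KL(pi || pi_beta).
    import torch
    from torch.distributions import Normal, kl_divergence

    def actor_loss(policy, behavior_policy, critic, states, alpha=0.1):
        """Maximize the critic value under the target policy while penalizing its
        KL divergence from the estimated behavior policy. The states are assumed
        to come from imaginary rollouts of the model ensemble."""
        dist = Normal(*policy(states))            # target policy pi(a|s)
        actions = dist.rsample()                  # reparameterized action sample
        q_value = critic(states, actions)         # value of imagined state-action pairs
        with torch.no_grad():
            behavior_dist = Normal(*behavior_policy(states))  # estimated pi_beta(a|s)
        kl = kl_divergence(dist, behavior_dist).sum(-1)       # KL(pi || pi_beta) per state
        return (-q_value + alpha * kl).mean()

In this sketch, alpha trades off exploiting the learned model against staying close to the behavior policy that generated the offline data; how that trade-off is set in the actual method is described in the paper itself.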