Host: The Japanese Society for Artificial Intelligence
Name : The 35th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 35
Location : [in Japanese]
Date : June 08, 2021 - June 11, 2021
Reinforcement learning algorithms are categorized into model-based methods, which explicitly estimate an environmental model and a reward function, and model-free methods, which learn a policy directly from real or generated experiences. We previously proposed a parallel reinforcement learning algorithm that trains multiple model-free and model-based learners simultaneously, and the experimental results showed that a simple algorithm can contribute to the learning of more complex ones. However, because the computation time of each learner was not considered, the advantage of using a simple model-free learner could not be fully demonstrated. This paper proposes an asynchronous parallel reinforcement learning method that accounts for differences in control frequency among learners. The main contributions are keeping the replay buffer collected by each learner separate and transforming the stored experiences to absorb the differences in control frequency. The proposed method is applied to benchmark problems and compared with a baseline that ignores the differences in control frequency. The results show that the proposed algorithm selected the simple model-based method with a short control period in the early stage of learning, the complex model-based method in the middle stage, and the model-free method in the late stage.
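The two mechanisms named in the abstract, per-learner replay buffers and an experience transformation that absorbs control-frequency differences, can be illustrated with a minimal sketch. This is not the authors' implementation; the class and function names, the transition layout `(s, a, r, s')`, and the strategy of merging consecutive fine-grained transitions into one coarse transition (accumulating discounted rewards) are all assumptions made for illustration.

```python
import random
from collections import deque

class ReplayBuffer:
    """Replay buffer kept separate for each learner (one instance per learner)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch, clipped to the current buffer size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def compress_transitions(transitions, ratio, gamma=0.99):
    """Hypothetical frequency-alignment step: merge `ratio` consecutive
    fine-grained transitions (s, a, r, s') from a fast learner into one
    coarse transition usable by a learner with a longer control period.
    The merged reward is the discounted sum of the intermediate rewards."""
    coarse = []
    for i in range(0, len(transitions) - ratio + 1, ratio):
        chunk = transitions[i:i + ratio]
        s, a = chunk[0][0], chunk[0][1]
        r = sum((gamma ** k) * t[2] for k, t in enumerate(chunk))
        s_next = chunk[-1][3]
        coarse.append((s, a, r, s_next))
    return coarse
```

Under this sketch, a fast learner's trajectory is compressed before being added to a slow learner's buffer, so each buffer only ever holds experiences at its own learner's control period.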