Efficient Deep Reinforcement Learning in Large-Scale Environments Using an Exploration Criterion Based on Critic-Attention

村瀬 卓也; 平川 翼; 山下 隆義; 藤吉 弘亘

doi:10.11517/pjsai.JSAI2022.0_3Yin210

Abstract

Deep reinforcement learning is a method in which an agent learns optimal behavior through trial and error in an unknown environment, guided by the rewards it obtains. It has outperformed humans in various game tasks such as Atari 2600 and board games. However, until the agent first obtains a reward, it acts randomly, without any exploration criterion. In large, complex environments where opportunities to obtain rewards are few, a large number of trials is therefore required to learn appropriate actions. In this paper, we pre-train a Critic model with a Mask-Attention mechanism and use the resulting attention map as an exploration criterion for the Policy model to enable efficient learning. Experiments in Minecraft show that the proposed method learns actions efficiently.
