Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
Deep reinforcement learning is a method in which an agent learns optimal behavior by trial and error in an unknown environment, guided by the rewards it obtains. It has outperformed humans in various game tasks, such as Atari 2600 games and board games. However, until the agent first reaches a reward, it acts randomly without any exploration criterion. Consequently, in large and complex environments where opportunities to obtain rewards are scarce, a large number of trials is required to learn appropriate actions. In this paper, we pre-train a Critic model with a Mask-Attention mechanism and use the resulting attention map as an exploration criterion for the Policy model to enable efficient learning. Experiments using Minecraft show that the proposed method can learn actions efficiently.
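The abstract does not specify how the attention map is computed or how it enters the Policy model's exploration, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a common form of mask attention (a 1x1 convolution over feature channels followed by a sigmoid, with the resulting map multiplied element-wise into the features) and a hypothetical exploration bonus equal to the mean attention scaled by a coefficient `beta`. All function names, the `beta` parameter, and the bonus formula are illustrative assumptions.

```python
import math
from typing import List

Grid = List[List[float]]  # one channel: height x width


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mask_attention(features: List[Grid], weights: List[float], bias: float = 0.0) -> Grid:
    """Assumed form of mask attention: a 1x1 convolution across channels
    followed by a sigmoid, giving one attention value in (0, 1) per cell."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            sigmoid(sum(w * ch[i][j] for w, ch in zip(weights, features)) + bias)
            for j in range(width)
        ]
        for i in range(height)
    ]


def apply_mask(features: List[Grid], attention: Grid) -> List[Grid]:
    """Multiply every channel element-wise by the attention map."""
    return [
        [[v * a for v, a in zip(row, att_row)] for row, att_row in zip(ch, attention)]
        for ch in features
    ]


def exploration_bonus(attention: Grid, beta: float = 0.1) -> float:
    """Hypothetical exploration criterion (an assumption, not from the paper):
    the mean attention over the map, scaled by beta. Observations that the
    pre-trained critic attends to more strongly yield a larger bonus."""
    cells = [a for row in attention for a in row]
    return beta * sum(cells) / len(cells)
```

In such a sketch, the bonus could be added to the environment reward while the Policy model is trained, so that the agent is nudged toward observations the pre-trained Critic considers relevant even before it obtains any reward. Whether the proposed method uses the attention map in this way is not stated in the abstract.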