Host: The Japanese Society for Artificial Intelligence
Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number : 32
Location : [in Japanese]
Date : June 05, 2018 - June 08, 2018
Deep Q-Network (DQN) has achieved performance comparable to that of a professional human player. However, in large and complex domains (e.g., Ms. Pacman), its learning can be very slow and unstable. Hybrid Reward Architecture (HRA) addresses such domains by decomposing the reward function in advance and then learning a separate value function for each decomposed reward function. In this paper, we constructed several environments designed to make learning more difficult and used them to evaluate the performance of HRA. The results indicate that HRA needs further enhancements to learn in environments where learning is difficult under a uniform random policy.
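As a rough illustration of the decomposition described above, the following minimal tabular sketch keeps one Q-table ("head") per reward component, updates each head with its own component reward only, and sums the heads to choose an action. All function names, the tabular representation, the per-head Q-learning target, and the values of `alpha` and `gamma` are assumptions made for illustration. The abstract does not specify the update rule or the network architecture used in the paper.

```python
def make_heads(n_heads, n_states, n_actions):
    """Create one zero-initialised Q-table per decomposed reward component."""
    return [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_heads)]


def hra_update(q_heads, s, a, rewards, s_next, done, alpha=0.5, gamma=0.9):
    """Update every head with its own reward component.

    rewards[k] is the k-th component of the decomposed reward; the
    environment reward is assumed to be the sum of the components.
    Each head bootstraps from its own table (an assumed Q-learning target).
    """
    for q_k, r_k in zip(q_heads, rewards):
        bootstrap = 0.0 if done else gamma * max(q_k[s_next])
        q_k[s][a] += alpha * (r_k + bootstrap - q_k[s][a])


def aggregate_q(q_heads, s):
    """Sum the per-head action values in state s."""
    n_actions = len(q_heads[0][s])
    return [sum(q_k[s][a] for q_k in q_heads) for a in range(n_actions)]


def greedy_action(q_heads, s):
    """Pick the action with the largest aggregated value."""
    q = aggregate_q(q_heads, s)
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    # Two reward components (e.g. two separate pellets), 3 states, 2 actions.
    heads = make_heads(n_heads=2, n_states=3, n_actions=2)
    hra_update(heads, s=0, a=1, rewards=[1.0, 0.0], s_next=1, done=False)
    print(aggregate_q(heads, 0), greedy_action(heads, 0))
```

Because each head only sees its own component, its target depends on a smaller part of the environment, which is the intuition behind decomposing the reward before learning.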