Evaluation of Hybrid Reward Architecture on various learning policies and environments

Yutaro Fujimura; Tomoyuki Kaneko

doi:10.11517/pjsai.JSAI2018.0_2D402

32nd (2018)

Session ID : 2D4-02

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018

Number : 32

Date : June 05, 2018 - June 08, 2018

Evaluation of Hybrid Reward Architecture on various learning policies and environments

*Yutaro FUJIMURA, Tomoyuki KANEKO

Abstract

Deep Q-Network (DQN) achieved performance comparable to that of a professional human player. However, in large and complex domains (e.g., Ms. Pacman), learning can be very slow and unstable. Hybrid Reward Architecture (HRA) addresses such domains by decomposing the reward function in advance and learning a separate value function for each decomposed reward function. In this paper, we constructed several environments designed to make learning more difficult and used them to evaluate the performance of HRA. The results indicate that HRA needs further enhancements to learn in environments where learning is difficult under a uniform random policy.
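
The decomposition described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation or experimental setup. The five-cell corridor, the two goal cells, the reward values, the hyperparameters, and all function names are assumptions made for illustration. Each reward component has its own Q-table (a "head") trained by ordinary Q-learning under a uniform random behaviour policy, and actions are chosen greedily on the sum of the heads, which is one common way to aggregate them.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4 with a goal at each end.
N_STATES = 5
ACTIONS = (-1, +1)      # index 0 = move left, index 1 = move right
GOALS = (0, 4)          # one reward component per goal cell (assumption)
REWARDS = (0.2, 1.0)    # reward paid by each component at its own goal
GAMMA = 0.9             # discount factor
ALPHA = 0.5             # learning rate


def step(state, action_idx):
    """Return the next state, the per-component reward vector, and a terminal flag."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    rewards = [r if nxt == g else 0.0 for g, r in zip(GOALS, REWARDS)]
    return nxt, rewards, nxt in GOALS


def train(episodes=500, seed=0):
    """Learn one Q-table per reward component under a uniform random policy."""
    rng = random.Random(seed)
    # q[k][s][a] is the estimate of head k for state s and action index a.
    q = [[[0.0, 0.0] for _ in range(N_STATES)] for _ in GOALS]
    for _ in range(episodes):
        state = rng.choice([1, 2, 3])   # start in a non-terminal cell
        done = False
        while not done:
            action = rng.randrange(len(ACTIONS))   # uniform random behaviour
            nxt, rewards, done = step(state, action)
            # Each head is updated only with its own reward component.
            for k, r in enumerate(rewards):
                target = r if done else r + GAMMA * max(q[k][nxt])
                q[k][state][action] += ALPHA * (target - q[k][state][action])
            state = nxt
    return q


def aggregate(q, state):
    """Sum the heads' Q-values for every action in the given state."""
    return [sum(head[state][a] for head in q) for a in range(len(ACTIONS))]


def greedy_action(q, state):
    """Pick the action index with the largest aggregated Q-value."""
    values = aggregate(q, state)
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    q_heads = train()
    print("aggregated Q at centre cell:", aggregate(q_heads, 2))
    print("greedy action index at centre cell:", greedy_action(q_heads, 2))
```

In this toy setting every state-action pair is visited often under random behaviour, so both heads converge quickly. The abstract's finding concerns the harder case in which a uniform random policy rarely reaches the rewarding states, and this sketch does not attempt to reproduce it.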
