Hybrid Learning Using Profit Sharing and a Genetic Algorithm for Partially Observable Problems

鈴木 晃平; 加藤 昇平

doi:10.1541/ieejeiss.137.1591

Abstract

Reinforcement learning is generally formulated for Markov decision processes (MDPs). However, an agent may be unable to observe the environment correctly because of the limited perceptual ability of its sensors. Such a setting is called a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may observe the same information in more than one state. HQ-learning and Episode-based Profit Sharing (EPS) are well-known methods for this problem. HQ-learning divides a POMDP environment into subtasks. EPS distributes the same reward to the state-action pairs in an episode when the agent reaches a goal. However, these methods have drawbacks in learning efficiency and in their tendency to settle on local solutions. In this paper, we propose a hybrid learning method that combines Profit Sharing (PS) and a genetic algorithm. We also report the effectiveness of our method through experiments on partially observable mazes.
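
The EPS credit assignment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `eps_update`, the weight table, the observation and action labels, and the choice to credit each distinct pair once per episode are all assumptions made for illustration.

```python
from collections import defaultdict


def eps_update(weights, episode, reward):
    """Add the same reward to every distinct (observation, action) pair.

    Assumption: each distinct pair is credited once per goal-reaching
    episode, even if it occurred several times. The abstract does not
    specify this detail.
    """
    for pair in set(episode):
        weights[pair] += reward
    return weights


# Hypothetical episode in a small maze; it ends when the goal is reached.
weights = defaultdict(float)
episode = [
    ("corridor", "east"),
    ("corridor", "east"),   # same observation seen again (perceptual aliasing)
    ("junction", "north"),
]
eps_update(weights, episode, reward=1.0)
```

Under these assumptions, both pairs receive an equal share of credit regardless of how early they appeared in the episode, which is the uniform distribution the abstract attributes to EPS.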
