2017 Volume 137 Issue 12 Pages 1591-1599
Reinforcement learning is generally formulated as a Markov decision process (MDP). However, an agent may be unable to observe the environment correctly because of the limited perceptual ability of its sensors. This setting is called a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may receive the same observation in more than one state. HQ-learning and Episode-based Profit Sharing (EPS) are well-known methods for this problem. HQ-learning divides a POMDP environment into subtasks. EPS distributes the same reward to every state-action pair in the episode when the agent reaches a goal. However, these methods have drawbacks in learning efficiency and a tendency to settle on local solutions. In this paper, we propose a hybrid learning method that combines Profit Sharing (PS) with a genetic algorithm. We also demonstrate the effectiveness of the method through experiments on partially observable mazes.
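The two ideas the abstract relies on, one observation shared by several states and an equal reward spread over an episode, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The maze cells, the `observe` helper, the `reinforce_episode` function, the reward value, and the choice to credit each distinct observation-action pair once per episode are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical aliased sensor: two different maze cells yield the same observation,
# so the agent cannot tell them apart (the POMDP situation described above).
OBSERVATION_OF_CELL = {(0, 0): "corridor", (1, 0): "junction", (2, 0): "corridor"}


def observe(cell):
    """Return the (possibly ambiguous) observation for a maze cell."""
    return OBSERVATION_OF_CELL[cell]


def reinforce_episode(weights, episode, reward):
    """Add the same reward to each distinct (observation, action) pair in the episode.

    Assumption: a pair that occurs several times in one episode is credited once.
    """
    for pair in set(episode):
        weights[pair] += reward
    return weights


weights = defaultdict(float)

# One episode that ends at the goal: cells (0, 0) and (2, 0) look identical to the agent.
episode = [(observe(cell), "east") for cell in [(0, 0), (1, 0), (2, 0)]]
reinforce_episode(weights, episode, reward=1.0)
```

After this single episode, the pairs `("corridor", "east")` and `("junction", "east")` hold the same weight, even though the first one stands for two different cells.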
The transactions of the Institute of Electrical Engineers of Japan.C
The Journal of the Institute of Electrical Engineers of Japan