Host: The Japanese Society for Artificial Intelligence
Name : The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number : 32
Location : [in Japanese]
Date : June 05, 2018 - June 08, 2018
Reinforcement learning generally assumes that the environment is a Markov decision process (MDP). However, an agent may be unable to observe the environment correctly because of the limited perceptual ability of its sensors. Such an environment is called a partially observable Markov decision process (POMDP). In a POMDP environment, an agent may receive the same observation in more than one state. We have proposed a hybrid learning method using Profit Sharing and a genetic algorithm (HPG) for this problem. However, most real problems can be represented as MDP environments. In this paper, we improve HPG so that it also adapts to MDP environments, and we report the effectiveness of our method through experiments with mazes.
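
As an illustration of how an agent can receive the same observation in more than one state, the following minimal sketch uses a hypothetical corridor maze in which the agent senses only whether the four neighbouring cells are walls. The maze layout, the sensor model, and the names `MAZE`, `observe`, and `aliased_states` are assumptions made for this example. They are not taken from the paper and do not reproduce HPG.

```python
from collections import defaultdict

# Hypothetical maze (an assumption for illustration): '#' is a wall, '.' is free.
MAZE = [
    "######",
    "#....#",
    "######",
]

def observe(maze, row, col):
    """Return wall flags (up, down, left, right) for the cell at (row, col)."""
    return tuple(
        maze[row + dr][col + dc] == "#"
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
    )

def aliased_states(maze):
    """Group free cells by observation and keep groups with more than one cell."""
    groups = defaultdict(list)
    for r, line in enumerate(maze):
        for c, ch in enumerate(line):
            if ch == ".":
                groups[observe(maze, r, c)].append((r, c))
    return {obs: cells for obs, cells in groups.items() if len(cells) > 1}

if __name__ == "__main__":
    for obs, cells in aliased_states(MAZE).items():
        print(obs, cells)
```

In this sketch the two middle corridor cells yield identical wall flags, so an agent relying on the observation alone cannot tell them apart. With full state information (the cell coordinates) the same maze would be an ordinary MDP.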