This paper describes a reinforcement learning method with a position vector, which does not fall within the framework of the Partially Observable Markov Decision Process (POMDP). First, a rule structure that uses the position vector as the agent's internal sensory information and a restraint on reward assignment for detours are described, and a new reinforcement learning method composed of them is proposed. Next, the proposed method is compared with a conventional method on a relatively simple Partially Observable Markov Environment (POME). The results show that reward assignment to unnecessary rules is restrained, that is, rewards are given only to effective rules, so that learning proceeds efficiently. In addition, we apply the proposed method to a shortest-path acquisition problem in a POME that can hardly be solved by the conventional method, and observe that the proposed method obtains an optimal solution. Finally, the proposed method is successfully applied to a huge maze used in the Japan micro-mouse competition, which shows that it is effective for such realistic problems.
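As a rough illustration of the ideas summarized above, the following Python sketch keys rules on the pair (observation, internal position vector), with the position vector maintained by dead reckoning from the agent's own actions, and restrains reward assignment by cutting detour loops out of the episode before distributing credit. The environment interface (`observe()`, `step()`), the parameter names, and the specific loop-cutting heuristic are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch (assumptions, not the authors' exact algorithm):
# rules are keyed by (observation, position vector); reward is assigned
# only along the loop-free part of the episode, so detour rules get none.
from collections import defaultdict
import random

MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

class PositionVectorLearner:
    def __init__(self, actions=("N", "S", "E", "W"), reward=1.0, decay=0.8):
        self.q = defaultdict(float)      # weight of rule ((obs, pos), action)
        self.actions = list(actions)
        self.reward, self.decay = reward, decay

    def select(self, obs, pos):
        # Greedy over rule weights with random tie-breaking (exploration
        # strategy is an assumption).
        weights = [self.q[((obs, pos), a)] for a in self.actions]
        best = max(weights)
        return random.choice(
            [a for a, w in zip(self.actions, weights) if w == best])

    def run_episode(self, env, max_steps=10_000):
        pos, trace = (0, 0), []
        obs = env.observe()              # hypothetical env interface
        for _ in range(max_steps):
            a = self.select(obs, pos)
            trace.append(((obs, pos), a))
            obs, done = env.step(a)
            dx, dy = MOVES[a]
            pos = (pos[0] + dx, pos[1] + dy)   # internal dead reckoning
            if done:
                self._assign(trace)
                return True
        return False

    def _assign(self, trace):
        # Cut loops: whenever the internal position repeats, unwind the
        # detour between the two visits so its rules receive no reward.
        path = []
        for step in trace:
            pos = step[0][1]
            while any(s[0][1] == pos for s in path):
                path.pop()
            path.append(step)
        # Distribute geometrically decaying credit back from the goal,
        # so only rules on the effective (loop-free) path are reinforced.
        credit = self.reward
        for rule in reversed(path):
            self.q[rule] += credit
            credit *= self.decay
```

Because two perceptually identical cells generally have different position vectors, rules at aliased observations are disambiguated without maintaining POMDP belief states, which is the intuition behind applying the method to mazes such as the micro-mouse course.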