Proposal of a Reinforcement Learning Method Using a Position Vector in Partially Observable Markov Decision Processes

清本 盛明; 亀井 且有

doi:10.5687/iscie.14.86

Abstract

This paper describes a reinforcement learning method that uses a position vector to keep the learning problem from falling into a Partially Observable Markov Decision Process (POMDP). First, we describe a rule structure that uses the position vector as the agent's internal sensory information, together with a restraint on reward assignment for detours, and propose a new reinforcement learning method that combines the two. Next, the proposed method is compared with a conventional method on a relatively simple Partially Observable Markov Environment (POME). The results show that reward assignment to unnecessary rules is restrained: rewards are given only to effective rules, so learning proceeds efficiently. In addition, we apply the proposed method to a shortest-path acquisition problem in a POME that can hardly be solved by the conventional method, and observe that the proposed method obtains an optimum solution. Finally, the proposed method is successfully applied to a large maze used in the Japan micro-mouse competition, which shows that it is effective for such realistic problems.
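The abstract does not give the rule structure or the detour criterion in detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical grid maze (`MAZE`), a wall sensor (`observe`), a position vector defined as the displacement from the start cell, and a detour rule that discards any loop returning to an already visited position vector. All names are illustrative. The sketch shows how two cells that look identical to the sensor become distinguishable once the position vector is added to the rule's condition part, and how rules fired during a detour could be excluded before rewards are assigned.

```python
# Hypothetical maze, not taken from the paper: '#' is a wall, '.' is free.
MAZE = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}


def observe(cell):
    """Local sensor reading: which of the N, E, S, W neighbours are walls."""
    r, c = cell
    return tuple(MAZE[r + dr][c + dc] == "#" for dr, dc in MOVES.values())


def step(cell, action):
    """Move one cell if the target is free; otherwise stay in place."""
    dr, dc = MOVES[action]
    r, c = cell[0] + dr, cell[1] + dc
    return cell if MAZE[r][c] == "#" else (r, c)


def rollout(start, actions):
    """Return the (position_vector, observation, action) rules fired.

    Assumption: the position vector is the displacement from the start
    cell, which the agent tracks internally.
    """
    cell, rules = start, []
    for action in actions:
        pos = (cell[0] - start[0], cell[1] - start[1])
        rules.append((pos, observe(cell), action))
        cell = step(cell, action)
    return rules


def drop_detours(rules):
    """Assumed detour criterion: cut any loop that revisits a position vector."""
    kept, index = [], {}
    for rule in rules:
        pos = rule[0]
        if pos in index:
            cut = index[pos]
            # Forget every rule fired since the first visit to this position.
            for dropped in kept[cut:]:
                del index[dropped[0]]
            kept = kept[:cut]
        index[pos] = len(kept)
        kept.append(rule)
    return kept


if __name__ == "__main__":
    # Cells (1, 2) and (3, 2) give the same sensor reading (aliasing).
    print(observe((1, 2)) == observe((3, 2)))
    # East then west is a detour; only the two southward rules remain.
    episode = rollout((1, 1), ["E", "W", "S", "S"])
    print([rule[2] for rule in drop_detours(episode)])
```

In a learner that assigns reward along the episode, only the rules returned by `drop_detours` would be reinforced under this assumption, which is one way to read the abstract's statement that rewards are given only to effective rules.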
