Abstract
This paper proposes a new reinforcement learning method based on a "motivation value" that adjusts action-selection probabilities so that the resulting policy can depend on state-action context. The motivation value defined in this paper is a parameter that temporarily emphasizes (or de-emphasizes) the selection probabilities of specific actions, thereby indirectly controlling the next action selection during the control phase. Like a Q-value, the motivation value is stored in a form corresponding to each state-action pair and is updated as learning progresses. A practical advantage of the proposed method is that it can be implemented as a relatively simple extension of standard reinforcement learning. To investigate the validity of the proposed method, it was applied to a maze problem containing perceptual aliasing. Experimental results show that the method is an effective learning algorithm in non-Markov decision process environments that contain perceptual aliasing problems.
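To make the idea concrete, the following is a minimal conceptual sketch of how a motivation value stored in the same tabular form as a Q-function might bias a softmax policy. The bias term, decay factor, and motivation-update rule shown here are illustrative assumptions, not the update equations defined in the paper.

    import numpy as np

    # Conceptual sketch only: the softmax bias, decay, and motivation update
    # below are illustrative assumptions, not the authors' equations.
    class MotivatedQLearner:
        def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                     tau=1.0, decay=0.9, seed=None):
            self.Q = np.zeros((n_states, n_actions))  # ordinary action values
            self.M = np.zeros((n_states, n_actions))  # motivation values (same tabular form as Q)
            self.alpha, self.gamma, self.tau, self.decay = alpha, gamma, tau, decay
            self.rng = np.random.default_rng(seed)

        def policy(self, s):
            # The motivation value temporarily emphasizes or de-emphasizes
            # actions by biasing the preferences before the softmax.
            prefs = (self.Q[s] + self.M[s]) / self.tau
            prefs -= prefs.max()  # numerical stability
            p = np.exp(prefs)
            return p / p.sum()

        def act(self, s):
            return self.rng.choice(self.Q.shape[1], p=self.policy(s))

        def update(self, s, a, r, s_next):
            # Standard Q-learning update for the value table.
            td_error = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
            self.Q[s, a] += self.alpha * td_error
            # Hypothetical motivation update: strengthen the taken action when
            # the outcome exceeds expectation, and decay the whole row so the
            # bias on action selection remains temporary.
            self.M[s, a] += self.alpha * td_error
            self.M[s] *= self.decay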