2017 Volume 29 Issue 1 Pages 507-516
In this paper, we reconsider the behavior policy and value estimation in reinforcement learning from a Bayesian point of view, and devise a new algorithm based on Prospect Theory. Actions are selected according to a probability-distribution criterion derived from Bayesian estimation, which achieves better search efficiency than the conventional method. The estimated value distribution of each action is represented by a beta distribution, and behavior selection is carried out by evaluating its mean and variance. The two parameters of this beta distribution accumulate the positive and negative components of the reward, weighted between the current and next states, and are updated in a Q-learning-like manner. Updating on the basis of Prospect Theory allows the learning to follow the state transitions. Each initial value distribution is a uniform distribution. Experiments on a discrete-space path problem reveal that an advantage of the proposed method is the breadth of its search, and a continuous-space path-search problem demonstrates its applicability to more complicated problems.
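The abstract's core mechanism can be sketched in a few lines: each action's value is a beta distribution whose two parameters collect positive and negative reward, and action selection scores each distribution by its mean and variance. The exact update rule and the selection criterion below are assumptions for illustration (the paper only states that the parameters mix the reward with weighted parameters of the current and next states and that selection evaluates mean and variance); `kappa`, `gamma`, and the names used are hypothetical.

```python
class BetaValue:
    """Beta-distributed action value: `a` accumulates positive evidence,
    `b` negative evidence. Beta(1, 1) is the uniform initial distribution."""

    def __init__(self):
        self.a = 1.0
        self.b = 1.0

    def mean(self):
        return self.a / (self.a + self.b)

    def variance(self):
        n = self.a + self.b
        return self.a * self.b / (n * n * (n + 1.0))


def select_action(values, kappa=1.0):
    """Choose the action maximizing mean + kappa * std: an assumed
    mean-variance criterion favoring both high estimates and uncertainty."""
    return max(range(len(values)),
               key=lambda i: values[i].mean() + kappa * values[i].variance() ** 0.5)


def update(value, next_best, reward, gamma=0.9):
    """Assumed Q-learning-like update: the positive or negative side of the
    beta distribution absorbs the reward plus a discounted share of the
    corresponding parameter of the best next-state distribution."""
    if reward >= 0:
        value.a += reward + gamma * next_best.a
    else:
        value.b += -reward + gamma * next_best.b
```

Under this sketch, an action that repeatedly yields positive reward sees its `a` parameter grow, sharpening the beta distribution around a high mean, while rarely tried actions keep high variance and so remain attractive under the mean-plus-deviation criterion, which matches the breadth-of-search behavior the abstract reports.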