Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Online ISSN : 1881-7203
Print ISSN : 1347-7986
ISSN-L : 1347-7986
Original Papers
Beta Distribution Propagating Reinforcement Learning Based on Prospect Theory for the Efficient Exploration and Exploitation
Akira NOTSU, Seiki UBUKATA, Katsuhiro HONDA
JOURNAL OPEN ACCESS

2017 Volume 29 Issue 1 Pages 507-516

Abstract

In this paper, we reconsider the behavior policy and value estimation in reinforcement learning from the viewpoint of a Bayesian approach, and devise a new algorithm based on Prospect Theory. Good behaviors are selected by a probability-distribution criterion based on Bayesian estimation, which achieves better search efficiency than the conventional method. The estimated value distribution of each state-action pair is represented by a beta distribution, and behavior selection is carried out by evaluating its mean and variance. The two parameters of this beta distribution accumulate positive and negative rewards, respectively, weighted between the current and the next state, and are updated in a Q-learning-like manner. By basing the update on Prospect Theory so that it corresponds to the state transition, reinforcement learning becomes possible. Each initial probability distribution is a uniform distribution. Experiments on a discrete-space path problem reveal that an advantage of the proposed method is the breadth of its search, and a continuous-space path-search problem shows its applicability to more complicated problems.
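The mechanism described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact algorithm: it assumes a mean-plus-standard-deviation selection criterion, a single blending weight `w` for propagating the next state's parameters, and a simple split of rewards into the positive (alpha) and negative (beta) pseudo-counts; the paper's prospect-theoretic value weighting is omitted.

```python
class BetaRLAgent:
    """Sketch of beta-distribution-propagating RL.

    Each state-action pair keeps a Beta(alpha, beta) value estimate:
    alpha accumulates positive rewards, beta negative ones. Both start
    at 1, i.e. a uniform initial distribution, as in the abstract.
    The names and parameters here (w, k) are illustrative assumptions.
    """

    def __init__(self, n_states, n_actions, w=0.1, k=1.0):
        self.w = w  # assumed blending weight toward the next state's parameters
        self.k = k  # assumed exploration weight on the standard deviation
        self.alpha = [[1.0] * n_actions for _ in range(n_states)]
        self.beta = [[1.0] * n_actions for _ in range(n_states)]

    def _mean_var(self, s, a):
        # Closed-form mean and variance of Beta(alpha, beta).
        al, be = self.alpha[s][a], self.beta[s][a]
        n = al + be
        return al / n, al * be / (n * n * (n + 1.0))

    def select(self, s):
        # Behavior selection by evaluating mean and variance:
        # prefer high estimated value, lean toward uncertain actions.
        def score(a):
            m, v = self._mean_var(s, a)
            return m + self.k * v ** 0.5
        return max(range(len(self.alpha[s])), key=score)

    def update(self, s, a, reward, s_next):
        # Q-learning-like update: pull this pair's parameters toward
        # those of the greedy action in the next state, then add the
        # reward as a positive or negative pseudo-count.
        a_next = self.select(s_next)
        self.alpha[s][a] += self.w * (self.alpha[s_next][a_next] - self.alpha[s][a])
        self.beta[s][a] += self.w * (self.beta[s_next][a_next] - self.beta[s][a])
        if reward >= 0:
            self.alpha[s][a] += reward
        else:
            self.beta[s][a] -= reward
```

With all parameters at their uniform prior, every action scores equally; after a positive reward the chosen pair's alpha grows, raising its mean while shrinking its variance, so exploitation and exploration are traded off through the shape of the distribution itself.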

© 2017 Japan Society for Fuzzy Theory and Intelligent Informatics