Host: The Japan Society of Mechanical Engineers
Name : [in Japanese]
Date : June 05, 2019 - June 08, 2019
This paper proposes reinforcement learning with hyperbolic discounting. In general, the return and its expectation, i.e., the value function, are defined as cumulative rewards with exponential discounting because of its mathematical simplicity. Animals, however, show behaviors that cannot be explained by exponential discounting but can be explained by hyperbolic discounting. There is therefore good reason to expect some benefit from replacing exponential discounting with hyperbolic discounting. A new scheme for learning the optimal policy is derived by combining a new temporal difference error, which applies hyperbolic discounting in a recursive manner, with a reward-punishment framework, which is also biologically plausible. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of reward and punishment. In addition, the averages of the discount factors with respect to reward and punishment differ from each other, resembling the sign effect observed in animal behavior.
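The contrast between the two discounting schemes can be illustrated with a minimal sketch. It is not the paper's algorithm: it only compares the textbook exponential weight `gamma ** t` with the commonly used hyperbolic weight `1 / (1 + k * t)`. The function names, the parameters `gamma` and `k`, and the reward sizes are assumptions chosen for illustration. The example shows preference reversal, one commonly cited behavior that hyperbolic discounting reproduces and exponential discounting does not.

```python
def exponential_discount(t, gamma=0.9):
    """Textbook exponential weight for a reward received t steps ahead."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Common hyperbolic weight; k is an assumed, illustrative rate."""
    return 1.0 / (1.0 + k * t)


def discounted_return(rewards, discount):
    """Sum of rewards, each weighted by discount(t) for its time step t."""
    return sum(discount(t) * r for t, r in enumerate(rewards))


def prefers_larger_later(discount, delay, small=1.0, large=2.0, gap=5):
    """True if a large reward at delay + gap is valued above a small
    reward at delay. Reward sizes and gap are illustrative assumptions."""
    return discount(delay + gap) * large > discount(delay) * small


if __name__ == "__main__":
    for name, d in [("exponential", exponential_discount),
                    ("hyperbolic", hyperbolic_discount)]:
        near = prefers_larger_later(d, delay=0)
        far = prefers_larger_later(d, delay=20)
        print(f"{name}: larger-later preferred near={near}, far={far}")
```

With exponential discounting the comparison depends only on the gap between the two rewards, so the preference is the same at every delay. With hyperbolic discounting the preference can flip as both rewards move further into the future.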