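Organizer: The Japan Society of Mechanical Engineers (一般社団法人 日本機械学会)
Conference: Robotics and Mechatronics Conference 2019 (ロボティクス・メカトロニクス 講演会2019)
Dates: 2019/06/05 - 2019/06/08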
This paper proposes reinforcement learning with hyperbolic discounting. In general, the return and its expectation, i.e., the value function, are defined as cumulative rewards with exponential discounting because of its mathematical simplicity. Animals, however, exhibit behaviors that cannot be explained by exponential discounting but can be explained by hyperbolic discounting. This suggests that replacing exponential discounting with hyperbolic discounting may yield some benefits. By combining a new temporal difference (TD) error, which applies hyperbolic discounting in a recursive manner, with a reward-punishment framework that is also biologically plausible, we derive a new scheme for learning the optimal policy. Simulations show that the proposed method outperforms standard reinforcement learning, although its performance depends on how rewards and punishments are designed. In addition, the average discount factors for rewards and punishments differ from each other, resembling the sign effect observed in animal behavior.
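The difference between the two discounting schemes can be illustrated with a short sketch. It assumes the common textbook forms gamma**k (exponential) and 1 / (1 + kappa * k) (hyperbolic); the function names, parameter values, and reward sizes are hypothetical choices for illustration only, and the paper's recursive TD formulation and reward-punishment framework are not reproduced here. With exponential discounting, adding the same extra delay to both options multiplies both values by the same factor, so the preference never changes. With hyperbolic discounting, the preference can reverse, a well-known example of behavior that hyperbolic discounting explains and exponential discounting does not.

```python
"""Illustrative sketch (not the authors' method): exponential vs. hyperbolic
discount weights and the preference reversal that only the hyperbolic form shows."""


def exponential_discount(delay: float, gamma: float = 0.7) -> float:
    """Exponential discount weight gamma**delay (gamma is a hypothetical value)."""
    return gamma ** delay


def hyperbolic_discount(delay: float, kappa: float = 1.0) -> float:
    """Hyperbolic discount weight 1 / (1 + kappa * delay) (kappa is a hypothetical value)."""
    return 1.0 / (1.0 + kappa * delay)


def prefers_small(discount, small, large, d_small, d_large, offset):
    """Return True if the smaller-sooner reward is valued higher than the
    larger-later one after both delays are shifted by the same offset."""
    v_small = small * discount(d_small + offset)
    v_large = large * discount(d_large + offset)
    return v_small > v_large


if __name__ == "__main__":
    # Hypothetical choice: 5 units after 0 steps vs. 10 units after 3 steps,
    # evaluated now and again with 10 extra steps added to both delays.
    for name, f in [("exponential", exponential_discount),
                    ("hyperbolic", hyperbolic_discount)]:
        near = prefers_small(f, 5.0, 10.0, 0.0, 3.0, offset=0.0)
        far = prefers_small(f, 5.0, 10.0, 0.0, 3.0, offset=10.0)
        print(f"{name}: prefers small now={near}, after +10 delay={far}")
```

With these values, the exponential agent prefers the small reward in both cases, whereas the hyperbolic agent prefers it only when it is immediate and switches to the larger reward once both options are pushed 10 steps into the future.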