The Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec)
Online ISSN : 2424-3124
Session ID: 1P2-A13

Proposal of Reinforcement Learning with Hyperbolic Discounting
*Taisuke Kobayashi (小林 泰介)

Abstract

This paper proposes reinforcement learning with hyperbolic discounting. In general, the return and its expectation, i.e., the value function, are defined as cumulative rewards with exponential discounting because of its mathematical simplicity. Animals, however, show behaviors that cannot be explained by exponential discounting but can be explained by hyperbolic discounting. Replacing exponential discounting with hyperbolic discounting is therefore expected to yield some benefit. A new scheme for learning the optimal policy is derived by combining a new temporal difference error, which incorporates hyperbolic discounting in a recursive manner, with a reward-punishment framework, which is also biologically plausible. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of the reward and punishment. In addition, the averages of the discount factors with respect to reward and punishment differ from each other, resembling the sign effect observed in animal behavior.
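
The contrast between the two discounting schemes mentioned in the abstract can be illustrated with a minimal sketch. This is not the method proposed in the paper: it only compares the textbook exponential weight gamma**t with the commonly used hyperbolic weight 1 / (1 + k*t). The function names, the parameter values (gamma = 0.9, k = 1.0), and the reward sizes are assumptions chosen for illustration. The sketch shows preference reversal, one kind of behavior that exponential discounting cannot reproduce: under hyperbolic discounting the choice between a smaller-sooner and a larger-later reward flips as both are pushed further into the future, while under exponential discounting it never does.

```python
def exponential_discount(t, gamma=0.9):
    """Textbook exponential discount weight for a reward delayed by t steps."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Commonly used hyperbolic discount weight (k is an assumed rate parameter)."""
    return 1.0 / (1.0 + k * t)


def prefers_later(discount, delay, small=5.0, large=10.0, gap=5):
    """Return True if the larger reward at delay + gap is valued above
    the smaller reward at delay, under the given discount function."""
    return large * discount(delay + gap) > small * discount(delay)


if __name__ == "__main__":
    for delay in (0, 20):
        print(
            f"delay={delay:2d}  "
            f"exponential prefers later: {prefers_later(exponential_discount, delay)}  "
            f"hyperbolic prefers later: {prefers_later(hyperbolic_discount, delay)}"
        )
```

With exponential discounting, the comparison reduces to large * gamma**gap > small, which does not depend on the delay, so the preference is the same at every horizon. With the hyperbolic weight and the assumed values above, the smaller-sooner reward wins when it is immediately available, and the larger-later reward wins once both are sufficiently delayed.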

© 2019 The Japan Society of Mechanical Engineers