Proposal of Reinforcement Learning with Hyperbolic Discounting

小林 泰介

doi:10.1299/jsmermd.2019.1P2-A13

Abstract

This paper proposes reinforcement learning with hyperbolic discounting. In general, the return and its expectation, i.e., the value function, are defined as cumulative rewards with exponential discounting because of its mathematical simplicity. Animals, however, show behaviors that cannot be explained by exponential discounting but can be explained by hyperbolic discounting. Replacing exponential discounting with hyperbolic discounting is therefore expected to yield some benefit. A new scheme for learning the optimal policy is derived by combining a new temporal difference error, which incorporates hyperbolic discounting in a recursive manner, with a reward-punishment framework, which is also biologically plausible. In simulations, the proposed method outperforms standard reinforcement learning, although its performance depends on the design of the reward and punishment. In addition, the averages of the discount factors with respect to reward and punishment differ from each other, similar to the sign effect observed in animal behavior.
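
To illustrate the contrast between the two discounting schemes, the following minimal sketch compares them on a simple choice between a smaller-sooner and a larger-later reward. It is not the method proposed in the paper: it assumes the textbook forms `gamma ** t` for exponential discounting and `1 / (1 + k * t)` for hyperbolic discounting, and every function name, parameter value, and reward size is hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; not the paper's algorithm.
# Assumed textbook discount functions:
#   exponential: gamma ** t
#   hyperbolic:  1 / (1 + k * t)
# All names and parameter values below are hypothetical.


def exponential_discount(t, gamma=0.8):
    """Weight of a reward received t steps in the future (exponential)."""
    return gamma ** t


def hyperbolic_discount(t, k=1.0):
    """Weight of a reward received t steps in the future (hyperbolic)."""
    return 1.0 / (1.0 + k * t)


def prefers_later(discount, small, large, delay_small, delay_large):
    """True if the discounted larger-later reward beats the smaller-sooner one."""
    return large * discount(delay_large) > small * discount(delay_small)


# Choice: 5 units at delay d versus 10 units at delay d + 5,
# evaluated when both options are near (d = 0) and far away (d = 20).
exp_near = prefers_later(exponential_discount, 5, 10, 0, 5)
exp_far = prefers_later(exponential_discount, 5, 10, 20, 25)
hyp_near = prefers_later(hyperbolic_discount, 5, 10, 0, 5)
hyp_far = prefers_later(hyperbolic_discount, 5, 10, 20, 25)

print("exponential:", exp_near, exp_far)
print("hyperbolic: ", hyp_near, hyp_far)
```

Under these assumed parameters, the exponential preference is the same at both horizons, because the ratio of the two discounted rewards does not depend on the common delay `d`. The hyperbolic preference switches from the smaller-sooner to the larger-later reward as both options recede into the future, which is the kind of preference reversal that exponential discounting cannot reproduce.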
