Optimization of the Betting Fraction in Compound Reinforcement Learning

松井 藤五郎; 後藤 卓; 和泉 潔; 陳 ユ

doi:10.1527/tjsai.28.267

Abstract

This paper describes the optimization of the betting fraction parameter in compound reinforcement learning. Compound reinforcement learning maximizes the expected logarithm of compound returns in return-based MDPs. However, it introduces a betting fraction parameter to keep values from diverging to negative infinity, which raises the problem of how to choose that parameter. We propose a method that optimizes the betting fraction by online gradient ascent within compound reinforcement learning.
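
The idea of tuning a betting fraction by online gradient ascent can be illustrated with a minimal sketch. It is not the authors' algorithm: it ignores states, actions, and value functions, and it simply ascends the per-step objective log(1 + R f) for an observed return R and betting fraction f. The function name `optimize_betting_fraction`, the step-size schedule, the starting value `f0`, and the clipping bound `f_max` are all assumptions made for illustration. Returns are assumed to satisfy R >= -1, so that 1 + R f stays positive whenever f < 1.

```python
import random


def optimize_betting_fraction(returns, f0=0.5, f_max=0.95):
    """Online gradient ascent on the running objective log(1 + r * f).

    Illustrative only: assumes every return r >= -1 and keeps f in [0, f_max]
    with f_max < 1, so that 1 + r * f is always strictly positive.
    """
    f = f0
    for t, r in enumerate(returns):
        step = 1.0 / (t + 10)       # decaying step size (assumed schedule)
        grad = r / (1.0 + r * f)    # derivative of log(1 + r * f) w.r.t. f
        f = min(max(f + step * grad, 0.0), f_max)  # project back into [0, f_max]
    return f


if __name__ == "__main__":
    # Hypothetical even-money bet: return +1 with probability 0.6, else -1.
    rng = random.Random(0)
    returns = [1.0 if rng.random() < 0.6 else -1.0 for _ in range(20000)]
    print(optimize_betting_fraction(returns))
```

For this even-money bet with win probability p, the expected log return p log(1 + f) + (1 - p) log(1 - f) is maximized at f = 2p - 1, so the estimate should settle near 0.2 when p = 0.6. With f = 1 and R = -1 the objective would be log 0, which is the divergence to negative infinity that the abstract says the betting fraction is meant to avoid.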
