2013, Volume 28, Issue 3, pp. 267-272
This paper describes the optimization of the betting fraction parameter in compound reinforcement learning. Compound reinforcement learning maximizes the expected logarithm of compound returns in return-based MDPs. To keep values from diverging to negative infinity, it introduces a betting fraction parameter, which raises the problem of how to choose that parameter. In this paper, we propose a method that optimizes the betting fraction by on-line gradient ascent in compound reinforcement learning.
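
The idea of tuning the betting fraction by on-line gradient ascent can be illustrated with a minimal single-state sketch. It is not the algorithm from the paper: it assumes a stream of per-step returns r >= -1, takes the objective to be the average of log(1 + f * r), and uses hypothetical names (`online_betting_fraction`, `log_growth`, `step`, `f_max`) chosen only for this example. Each observed return moves f along the sample gradient r / (1 + f * r), and f is clipped so that 1 + f * r stays positive and the logarithm stays finite.

```python
import math
from typing import Iterable, Sequence


def log_growth(f: float, returns: Sequence[float]) -> float:
    """Average log compound return, i.e. the mean of log(1 + f * r)."""
    return sum(math.log(1.0 + f * r) for r in returns) / len(returns)


def online_betting_fraction(
    returns: Iterable[float],
    f_init: float = 0.0,
    step: float = 0.005,
    f_max: float = 0.99,
) -> float:
    """Illustrative on-line gradient ascent on the betting fraction f.

    Assumption (not from the paper): every return satisfies r >= -1,
    so clipping f to [0, f_max] with f_max < 1 keeps 1 + f * r > 0.
    """
    f = f_init
    for r in returns:
        # Sample gradient of log(1 + f * r) with respect to f.
        grad = r / (1.0 + f * r)
        # Ascent step, then clip so the logarithm never diverges.
        f = min(max(f + step * grad, 0.0), f_max)
    return f


if __name__ == "__main__":
    # Hypothetical even-money bet that wins 3 times out of 5.
    cycle = [1.0, 1.0, 1.0, -1.0, -1.0]
    f_hat = online_betting_fraction(cycle * 5000)
    print("learned betting fraction:", f_hat)
    print("average log growth at f_hat:", log_growth(f_hat, cycle))
```

For this even-money example the maximizer of the average log return is f = 2p - 1 = 0.2 with win probability p = 0.6, so the learned value should settle close to 0.2. With a constant step size it keeps oscillating slightly around that point; a decaying step size would be the usual remedy.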