
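Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Vol. 28 (2013) No. 3, Special Issue "Short Papers from the 2012 Annual Conference" and Regular Papers, pp. 267-272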


http://doi.org/10.1527/tjsai.28.267

Short Paper

This paper describes the optimization of the betting fraction parameter in compound reinforcement learning. Compound reinforcement learning maximizes the expected logarithm of compound returns in return-based MDPs. To keep values from diverging to negative infinity, it introduces a betting fraction parameter, which in turn raises the problem of how to choose that parameter. In this paper, we propose a method that optimizes the betting fraction by on-line gradient ascent in compound reinforcement learning.
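As a rough illustration of the idea of tuning a betting fraction by on-line gradient ascent, the sketch below works in a deliberately simplified single-state setting, not the return-based MDP formulation of the paper. It assumes the per-step objective is log(1 + f * r) for a betting fraction f and a return r >= -1, and it updates f along the gradient r / (1 + f * r) after each observed return. The function names, the step size, the clipping bounds, and the coin-flip return model are all assumptions made for this example and are not taken from the paper.

```python
import random


def online_betting_fraction(returns, f0=0.5, step=0.001, f_min=0.0, f_max=0.99):
    """Tune a betting fraction f by on-line gradient ascent (illustrative only).

    Assumed per-step objective: log(1 + f * r), where each return r >= -1.
    Its derivative with respect to f is r / (1 + f * r).
    f is clipped to [f_min, f_max] with f_max < 1, so 1 + f * r stays positive.
    """
    f = f0
    for r in returns:
        grad = r / (1.0 + f * r)                    # d/df log(1 + f * r)
        f = min(f_max, max(f_min, f + step * grad))  # ascent step, then clip
    return f


def sample_returns(n, p_win=0.6, seed=0):
    """Hypothetical return stream: +1 with probability p_win, otherwise -1."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < p_win else -1.0 for _ in range(n)]


if __name__ == "__main__":
    f = online_betting_fraction(sample_returns(20000))
    print(f"estimated betting fraction: {f:.3f}")
```

For this even-money coin-flip model, the fraction that maximizes the expected log return is 2 * p_win - 1 (0.2 for p_win = 0.6), so the estimate should hover near that value. With a constant step size it keeps fluctuating around the optimum rather than settling exactly on it.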

Copyright © 2013 JSAI (The Japanese Society for Artificial Intelligence)
