Compound Reinforcement Learning

松井 藤五郎

doi:10.1527/tjsai.26.330

Abstract

This paper describes compound reinforcement learning, a reinforcement learning framework based on compound returns. Compound reinforcement learning maximizes the compound return in returns-based MDPs. We also describe the compound Q-learning algorithm and present experimental results on an illustrative example, a two-armed bandit.
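
To make the notion of a compound return concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each step yields a rate of return r that is fully reinvested, so the compound return over a sequence is the product of (1 + r) minus 1. The function names and the two return sequences ("risky" and "safe") are hypothetical and chosen only for illustration; they are not the paper's bandit setup or its results.

```python
import math


def compound_return(returns):
    """Overall rate of return when every step is fully reinvested: prod(1 + r) - 1."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def mean_log_growth(returns):
    """Average of log(1 + r); its sign tells whether wealth grows or shrinks under compounding."""
    return sum(math.log(1.0 + r) for r in returns) / len(returns)


# Hypothetical return sequences (assumptions for illustration only).
risky = [1.0, -0.9] * 5  # alternates +100% and -90%
safe = [0.01] * 10       # steady +1% per step

for name, seq in (("risky", risky), ("safe", safe)):
    mean = sum(seq) / len(seq)
    print(f"{name}: mean={mean:.4f}, compound={compound_return(seq):.4f}, "
          f"mean log growth={mean_log_growth(seq):.4f}")
```

In this sketch the risky sequence has the higher average per-step return yet the lower compound return, which is the kind of gap between additive and compound objectives that motivates optimizing the compound return directly.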
