JSAI Technical Report (Type 2 SIG)
Online ISSN : 2436-5556
Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiyama
Research / Technical Report, Free access

2009, Volume 2009, Issue DMSM-A901, p. 01-

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
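The abstract's key observation is that LSPI reduces value-function fitting to a linear least-squares problem, which is what makes standard active-learning criteria for linear regression applicable to choosing sampling policies. Below is a minimal, illustrative sketch of that reduction via LSTD-Q with linear features; it is not the paper's implementation, and the names `lstd_q`, `phi_sa`, `phi_next_sa`, and `reg` are assumptions introduced here for illustration.

```python
import numpy as np

def lstd_q(phi_sa, phi_next_sa, rewards, gamma=0.95, reg=1e-6):
    """LSTD-Q: fit Q(s, a) ~= phi(s, a)^T w from transition samples.

    phi_sa      : (n, d) features of visited state-action pairs
    phi_next_sa : (n, d) features of successor pairs under the target policy
    rewards     : (n,)   observed immediate rewards

    Solves the Bellman fixed-point condition as a linear system
        A w = b,  where A = Phi^T (Phi - gamma * Phi'),  b = Phi^T r.
    """
    A = phi_sa.T @ (phi_sa - gamma * phi_next_sa)
    b = phi_sa.T @ rewards
    # A small ridge term keeps A well-conditioned when samples are few
    # or the features are strongly correlated.
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)
```

Because the weight vector is obtained from a least-squares system of this form, active-learning ideas for linear regression (e.g., selecting samples expected to reduce the estimation error of `w`) can in principle be used to compare candidate sampling policies before costly rewards are collected, which is the setting the paper targets.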

© 2009 The Author(s)