JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning
Takayuki AKIYAMA, Hirotaka HACHIYA, Masashi SUGIYAMA
RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

2009 Volume 2009 Issue DMSM-A901 Pages 01-

Abstract

Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when sampling immediate rewards is costly. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
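
As a rough illustration of why value function approximation in LSPI can be viewed as linear regression, the following sketch uses standard textbook notation. It is not taken from the report itself: the symbols \phi (basis functions), \theta (parameters), \gamma (discount factor), \psi (transformed features), and the sample index i are assumptions introduced here for illustration.

```latex
% Linear model of the state-action value function under policy \pi
\hat{Q}^{\pi}(s,a;\theta) = \theta^{\top} \phi(s,a)

% Bellman equation for Q^{\pi}
Q^{\pi}(s,a) = r(s,a)
  + \gamma \, \mathbb{E}_{s' \mid s,a} \, \mathbb{E}_{a' \sim \pi(\cdot \mid s')}
    \left[ Q^{\pi}(s',a') \right]

% Substituting the linear model gives a regression problem for the reward
r(s,a) \approx \theta^{\top} \psi(s,a), \qquad
\psi(s,a) = \phi(s,a)
  - \gamma \, \mathbb{E}_{s' \mid s,a} \, \mathbb{E}_{a' \sim \pi(\cdot \mid s')}
    \left[ \phi(s',a') \right]

% Least-squares estimate from N observed samples (s_i, a_i, r_i)
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N}
  \left( \theta^{\top} \hat{\psi}(s_i,a_i) - r_i \right)^{2}
```

In this view the immediate rewards play the role of regression outputs and the sampling policy determines the distribution of the regression inputs, which is the quantity that active learning methods for linear regression are designed to choose.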

© 2009 Authors