Host: The Japan Society of Mechanical Engineers
Name : [in Japanese]
Date : June 02, 2018 - June 05, 2018
In this paper, we introduce a policy search reinforcement learning method with a sparse non-parametric policy model. We formulate policy search as a variational learning problem. A sparse pseudo-input Gaussian process (SPGP) is placed as a prior distribution over the control policy. A variational lower bound on the expected reward is then derived and optimized with respect to the hyperparameters and the pseudo-input variables. We conducted numerical simulations and real-robot experiments, which confirmed the effectiveness of the proposed method.
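To illustrate what a sparse pseudo-input Gaussian process computes, the following is a minimal sketch of the standard SPGP predictive mean, in which a small set of pseudo-inputs summarizes the training data. It does not reproduce the variational policy search described in the abstract. The squared-exponential kernel, the function names (`rbf_kernel`, `spgp_predict_mean`), and all parameter values are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a and rows of b (assumed kernel)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def spgp_predict_mean(x_train, y_train, x_pseudo, x_test,
                      noise_var=0.01, lengthscale=1.0, variance=1.0, jitter=1e-8):
    """Predictive mean of an SPGP with M pseudo-inputs (hypothetical helper)."""
    m = x_pseudo.shape[0]
    k_mm = rbf_kernel(x_pseudo, x_pseudo, lengthscale, variance) + jitter * np.eye(m)
    k_nm = rbf_kernel(x_train, x_pseudo, lengthscale, variance)
    k_sm = rbf_kernel(x_test, x_pseudo, lengthscale, variance)

    # Diagonal of K_nm K_mm^{-1} K_mn, the low-rank approximation of K_nn.
    q_diag = np.sum(k_nm * np.linalg.solve(k_mm, k_nm.T).T, axis=1)
    # Diagonal of (Lambda + noise_var * I); the prior variance of an RBF kernel is constant.
    lam = np.maximum(variance - q_diag, 0.0) + noise_var

    scaled = k_nm / lam[:, None]          # (Lambda + noise)^{-1} K_nm
    q_m = k_mm + k_nm.T @ scaled          # M x M system instead of N x N
    return k_sm @ np.linalg.solve(q_m, scaled.T @ y_train)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(-3.0, 3.0, size=(50, 1)), axis=0)
    y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(50)
    pseudo = np.linspace(-3.0, 3.0, 8)[:, None]   # 8 pseudo-inputs summarize 50 points
    print(spgp_predict_mean(x, y, pseudo, np.array([[0.0], [1.5]])))
```

In this sketch the pseudo-input locations are fixed by hand. In the method summarized above they are treated as free variables and optimized together with the hyperparameters.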