Sparse Pseudo-Input Gaussian Process Policy Search via Variational Learning

佐々木 光; 小澤 裕斗; 松原 崇充

doi:10.1299/jsmermd.2018.1A1-C16

Abstract

In this paper, we introduce a policy search reinforcement learning method with a sparse non-parametric policy model. We formulate policy search as a variational learning problem. A sparse pseudo-input Gaussian process (SPGP) is placed as the prior distribution over the control policy, and a variational lower bound on the expected reward is derived; this bound is then optimized with respect to the hyperparameters and the pseudo-input variables. We conducted numerical simulations and real-robot experiments, which confirmed the effectiveness of the proposed method.
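
To make the role of the pseudo-inputs concrete, here is a minimal pure-Python sketch of the standard SPGP (FITC) predictive mean, used as a deterministic policy that maps a state to an action. It is not the authors' implementation. The squared-exponential kernel, the function names (`rbf`, `cholesky`, `chol_solve`, `spgp_policy_mean`), the hypothetical state/action pairs `X`, `y`, and the fixed hyperparameter values are all illustrative assumptions. The variational lower bound and its optimization over the hyperparameters and pseudo-inputs are not shown.

```python
import math


def rbf(a, b, ell=1.0, sf2=1.0):
    """Squared-exponential kernel (an assumed choice) between two input vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sf2 * math.exp(-0.5 * sq / ell ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def spgp_policy_mean(X, y, Xbar, s, ell=1.0, sf2=1.0, noise=1e-2, jitter=1e-8):
    """Hypothetical helper: FITC/SPGP predictive mean at state s, used as an action.

    X, y : hypothetical state/action pairs (N of them)
    Xbar : M pseudo-inputs (the variables the abstract says are optimized)
    """
    M, N = len(Xbar), len(X)
    K_M = [[rbf(Xbar[i], Xbar[j], ell, sf2) + (jitter if i == j else 0.0)
            for j in range(M)] for i in range(M)]
    K_MN = [[rbf(Xbar[m], X[n], ell, sf2) for n in range(N)] for m in range(M)]
    L_M = cholesky(K_M)

    # d_n = Lambda_nn + noise, with Lambda_nn = K_nn - k_n^T K_M^{-1} k_n (K_nn = sf2)
    d = []
    for n in range(N):
        k_n = [K_MN[m][n] for m in range(M)]
        v = chol_solve(L_M, k_n)
        lam = max(sf2 - sum(a * b for a, b in zip(k_n, v)), 0.0)
        d.append(lam + noise)

    # Q = K_M + K_MN D^{-1} K_NM and rhs = K_MN D^{-1} y
    Q = [[K_M[i][j] + sum(K_MN[i][n] * K_MN[j][n] / d[n] for n in range(N))
          for j in range(M)] for i in range(M)]
    rhs = [sum(K_MN[i][n] * y[n] / d[n] for n in range(N)) for i in range(M)]
    alpha = chol_solve(cholesky(Q), rhs)

    # Predictive mean: k_*^T Q^{-1} K_MN D^{-1} y
    return sum(rbf(s, Xbar[m], ell, sf2) * alpha[m] for m in range(M))


X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 0.0]
print(spgp_policy_mean(X, y, [[1.0]], [1.0]))  # one pseudo-input summarizing three points
```

With M pseudo-inputs and N data points, the cost above is dominated by O(NM²) operations rather than the O(N³) of a full Gaussian process. When the pseudo-inputs coincide with the data inputs, the FITC mean reduces to the full GP mean.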
