2020 年 24 巻 6 号 p. 711-718
Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search method. The RWR algorithm estimates the policy parameters via the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we focus on variational Bayesian inference to avoid overfitting and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments involving a mountain-car problem and a ball-batting task. These experiments demonstrate that VBRL yields a higher average return and outperforms RWR.
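To illustrate the EM-style update that the abstract attributes to RWR, the following is a minimal, hypothetical sketch (not the paper's implementation): candidate policy parameters are drawn from a Gaussian search distribution, each sample is weighted by an exponentially transformed return, and the distribution's mean is re-fit as the reward-weighted average (the M-step). The function and variable names (`rwr_step`, `beta`, the toy quadratic reward) are assumptions for illustration only.

```python
import numpy as np

def rwr_step(theta_mean, sigma, reward_fn, n_samples=100, beta=5.0, rng=None):
    """One reward-weighted regression (RWR) style update for a Gaussian
    search distribution over policy parameters (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # E-step analogue: sample candidate parameters from the current Gaussian.
    thetas = rng.normal(theta_mean, sigma, size=(n_samples, theta_mean.size))
    returns = np.array([reward_fn(t) for t in thetas])
    # Exponential transformation yields non-negative weights;
    # beta controls how greedily high-return samples dominate.
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()
    # M-step: the new mean is the reward-weighted average of the samples.
    return w @ thetas

# Toy quadratic objective with optimum at theta = [1, -2] (assumed example).
target = np.array([1.0, -2.0])
reward = lambda t: -np.sum((t - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(50):
    theta = rwr_step(theta, sigma=0.5, reward_fn=reward, rng=rng)
```

Because the weights are fit by maximum likelihood in the M-step, repeated updates can concentrate the distribution on noisy high-return samples, which is the overfitting behavior the proposed VBRL aims to mitigate with a variational Bayesian treatment of the parameters.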