Journal of Advanced Computational Intelligence and Intelligent Informatics
Online ISSN : 1883-8014
Print ISSN : 1343-0130
ISSN-L : 1883-8014
Regular Papers
Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference
Nobuhiko Yamaguchi
JOURNAL OPEN ACCESS

2020 Volume 24 Issue 6 Pages 711-718

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. Peters et al. proposed reward-weighted regression (RWR) as a direct policy search method. The RWR algorithm estimates the policy parameter with the expectation-maximization (EM) algorithm and is therefore prone to overfitting. In this study, we use variational Bayesian inference to avoid overfitting and propose a direct policy search reinforcement learning method based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in experiments on a mountain car task and a ball batting task. These experiments demonstrate that VBRL yields a higher average return than RWR.
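
As an illustration of the RWR baseline mentioned in the abstract, the following is a minimal sketch of one reward-weighted regression update. It assumes a linear-Gaussian policy (action = theta · features + noise) and non-negative returns used directly as sample weights, so the update reduces to weighted least squares. The function name `rwr_update`, its arguments, and the small `jitter` term are hypothetical choices made for this sketch; they are not taken from the paper, and this is not the proposed VBRL algorithm.

```python
import numpy as np

def rwr_update(features, actions, returns, jitter=1e-8):
    """One reward-weighted regression step (illustrative sketch).

    Assumes a linear-Gaussian policy a = theta . phi(s) + noise and
    non-negative returns that serve directly as sample weights.
    """
    Phi = np.asarray(features, dtype=float)  # shape (N, D): state features
    a = np.asarray(actions, dtype=float)     # shape (N,): sampled actions
    w = np.asarray(returns, dtype=float)     # shape (N,): return per sample
    if np.any(w < 0):
        raise ValueError("returns must be non-negative to act as weights")

    # Weighted normal equations: (Phi^T W Phi) theta = Phi^T W a
    PhiW = Phi * w[:, None]
    A = Phi.T @ PhiW + jitter * np.eye(Phi.shape[1])  # jitter: numerical safeguard only
    b = PhiW.T @ a
    theta = np.linalg.solve(A, b)

    # Return-weighted residual variance as the policy noise estimate
    resid = a - Phi @ theta
    sigma2 = float(np.sum(w * resid ** 2) / np.sum(w))
    return theta, sigma2
```

Because this update is a point estimate with no prior over theta, a handful of high-return samples can dominate the fit. That is the kind of overfitting the abstract attributes to RWR.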

© 2020 Fuji Technology Press Ltd.