Proposal of Direct Policy Search Using Dimensionality Reduction of Policy Parameters by Principal Component Analysis

村田 悠稀; 宮下 恵; 矢野 史朗; 近藤 敏之

doi:10.11517/pjsai.JSAI2018.0_3Pin108

Abstract

In sampling-based direct policy search for reinforcement learning, higher-dimensional decision variables degrade the attained optimal value and slow down learning. We clarify that the variance of the sampling distribution affects both the optimal value and the learning speed; in particular, there is a tradeoff between the two. In this paper, we propose two tricks to improve the learning speed without degrading the optimal value. The first trick is to employ a small-variance sampling distribution to improve the optimal value, which causes slower convergence as a side effect. The second trick is to reduce the dimensionality of the decision variable to improve the learning speed.
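
The two tricks can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the algorithm, so the elite-averaging update, the warm-up phase, the function names (`pca_basis`, `reduced_policy_search`), and all default values are assumptions made for illustration. The sketch samples policy parameters from a small-variance Gaussian, applies principal component analysis (named in the title) to the elite samples collected so far, and then continues the search in the resulting low-dimensional subspace.

```python
import numpy as np


def pca_basis(samples, k):
    """Return the sample mean and the top-k principal directions (as rows)."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions of the centered samples.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:k]


def reduced_policy_search(objective, dim, k, sigma=0.05, n_samples=40,
                          warmup_iters=5, reduced_iters=30, n_elite=8, seed=0):
    """Hypothetical two-phase search: full space first, then a PCA subspace."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    best_theta, best_ret = mu.copy(), objective(mu)
    elites = []

    # Phase 1 (assumed warm-up): small-variance sampling in the full space.
    for _ in range(warmup_iters):
        thetas = mu + sigma * rng.standard_normal((n_samples, dim))
        rets = np.array([objective(t) for t in thetas])
        elite = thetas[np.argsort(rets)[-n_elite:]]
        elites.append(elite)
        mu = elite.mean(axis=0)
        i = int(rets.argmax())
        if rets[i] > best_ret:
            best_theta, best_ret = thetas[i].copy(), float(rets[i])

    # Dimensionality reduction: PCA on the elite parameters seen so far.
    center, basis = pca_basis(np.vstack(elites), k)

    # Phase 2: sample only k coordinates and map them back to the full space.
    z_mu = (mu - center) @ basis.T
    for _ in range(reduced_iters):
        zs = z_mu + sigma * rng.standard_normal((n_samples, k))
        thetas = center + zs @ basis
        rets = np.array([objective(t) for t in thetas])
        z_mu = zs[np.argsort(rets)[-n_elite:]].mean(axis=0)
        i = int(rets.argmax())
        if rets[i] > best_ret:
            best_theta, best_ret = thetas[i].copy(), float(rets[i])

    return best_theta, best_ret
```

As a usage example, `reduced_policy_search(lambda th: -float(np.sum((th - 1.0) ** 2)), dim=20, k=3)` runs the sketch on a toy quadratic return, which stands in for a real policy evaluation. Because the subspace is fixed after the warm-up, the search can only reach parameters within that subspace. This is one way the tradeoff between learning speed and attainable optimal value can show up.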
