Behavior Learning Using a Policy Gradient Method: Separating Environmental Dynamics from Behavior Knowledge

石原 聖司; 五十嵐 治一

doi:10.1541/ieejeiss.129.1737

Abstract

Policy gradient methods are a useful approach to reinforcement learning. When such a method is applied to behavior learning, the decision problem at each time step can be treated as a problem of minimizing an objective function. In this paper, we give an objective function that consists of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
