IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Softcomputing, Learning>
Behavior Learning Based on a Policy Gradient Method: Separation of Environmental Dynamics and State-Values in Policies
Seiji Ishihara, Harukazu Igarashi
JOURNAL FREE ACCESS

2009 Volume 129 Issue 9 Pages 1737-1746

Abstract

Policy gradient methods are a useful approach to reinforcement learning. When such a method is applied to behavior learning, the decision problem at each time step can be treated as the minimization of an objective function. In this paper, we present an objective function consisting of two types of parameters, which represent state-values and environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
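
The abstract does not state the functional form of the objective function, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes an objective E(s, a) equal to the negative expected next-state value, and a Boltzmann policy that favors actions with low E. The names `objective`, `policy`, `grad_log_policy_wrt_values`, and `make_dynamics`, as well as the three-state toy world, are hypothetical and are not taken from the paper.

```python
import math

def objective(state, action, values, dynamics):
    """Assumed form: E(s, a) = -sum_s' P(s' | s, a) * V(s'); lower is better."""
    return -sum(p * values[nxt] for nxt, p in dynamics[state][action].items())

def policy(state, values, dynamics, temperature=1.0):
    """Boltzmann policy: pi(a | s) is proportional to exp(-E(s, a) / T)."""
    actions = sorted(dynamics[state])
    energies = [objective(state, a, values, dynamics) for a in actions]
    low = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - low) / temperature) for e in energies]
    total = sum(weights)
    return {a: w / total for a, w in zip(actions, weights)}

def grad_log_policy_wrt_values(state, action, values, dynamics, temperature=1.0):
    """d ln pi(a|s) / d V(s') = (P(s'|s,a) - sum_b pi(b|s) P(s'|s,b)) / T.

    A policy-gradient rule for the state-values would scale this by the
    reward and a learning rate; the dynamics parameters are left untouched.
    """
    pi = policy(state, values, dynamics, temperature)
    grad = {s: 0.0 for s in values}
    for nxt, p in dynamics[state][action].items():
        grad[nxt] += p / temperature
    for b, prob_b in pi.items():
        for nxt, p in dynamics[state][b].items():
            grad[nxt] -= prob_b * p / temperature
    return grad

def make_dynamics(success):
    """Hypothetical toy world: from state 1, a move goes the intended way
    with probability `success` and the opposite way otherwise."""
    return {1: {"left": {0: success, 2: 1.0 - success},
                "right": {2: success, 0: 1.0 - success}}}

# One set of state-values (state 2 is the goal), reused under two dynamics.
values = {0: 0.0, 1: 0.0, 2: 1.0}
reliable = policy(1, values, make_dynamics(0.9))   # controls work as intended
reversed_ = policy(1, values, make_dynamics(0.1))  # controls mostly inverted
grad = grad_log_policy_wrt_values(1, "right", values, make_dynamics(0.9))

print(reliable, reversed_, grad)
```

In this sketch the state-values are never changed between the two calls to `policy`; only the dynamics table differs, and the preferred action flips accordingly. This mirrors, in a simplified way, the reuse of state-values under different environmental dynamics described in the abstract.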

© 2009 by the Institute of Electrical Engineers of Japan