IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
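<Soft Computing and Learning>
Behavior Learning Using a Policy Gradient Method: Separating Environmental Dynamics from Behavioral Knowledge
石原 聖司 (Seiji Ishihara), 五十嵐 治一 (Harukazu Igarashi)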

2009, Volume 129, Issue 9, pp. 1737-1746

Abstract

Policy gradient methods are a useful approach to reinforcement learning. Applied to behavior learning, a policy gradient method lets us treat the decision problem at each time step as the minimization of an objective function. In this paper, we present an objective function that consists of two types of parameters, one representing state-values and the other representing environmental dynamics. To separate the learning of the state-values from that of the environmental dynamics, we also give a separate learning rule for each type of parameter. Furthermore, we show that the same set of state-values can be reused under different environmental dynamics.
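
To make the idea of reusing state-values under different dynamics concrete, the following is a minimal Python sketch. It is not the formulation from the paper: the objective form E(a; s) = -(expected next-state value), the Boltzmann policy, the five-state chain, the `slip` parameter, and all function names are assumptions introduced here for illustration only.

```python
import math

ACTIONS = (-1, +1)  # hypothetical actions: move left / move right on a chain


def make_chain_dynamics(n_states, slip):
    """Assumed dynamics: the intended move happens with prob. 1 - slip,
    the opposite move with prob. slip (positions are clamped to the chain)."""
    def transition(s, a):
        intended = min(max(s + a, 0), n_states - 1)
        opposite = min(max(s - a, 0), n_states - 1)
        dist = {}
        dist[intended] = dist.get(intended, 0.0) + (1.0 - slip)
        dist[opposite] = dist.get(opposite, 0.0) + slip
        return dist
    return transition


def objective(values, transition, s, a):
    """Assumed objective E(a; s): negative expected state-value after action a."""
    return -sum(p * values[s2] for s2, p in transition(s, a).items())


def policy(values, transition, s, actions=ACTIONS, temperature=1.0):
    """Boltzmann policy: pi(a|s) proportional to exp(-E(a; s) / T)."""
    scores = [-objective(values, transition, s, a) / temperature for a in actions]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    return [w / total for w in weights]


def grad_log_pi_wrt_values(values, transition, s, chosen_index,
                           actions=ACTIONS, temperature=1.0):
    """Gradient of log pi(chosen|s) with respect to each state-value,
    with the dynamics held fixed (the two parameter types are kept apart)."""
    probs = policy(values, transition, s, actions, temperature)
    grad = [0.0] * len(values)
    for s2, p in transition(s, actions[chosen_index]).items():
        grad[s2] += p / temperature
    for a, pa in zip(actions, probs):
        for s2, p in transition(s, a).items():
            grad[s2] -= pa * p / temperature
    return grad


# One set of state-values (higher toward the right end of the chain) ...
values = [0.0, 1.0, 2.0, 3.0, 4.0]
# ... evaluated under two different environments.
deterministic = make_chain_dynamics(5, slip=0.0)
slippery = make_chain_dynamics(5, slip=0.8)

p_det = policy(values, deterministic, s=2)
p_slip = policy(values, slippery, s=2)
print("deterministic:", p_det)
print("slippery:     ", p_slip)
```

In this sketch the state-values stay untouched when the environment changes; only the transition model is swapped, and the preferred action in state 2 flips from "right" to "left" because the slippery environment usually reverses the intended move. A policy gradient update of the state-values would scale `grad_log_pi_wrt_values` by the received reward, which is again an assumption of this sketch rather than the learning rule given in the paper.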

© 2009 The Institute of Electrical Engineers of Japan