IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Soft Computing and Learning>
A Policy Gradient Method with Separated Knowledge concerning Policies
―Policy Representation by Environmental Dynamics and Action-Values―
Seiji Ishihara, Harukazu Igarashi
Free access

2016, Volume 136, Issue 3, pp. 282-289

Abstract

The knowledge concerning an agent's policies consists of two types: the environmental dynamics, which define the state transitions around the agent, and the behavior knowledge for solving a given task. In conventional reinforcement learning, however, these two types of information are usually combined into state-value or action-value functions and learned together. If they were separated and learned individually, we might be able to transfer the behavior knowledge to other environments and reuse or modify it. In our previous work, we presented learning rules based on policy gradients with an objective function consisting of two types of parameters, representing the environmental dynamics and the behavior knowledge, so that each type could be learned separately. In that framework, state-values served as the reusable parameters corresponding to the behavior knowledge. This paper instead adopts action-values as the parameters in the objective function of a policy and presents policy-gradient learning rules for each of the separated types of knowledge. Simulation results on a pursuit problem showed that such parameters can also be transferred and reused more effectively than the unseparated knowledge.
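The policy-gradient learning the abstract refers to can be illustrated with a minimal sketch: a Boltzmann (softmax) policy whose parameters are tabular action-values, updated by a REINFORCE-style gradient on a toy chain task. This shows only the behavior-knowledge half; the paper's objective function also carries separate environmental-dynamics parameters, which are omitted here. The toy environment, variable names, and hyperparameters are all assumptions for illustration, not the authors' formulation.

```python
import numpy as np

# Illustrative sketch only: Boltzmann policy parameterized by tabular
# action-values Q, trained by a REINFORCE-style policy gradient on a
# small chain MDP (action 1 moves right toward the goal, 0 moves left).
rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 2
GOAL = N_STATES - 1
BETA, ALPHA = 1.0, 0.2          # inverse temperature, learning rate

Q = np.zeros((N_STATES, N_ACTIONS))  # behavior-knowledge parameters

def policy(s):
    # pi(a|s) proportional to exp(beta * Q[s, a]), computed stably
    z = np.exp(BETA * (Q[s] - Q[s].max()))
    return z / z.sum()

def run_episode(max_steps=20):
    s, traj = 0, []
    for _ in range(max_steps):
        a = rng.choice(N_ACTIONS, p=policy(s))
        s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == GOAL else 0.0
        traj.append((s, a, r))
        s = s_next
        if s == GOAL:
            break
    return traj

for _ in range(500):
    traj = run_episode()
    G = sum(r for _, _, r in traj)      # undiscounted episode return
    for s, a, _ in traj:
        grad = -policy(s)               # d log pi / d Q[s, :] = beta*(onehot - pi)
        grad[a] += 1.0
        Q[s] += ALPHA * BETA * G * grad
```

After training, the policy at the start state should strongly prefer the goal-directed action. In the paper's setting, the analogous action-value parameters are the piece that can be transferred to a new environment while the dynamics parameters are relearned.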

© 2016 The Institute of Electrical Engineers of Japan