IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Soft Computing and Learning>
Profit Sharing Using a Dynamic Reinforcement Function Considering Expected Reinforcement Values
玉嶋 大輔, 小圷 成一, 岡本 卓, 平田 廣則

2009, Vol. 129, No. 7, pp. 1339-1347

Abstract

Profit Sharing is an exploitation-oriented reinforcement learning method that aims to adapt a system to a given environment. In Profit Sharing, an agent learns a policy from the reward it receives from the environment upon reaching a goal state. A key design issue is the reinforcement function, which distributes the received reward to each action rule in the policy. If the reinforcement function satisfies the ineffective rule suppression theorem, it distributes more reward to effective rules than to ineffective ones, even in the worst case where an ineffective rule is selected infinitely often. The value of such a reinforcement function, however, decreases exponentially with distance from the goal state. As a result, the agent fails to learn an appropriate policy when the episode from an initial state to the goal state is relatively long. In this paper, we propose a new dynamic reinforcement function that considers the expected value of the reward distributed to each rule. With this reinforcement function, the expected reward distributed to effective rules remains larger than that distributed to ineffective ones, and even for long episodes the decrease in the function's value is suppressed, so the agent can learn an appropriate policy. We apply the proposed reinforcement function to Sutton's maze problem and demonstrate its effectiveness.
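For illustration, below is a minimal sketch of the standard Profit Sharing update that the abstract contrasts against, not the dynamic reinforcement function proposed in the paper. It uses a geometric reinforcement function f(x) = r / (L+1)^x, where x is the number of steps before the goal and L bounds the number of conflicting rules per state; a ratio of 1/(L+1) satisfies the suppression condition for finite episodes but decays exponentially with x, which is exactly the weakness the paper addresses. The environment interface (reset, step, actions) and all helper names are hypothetical.

```python
import random
from collections import defaultdict

def geometric_reinforcement(reward, x, L):
    """f(x) = reward / (L+1)**x: credit for the rule fired x steps
    before the goal. The 1/(L+1) ratio keeps effective rules ahead of
    ineffective ones (suppression condition), but the credited value
    shrinks exponentially as x grows."""
    return reward / (L + 1) ** x

def select_action(weights, state, actions):
    """Roulette-wheel selection proportional to rule weights."""
    ws = [weights[(state, a)] + 1e-6 for a in actions]  # floor so unseen rules stay selectable
    return random.choices(actions, weights=ws)[0]

def profit_sharing_episode(env, weights, reward, L, max_steps=10_000):
    """Run one episode; on reaching the goal, distribute the reward
    backward along the episode's (state, action) rules."""
    episode = []
    state = env.reset()  # hypothetical environment interface
    for _ in range(max_steps):
        action = select_action(weights, state, env.actions)
        episode.append((state, action))
        state, at_goal = env.step(action)
        if at_goal:
            for x, rule in enumerate(reversed(episode)):
                weights[rule] += geometric_reinforcement(reward, x, L)
            return True
    return False

# usage sketch:
#   weights = defaultdict(float)
#   profit_sharing_episode(maze, weights, reward=100.0, L=4)
```

With a long episode, `geometric_reinforcement` credits rules near the initial state almost nothing, which is why the agent fails to learn there; the paper's dynamic function is designed to suppress this decay while preserving the ordering between effective and ineffective rules.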

© 2009 The Institute of Electrical Engineers of Japan