2009, Vol. 129, No. 7, pp. 1339-1347
Profit Sharing is an exploitation-oriented reinforcement learning method that aims to adapt a system to a given environment. In Profit Sharing, an agent learns a policy from the reward it receives from the environment upon reaching a goal state. A key design issue is the reinforcement function, which distributes the received reward among the action rules in the policy. If the reinforcement function satisfies the ineffective rule suppression theorem, it distributes more reward to effective rules than to ineffective ones, even in the worst case where an ineffective rule is selected infinitely often. The value of such a reinforcement function, however, decreases exponentially with distance from the goal state. As a result, the agent fails to learn an appropriate policy when episodes from an initial state to the goal state are relatively long. In this paper, we propose a new dynamic reinforcement function that takes into account the expected value of the reward distributed to each rule. With our reinforcement function, the expected reward distributed to effective rules remains larger than that distributed to ineffective ones, and the decrease in the function's value is suppressed even as episodes grow long, so the agent can learn an appropriate policy. We apply our reinforcement function to Sutton's maze problem and demonstrate its effectiveness.
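To make the credit-assignment scheme and the exponential-decay problem concrete, the following is a minimal sketch of a Profit Sharing update with a geometrically decreasing reinforcement function. The decay ratio, state names, and reward value are illustrative assumptions, not the paper's proposed dynamic function; a geometric ratio small enough relative to the number of selectable actions is one commonly used way to satisfy the ineffective rule suppression condition.

```python
from collections import defaultdict

def profit_sharing_update(weights, episode, reward, decay):
    """Distribute `reward` over an episode with geometric decay.

    `episode` is the list of (state, action) rules fired from the
    initial state to the goal. The rule fired just before the goal
    receives reward * decay, the one before it reward * decay**2,
    and so on: credit shrinks geometrically with distance from the
    goal, which is exactly the decay the paper identifies as a
    problem for long episodes.
    """
    credit = reward
    for state, action in reversed(episode):
        credit *= decay
        weights[(state, action)] += credit

# Hypothetical 3-step episode with decay = 1/4 (e.g. four actions).
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=100.0, decay=0.25)
# credits: ("s2","up") -> 25.0, ("s1","right") -> 6.25,
#          ("s0","right") -> 1.5625
```

Note how after only three steps the earliest rule already receives under 2% of the goal reward; for long episodes the credit reaching early rules vanishes, which is the motivation for the dynamic reinforcement function the paper proposes.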