日本神経回路学会誌 (Journal of the Japanese Neural Network Society)
Online ISSN : 1883-0455
Print ISSN : 1340-766X
ISSN-L : 1340-766X
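Research paper
A Parallel Reinforcement Learning Method for a Family of Tasks Represented by a Weighted Reward-Sum Model
Kazuyuki Hiraoka (平岡 和幸), Taketoshi Mishima (三島 健稔)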
Journal, free access

2006, Volume 13, Issue 4, pp. 137-145

Abstract

Ordinary reinforcement learning (RL) addresses a single task, but time-varying environments, multi-criteria problems, and inverse RL call for RL over a family of tasks. In this paper, a family of tasks is defined by a weighted sum of partial rewards, and a parallel learning method is proposed for this family. In this setting the expected reward of the optimal policy is not linear in the weights; it is a piecewise-linear convex function of the weight values. By computing convex hulls and Minkowski sums, the method performs Q-learning in parallel for all possible weight values at once, even though there are infinitely many of them.
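
The geometric facts the abstract relies on can be illustrated with a small sketch. This is not the paper's algorithm; it only shows, for two partial rewards, why a finite set of convex-hull vertices can stand in for infinitely many weight values. The function names (`best_value`, `convex_hull`, `minkowski_sum`), the example return vectors, and the restriction to two dimensions are all assumptions made for illustration.

```python
from itertools import product


def dot(w, v):
    """Weighted sum of partial returns: w[0]*v[0] + w[1]*v[1] + ..."""
    return sum(wi * vi for wi, vi in zip(w, v))


def best_value(w, candidates):
    """Best weighted return over a set of candidate return vectors.

    As a function of w this is a maximum of linear functions,
    hence piecewise-linear and convex.
    """
    return max(dot(w, v) for v in candidates)


def convex_hull(points):
    """Convex hull vertices of 2-D points (Andrew's monotone chain).

    Points strictly inside the hull or on its edges are dropped: they can
    never be the unique maximizer of a linear function.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def minkowski_sum(A, B):
    """All pairwise sums a + b with a in A and b in B."""
    return {tuple(x + y for x, y in zip(a, b)) for a, b in product(A, B)}


# Hypothetical expected-return vectors (one per candidate policy),
# with one coordinate per partial reward.
candidates = [(4, 0), (3, 3), (0, 4), (1, 1), (2, 2)]
hull = convex_hull(candidates)
```

For any weight vector `w`, `best_value(w, hull)` equals `best_value(w, candidates)`, so only the hull vertices need to be stored. Likewise, the maximum of a linear function over `minkowski_sum(A, B)` equals the sum of the maxima over `A` and over `B`, which is what lets sets of return vectors be combined additively.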

© 2006 Japanese Neural Network Society (日本神経回路学会)