Journal of the Robotics Society of Japan
Online ISSN : 1884-7145
Print ISSN : 0289-1824
ISSN-L : 0289-1824
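A Proposal for Reinforcement Learning with Two-Dimensional Evaluation of Reward and Aversion That Resolves the Trade-off between Environment Identification and Reward Acquisition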
Hiroyuki Okada, Hiroshi Yamakawa, Takashi Omori

2001, Volume 19, Issue 2, pp. 244-251

Abstract
A trade-off between exploration and exploitation arises in any learning method based on trial and error, such as reinforcement learning. We have proposed a reinforcement learning algorithm (2D-RL) that uses both reward and punishment, the latter as a repulsive evaluation. In this algorithm, an appropriate balance between exploration and exploitation is attained by using interest and utility. In this paper, we apply 2D-RL to a navigation learning task for a mobile robot; in the real world, the robot found a better path with 2D-RL than with a traditional actor-critic model.
© The Robotics Society of Japan