IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Intelligence, Robotics>
Acquisition of a Semi-Shortest Route by Unknown-Adventure Q-learning that Expands Unexplored Areas
河原崎 俊之祐, 瀬古沢 照治
Journal: Free Access

2018, Vol. 138, No. 7, pp. 941-949

Abstract

Q-learning methods evaluate and update action values using information on the rewards obtained. Because a Q value cannot be updated until learning succeeds and a reward is obtained, there is no guiding index in the early stages of learning, and learning therefore requires a long time. In cases where the semi-shortest route from the start to the goal passes through a part of the maze with no room to spread out, where the probability of learning failure is high, the semi-shortest route cannot be learned.
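For context, the update the abstract refers to is the standard tabular Q-learning rule; the following is a minimal sketch in which the maze size, learning rate `alpha`, and discount `gamma` are illustrative choices, not values from the paper.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])
    return Q

# 5x5 maze flattened to 25 states, 4 moves (up/down/left/right)
Q = np.zeros((25, 4))
Q = q_update(Q, state=0, action=1, reward=1.0, next_state=5)
```

Note that `Q[state][action]` stays zero until a reward propagates back to it, which is exactly the lack of a learning index the abstract describes.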

To learn optimal actions and discover the semi-shortest path, it is essential to experience a large number of unknown states in the early stages of the learning process. To this end, we propose unknown-adventure Q-learning, in which the agent maintains an action history and adventurously seeks out unknown states that have not yet been recorded in that history. When unknown states are present, the agent proceeds boldly to search them without fear of failure. Because unknown-adventure Q-learning experiences a large number of states early in the learning process, subsequent actions can be selected in a way that avoids previous failures.
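The selection rule described above can be sketched as follows. This is a hedged reading of the abstract only: the agent keeps a history of visited states and, when an action would lead to a state not in that history, takes it regardless of Q values. The transition function `next_state_of` and the epsilon-greedy fallback are illustrative assumptions, not details from the paper.

```python
import random

def select_action(Q, state, history, next_state_of, actions, epsilon=0.1):
    """Unknown-adventure action selection (sketch).

    history       -- set of states visited so far (the agent's record)
    next_state_of -- assumed transition model: next_state_of(state, action)
    """
    # Prefer any action whose successor state has never been visited:
    # explore unknown states "adventurously", without fear of failure.
    unknown = [a for a in actions if next_state_of(state, a) not in history]
    if unknown:
        return random.choice(unknown)
    # No unknown successors: fall back to ordinary epsilon-greedy on Q.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state][a])
```

With this rule the agent covers many states in early episodes, so the Q table accumulates failure information sooner than with purely greedy or epsilon-greedy selection.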

This greatly accelerates learning: the number of episodes required to learn a path from start to goal is reduced roughly 100-fold compared with the original Q-learning method. Moreover, our method can discover the semi-shortest path through a maze even in cases where that path does not spread out through the maze, a situation in which learning failures are common and the semi-shortest path cannot be discovered by methods that use V-filters or action-region valuations to accelerate learning by emphasizing prior knowledge.

© 2018 The Institute of Electrical Engineers of Japan