IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Intelligence, Robotics>
Improving Q-learning by Using the Agent's Action History
Masanori Saito, Teruji Sekozawa

2016 Volume 136 Issue 8 Pages 1209-1217

Abstract

Q-learning learns an optimal policy by updating a state-action value function (Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it when updating the Q-value, so that many states are updated at once. Because each step then supplies the agent with much more information, the learning time is reduced. Furthermore, by incorporating the stored environmental information into the action selection method, the agent avoids failure behaviors such as stagnation of learning, which improves the learning speed in the initial stage. In addition, we design a new action-area value function in order to explore many more states from the initial stage of learning. Finally, numerical experiments on a maze problem demonstrate the usefulness of the proposed method.
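The abstract describes the method only at a high level. As a rough illustration of the core idea, the following Python sketch implements plain tabular Q-learning on a grid maze and adds a simple action-history buffer whose stored transitions are replayed after every real step, so that a single step updates the Q-value in many states. The maze layout, the constants, and all names (step, q_update, run_episode, SIZE) are illustrative assumptions, not the authors' implementation; the paper's action-area value function and failure-avoiding action selection are not reproduced here.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE = 10  # assumed 10x10 maze

def step(state, action, goal, walls):
    # Hypothetical deterministic maze transition: reward 1 at the goal;
    # bumping into a wall or the boundary leaves the agent in place.
    nxt = (state[0] + action[0], state[1] + action[1])
    if nxt in walls or not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
        nxt = state
    return nxt, (1.0 if nxt == goal else 0.0)

def q_update(Q, s, a, r, s2):
    # Standard one-step Q-learning update.
    best_next = max(Q[(s2, b)] for b in range(len(ACTIONS)))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def run_episode(Q, history, start, goal, walls, max_steps=500):
    s = start
    for _ in range(max_steps):
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda b: Q[(s, b)])
        s2, r = step(s, ACTIONS[a], goal, walls)
        q_update(Q, s, a, r, s2)
        # Memorize the transition, then replay recent history so that
        # each real step updates the Q-value in many states at once.
        history.append((s, a, r, s2))
        for hs, ha, hr, hs2 in history[-50:]:
            q_update(Q, hs, ha, hr, hs2)
        s = s2
        if s == goal:
            break

Q = defaultdict(float)
history = []
walls = {(1, 1), (2, 1), (3, 1)}  # assumed wall cells
for _ in range(100):
    run_episode(Q, history, start=(0, 0), goal=(SIZE - 1, SIZE - 1), walls=walls)

Replaying memorized transitions in this way resembles Dyna-style planning or experience replay; the authors' history-based update may differ in its details.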

© 2016 by the Institute of Electrical Engineers of Japan