Journal of Japan Society for Fuzzy Theory and Intelligent Informatics
Online ISSN : 1881-7203
Print ISSN : 1347-7986
ISSN-L : 1347-7986
Original Papers
Discounted UCB1-tuned for Q-Learning
Akira NOTSU, Katsuhiro HONDA
JOURNAL FREE ACCESS

2014 Volume 26 Issue 6 Pages 913-923

Abstract
In this paper, we integrate Discounted UCB1-tuned, which uses a weighted value and a weighted variance, into Q-learning agents and experimentally evaluate its performance. Discounted UCB1-tuned is an optimized action selection method that balances exploration and exploitation and outperforms other methods, including ε-greedy. We first examine the effect of default values and the learning rate in a multi-armed bandit problem. Among the updatable state-action pairs, our algorithm selects an action whose value has not yet been updated or, if there is none, the action with the highest UCB value. We then present results for a shortest path problem in a continuous state space, followed by a discussion.
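The selection rule described in the abstract can be illustrated for the multi-armed bandit case. The following is a minimal sketch, not the authors' implementation: it assumes rewards in [0, 1], combines the standard UCB1-tuned bonus with exponentially discounted counts, means, and variances, and uses hypothetical names (`DiscountedUCB1Tuned`, `gamma`, `select`, `update`). The exact formulas used in the paper, and the integration with Q-learning, are not given in the abstract and are not reproduced here.

```python
import math


class DiscountedUCB1Tuned:
    """Illustrative bandit agent (assumed formulation, rewards in [0, 1])."""

    def __init__(self, n_arms, gamma=0.99):
        self.gamma = gamma            # discount factor applied at every step
        self.w = [0.0] * n_arms       # discounted pull counts
        self.s1 = [0.0] * n_arms      # discounted sums of rewards
        self.s2 = [0.0] * n_arms      # discounted sums of squared rewards

    def select(self):
        # An arm whose value has never been updated is tried first.
        # (An arm whose discounted count has decayed to zero is treated the same way.)
        for arm, weight in enumerate(self.w):
            if weight == 0.0:
                return arm
        log_total = max(math.log(sum(self.w)), 0.0)
        best_arm, best_score = 0, -math.inf
        for arm, weight in enumerate(self.w):
            mean = self.s1[arm] / weight                        # weighted value
            var = max(self.s2[arm] / weight - mean * mean, 0.0)  # weighted variance
            v = var + math.sqrt(2.0 * log_total / weight)
            # UCB1-tuned bonus; 1/4 bounds the variance of a [0, 1] reward.
            bonus = math.sqrt(log_total / weight * min(0.25, v))
            if mean + bonus > best_score:
                best_arm, best_score = arm, mean + bonus
        return best_arm

    def update(self, arm, reward):
        # Decay all statistics, then add the new observation with weight 1.
        for a in range(len(self.w)):
            self.w[a] *= self.gamma
            self.s1[a] *= self.gamma
            self.s2[a] *= self.gamma
        self.w[arm] += 1.0
        self.s1[arm] += reward
        self.s2[arm] += reward * reward
```

In use, the agent alternates `arm = agent.select()` and `agent.update(arm, reward)`. A smaller `gamma` forgets old rewards faster, which is the usual motivation for discounting when reward estimates change over time, as Q-values do during learning.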
© 2014 Japan Society for Fuzzy Theory and Intelligent Informatics