Applying Discounted UCB1-tuned to Q-learning

野津 亮; 本多 克宏

doi:10.3156/jsoft.26.913

Abstract

In this paper, we integrated Discounted UCB1-tuned, which uses a weighted value and a weighted variance, into Q-learning agents and evaluated its performance experimentally. Discounted UCB1-tuned is an optimized action selection method that balances exploration and exploitation and outperforms other methods, including ε-greedy. We conducted experiments on the effect of default values and the learning rate in a multi-armed bandit problem. Our algorithm selects an action whose value has not yet been updated or, among the updatable state-action pairs, the action with the highest UCB value. We then present results for a shortest-path problem in a continuous state space, followed by a discussion.
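The selection rule described in the abstract can be illustrated with a minimal bandit sketch. The abstract does not give the exact formulas, so everything below is an assumption. The sketch combines the standard UCB1-tuned variance-capped bonus with exponentially discounted counts, means, and second moments, and assumes rewards in [0, 1]. The names `DiscountedUCB1Tuned`, `run_bandit`, and the parameter `gamma` are hypothetical and are not taken from the paper.

```python
import math
import random


class DiscountedUCB1Tuned:
    """Illustrative bandit agent with discounted statistics (assumed form)."""

    def __init__(self, n_arms, gamma=0.99):
        self.gamma = gamma            # discount applied to past observations (assumption)
        self.n = [0.0] * n_arms       # discounted pull counts
        self.s = [0.0] * n_arms       # discounted reward sums
        self.s2 = [0.0] * n_arms      # discounted squared-reward sums

    def select(self):
        # An arm whose value has never been updated is chosen first.
        for arm, count in enumerate(self.n):
            if count == 0.0:
                return arm
        total = sum(self.n)
        # Otherwise choose the arm with the highest upper confidence bound.
        return max(range(len(self.n)), key=lambda arm: self._ucb(arm, total))

    def _ucb(self, arm, total):
        mean = self.s[arm] / self.n[arm]                         # weighted value
        var = max(self.s2[arm] / self.n[arm] - mean * mean, 0.0)  # weighted variance
        log_term = max(math.log(total), 0.0) / self.n[arm]
        # UCB1-tuned style bonus: variance estimate capped at 1/4.
        bonus = math.sqrt(log_term * min(0.25, var + math.sqrt(2.0 * log_term)))
        return mean + bonus

    def update(self, arm, reward):
        # Discount every arm's statistics, then add the new observation.
        for a in range(len(self.n)):
            self.n[a] *= self.gamma
            self.s[a] *= self.gamma
            self.s2[a] *= self.gamma
        self.n[arm] += 1.0
        self.s[arm] += reward
        self.s2[arm] += reward * reward


def run_bandit(probs, steps, seed=0, gamma=0.99):
    """Run a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    agent = DiscountedUCB1Tuned(len(probs), gamma)
    pulls = [0] * len(probs)
    for _ in range(steps):
        arm = agent.select()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling `run_bandit([0.9, 0.1], steps=2000)` returns the pull count per arm. In a Q-learning agent, one such set of statistics could be kept per state, with the Q-learning update target taking the place of the raw reward. That mapping is likewise an assumption rather than the paper's specification.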
