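The Effect of UCB Algorithm in Reinforcement Learning (強化学習におけるUCB行動選択手法の効果)

Koki Saito (斉藤 晃貴); Akira Notsu (野津 亮); Katsuhiro Honda (本多 克宏)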

doi:10.14864/fss.30.0_174

30th Fuzzy System Symposium

Session ID : MD2-2

Conference information

Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)

The Effect of UCB Algorithm in Reinforcement Learning

*Koki Saito, Akira Notsu, Katsuhiro Honda

Keywords: Reinforcement learning, UCB algorithm, Q-learning

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

The UCB algorithm was proposed as an action selection method for the multi-armed bandit problem. In this method, an agent chooses an action by comparing the upper bounds of the confidence intervals of the estimated action values, and it thereby performs better than other methods such as ε-greedy. In this paper, we propose a method for applying the UCB algorithm to Q-learning and experimentally evaluate its performance on a shortest path problem in a continuous state space.
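
The idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulation, so everything below is an assumption: the bonus is the standard UCB1 term c * sqrt(2 ln N / n), visit counts are kept per state-action pair, and the task is a small discrete corridor rather than the continuous state space used in the paper. The names `ucb_action`, `q_learning_ucb`, and the parameter `c` are hypothetical.

```python
import math

LEFT, RIGHT = 0, 1


def ucb_action(q_values, counts, c=1.0):
    """Return the action with the largest upper confidence bound.

    q_values[a]: current value estimate of action a in this state
    counts[a]:   how often action a has been tried in this state
    c:           exploration weight (assumed, not from the paper)
    """
    # An untried action has an unbounded bonus, so try it first.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    scores = [q + c * math.sqrt(2.0 * math.log(total) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def q_learning_ucb(n_states=6, episodes=300, alpha=0.5, gamma=0.9,
                   c=1.0, max_steps=100):
    """Tabular Q-learning on a toy corridor, exploring with ucb_action.

    The agent starts in state 0, the goal is state n_states - 1, and
    every step costs -1, so the best policy is the shortest path.
    """
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    counts = [[0, 0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = ucb_action(q[s], counts[s], c)
            counts[s][a] += 1
            s_next = min(s + 1, goal) if a == RIGHT else max(s - 1, 0)
            done = s_next == goal
            # Standard Q-learning target; the goal state has value 0.
            target = -1.0 + (0.0 if done else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-goal state of the corridor."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a])
            for s in range(len(q) - 1)]
```

Compared with ε-greedy, exploration here needs no random number generator: a rarely tried action is selected because its confidence bound is wide, and the bonus shrinks as its count grows. Calling `greedy_policy(q_learning_ucb())` gives the learned action for each non-goal state of the toy corridor.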

Corresponding author
