Host: The Japanese Society for Artificial Intelligence
Name: 96th SIG-FPAI
Number: 96
Location: [in Japanese]
Date: January 13, 2014 - January 14, 2014
Pages: 06-
In the stochastic K-armed bandit problem, a player tries to maximize the cumulative reward over a limited number of plays. In this paper, we consider a variant of the stochastic K-armed bandit problem in which the processing time of a play depends on the chosen action. For this problem, we propose the N-UCB (Normalized UCB) policy, an extension of the well-known UCB policy, and show some fundamental results of its regret analysis.
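The abstract does not give the N-UCB index, so the Python sketch below only illustrates the setting. It uses the standard UCB1 index (empirical mean plus a sqrt(2 ln t / n) exploration bonus) and a hypothetical normalization that divides that index by the arm's processing time, giving an optimistic estimate of reward per unit time. The Bernoulli rewards, the known deterministic processing times, the total time budget, and the names `normalized_ucb_index` and `run` are all assumptions made for illustration. None of this is the authors' N-UCB definition.

```python
import math
import random


def ucb1_index(mean, pulls, t):
    """Standard UCB1 index: empirical mean plus exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def normalized_ucb_index(mean, pulls, t, proc_time):
    """HYPOTHETICAL normalization (not the paper's N-UCB): the UCB1 index
    divided by the arm's assumed known, deterministic processing time,
    i.e. an optimistic estimate of reward per unit time."""
    return ucb1_index(mean, pulls, t) / proc_time


def run(probs, proc_times, budget, seed=0):
    """Play Bernoulli arms until the chosen arm no longer fits the time budget.

    probs[i]      -- success probability of arm i (assumed Bernoulli rewards)
    proc_times[i] -- time consumed by one play of arm i (assumed known)
    budget        -- total time available
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    sums = [0.0] * k
    elapsed = 0.0
    total = 0.0
    t = 0  # number of plays made so far
    while True:
        if t < k:
            arm = t  # initialization: play each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: normalized_ucb_index(
                    sums[i] / pulls[i], pulls[i], t, proc_times[i]
                ),
            )
        # Simplified stopping rule: stop as soon as the chosen arm
        # would exceed the remaining budget.
        if elapsed + proc_times[arm] > budget:
            break
        elapsed += proc_times[arm]
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward
        t += 1
    return total, pulls
```

For example, `run([0.2, 0.8], [1.0, 1.0], 1000)` plays 1000 unit-time rounds. With equal processing times the normalization has no effect and the policy reduces to plain UCB1.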