JSAI Technical Report, SIG-FPAI
Online ISSN : 2436-4584
96th (Jan, 2014)

An Extension of UCB to the Stochastic Multi-armed Bandits with Action-dependent Processing Time
Ryo WATANABE, Atsuyoshi NAKAMURA, Mineichi KUDO
CONFERENCE PROCEEDINGS FREE ACCESS

Pages 06-

Abstract

In the stochastic K-armed bandit problem, a player tries to maximize the cumulative reward obtained in a limited number of plays. In this paper, we consider a variant of the stochastic K-armed bandit problem in which each action has its own processing time. For this problem, we propose the policy N-UCB (Normalized UCB), an extension of the well-known UCB policy, and show some fundamental results from its regret analysis.
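
The abstract does not define N-UCB, so the following is only a minimal sketch of the standard UCB1 baseline that it extends, not the authors' policy. It assumes Bernoulli rewards and unit processing time for every arm; the function names `ucb1_index` and `run_ucb1` are hypothetical and chosen for illustration.

```python
import math
import random


def ucb1_index(mean_reward, pulls, total_plays):
    """Standard UCB1 index: empirical mean plus the bonus sqrt(2 ln t / n)."""
    return mean_reward + math.sqrt(2.0 * math.log(total_plays) / pulls)


def run_ucb1(arm_means, horizon, seed=0):
    """Play UCB1 on Bernoulli arms (an assumption) for `horizon` plays.

    Returns the per-arm pull counts and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play each arm once.
            arm = t - 1
        else:
            # t - 1 plays have been made so far; pick the largest index.
            arm = max(
                range(k),
                key=lambda i: ucb1_index(sums[i] / pulls[i], pulls[i], t - 1),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return pulls, total_reward
```

Here every play costs one unit of the budget. In the variant considered in the paper, each arm would instead consume its own processing time; how N-UCB normalizes for that is not stated in the abstract and is therefore not modeled in this sketch.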

© 2015 The Japanese Society for Artificial Intelligence