Extending the UCB Strategy to Stochastic Multi-armed Bandit Problems That Account for Processing Time Length

渡辺 僚; 中村 篤祥; 工藤 峰一

doi:10.11517/jsaifpai.96.0_06

96th (Jan, 2014)

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : 96th SIG-FPAI

Number : 96

Location : [in Japanese]

Date : January 13, 2014 - January 14, 2014

An Extension of UCB to the Stochastic Multi-armed Bandits with Action-dependent Processing Time

Ryo WATANABE, Atsuyoshi NAKAMURA, Mineichi KUDO

CONFERENCE PROCEEDINGS FREE ACCESS

Pages 06-

Abstract

In the stochastic K-armed bandit problem, a player tries to maximize cumulative reward within a limited number of plays. In this paper, we consider a variant of the stochastic K-armed bandit problem in which the processing time depends on the action taken. For this problem, we propose the policy N-UCB (Normalized UCB), an extension of the well-known UCB policy, and show some fundamental results from its regret analysis.
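
For orientation, the following is a minimal sketch of the standard UCB1 index policy that the abstract refers to as the well-known UCB policy. It is not N-UCB: the abstract does not give the normalization or the processing-time model, so neither is reproduced here. The function name `ucb1`, the Bernoulli arms, and their success probabilities are assumptions made purely for illustration.

```python
import math
import random


def ucb1(reward_fns, num_plays):
    """Run standard UCB1 for num_plays rounds; return per-arm pull counts."""
    k = len(reward_fns)
    counts = [0] * k      # number of times each arm has been played
    means = [0.0] * k     # empirical mean reward of each arm

    for t in range(1, num_plays + 1):
        if t <= k:
            arm = t - 1   # play every arm once to initialize the estimates
        else:
            # choose the arm with the largest upper confidence bound
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = reward_fns[arm]()
        counts[arm] += 1
        # incremental update of the empirical mean
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


rng = random.Random(0)
# Hypothetical Bernoulli arms (success probabilities are illustrative only).
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts = ucb1(arms, 2000)
```

Here every play costs one unit of a fixed budget of plays. In the variant considered in the paper, plays instead take an action-dependent processing time, which this sketch does not model.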
