Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4Xin2-68
Conference information

Posterior Tracking Algorithm for Multi Objective Thresholding Bandits
*Riku SUZUKIAtsuyoshi NAKAMURA
Author information
CONFERENCE PROCEEDINGS FREE ACCESS

Details
Abstract

Multi-objective thresholding bandits aim to identify all the good arms by repeatedly selecting one arm from given set of K arms at each time to observe multi-dimensional rewards. Here, an arm is said to be good if its expected reward of each dimension is no less than its specified threshold of the dimension. In fixed confidence setting, we show the optimal allocation of each arm drawn which achieves asymptotic lower bound in this problem, and present the expression of generalized likelihood ratio statistics used for the stopping condition. We apply them and the algorithm, named P-Tracking, based on posterior sampling to this problem. We verify the effectiveness of P-Tracking by using artificial data. Through experimental comparison against C-Tracking and D-Tracking, which conduct fixing the expected reward estimation by forced exploration in stead of posterior sampling to explore for correct answer search, and naive multi-dimensional extension of HDoC, which is effective in one-dimensional reward thresholding bandits, we show that P-Tracking identifies all good arms from averagely fewer samples.

Content from these authors
© 2024 The Japanese Society for Artificial Intelligence
Previous article Next article
feedback
Top