Host: The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Multi-objective thresholding bandits aim to identify all the good arms by repeatedly selecting one arm from a given set of K arms at each time step and observing a multi-dimensional reward. Here, an arm is said to be good if its expected reward in each dimension is no less than the threshold specified for that dimension. In the fixed-confidence setting, we derive the optimal allocation of arm draws that achieves the asymptotic lower bound for this problem, and present the expression of the generalized likelihood ratio statistic used in the stopping condition. We apply these results, together with a posterior-sampling-based algorithm named P-Tracking, to this problem. We verify the effectiveness of P-Tracking on artificial data. We compare it experimentally against C-Tracking and D-Tracking, which rely on forced exploration instead of posterior sampling to correct the expected-reward estimates while searching for the correct answer, and against a naive multi-dimensional extension of HDoC, an algorithm that is effective for thresholding bandits with one-dimensional rewards. The comparison shows that P-Tracking identifies all good arms with fewer samples on average.
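The good-arm definition in the abstract can be illustrated with a minimal sketch. It is not part of P-Tracking itself: it only shows which arms count as good when the true expected rewards are known. The function name `good_arms`, the example numbers, and the assumption that one threshold per dimension is shared by all arms are illustrative choices, not details taken from the paper.

```python
def good_arms(means, thresholds):
    """Return the indices of all good arms.

    means:      list of K lists, each holding the expected reward of one arm
                in every dimension (hypothetical values, assumed known here).
    thresholds: list with one threshold per dimension (assumed to be shared
                by all arms).
    An arm is good if its expected reward is no less than the threshold
    in every dimension.
    """
    return [
        i
        for i, mu in enumerate(means)
        if all(m >= t for m, t in zip(mu, thresholds))
    ]


# Hypothetical instance: K = 3 arms, 2 reward dimensions.
example_means = [[0.6, 0.7], [0.4, 0.9], [0.5, 0.5]]
example_thresholds = [0.5, 0.5]
good = good_arms(example_means, example_thresholds)
print(good)  # [0, 2]
```

Arm 1 is excluded because its first dimension falls below the threshold, even though its second dimension is high. In the bandit setting the expected rewards are unknown and must be estimated from samples, which is where the sampling and stopping rules described in the abstract come in.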