Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
38th (2024)
Session ID : 4D3-GS-2-04
Conference information

A Contextual Bandit Algorithm Method Using Decision Trees and UCB
*Sho OIWATaichi ABEKeigo KIMURASatoshi SUZUKIMasayuki GOTO
Author information
CONFERENCE PROCEEDINGS FREE ACCESS

Details
Abstract

Contextual Bandit Algorithm, an online recommendation system, makes sequential recommendations based on contextual information such as user attributes and past purchase history, and estimates rewards that indicate whether or not a product will be purchased. In doing so, the method balances exploration, which recommends various products, and utilization, which recommends products with high expected rewards, to maximize the accumulated reward. To accurately estimate rewards from context, it is essential to assume a model suitable for the situation, but often a nonlinear relationship between context and reward is observed. TreeBootstrap, which uses decision trees to estimate rewards, has been proposed as a method suitable for such situations. However, since TreeBootstrap balances search and utilization by bootstrapping the training data, it may not be able to use the information obtained from search sufficiently, especially in utilization, and the cumulative reward may not be sufficient. In this study, we propose TreeUCB, which balances exploration and exploitation by using a confidence upper bound of expected reward, instead of bootstrap sampling of training data. We demonstrate the effectiveness of the proposed method through experiments using artificial and real data.

Content from these authors
© 2024 The Japanese Society for Artificial Intelligence
Previous article Next article
feedback
Top