Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 3R5-GS-2-02

Target-oriented Exploration in Non-stationary Contextual Bandits
*Shogo ITO, Sakura MIZUNO, Akane TSUBOYA, Tatsuji TAKAHASHI, Yu KONO

Abstract

Selection algorithms for ad delivery and recommender systems have become an indispensable part of Web services. Because people's tastes and preferences change over time, these algorithms must be able to follow such changes in non-stationary environments. We focused on a tendency in human decision making: giving greater importance to achieving a goal than to optimizing. Agents with this target-oriented tendency are expected to make flexible decisions that follow environmental changes well, because they explore according to their degree of goal achievement without being overly sensitive to changes in the environment. Risk-sensitive Satisficing (RS) is a meta-policy that incorporates target-oriented decision making. Hanayasu et al. showed that RS has excellent followability in non-stationary environments. However, it has not been verified whether RS retains similar followability in non-stationary contextual bandit problems. We used Regional Linear Risk-sensitive Satisficing (RegLinRS), an extension of RS to function approximation, to verify its followability in such environments, and showed the usefulness of RegLinRS.
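The abstract does not state the RS selection rule. The sketch below illustrates how "exploring according to the degree of achievement" can work in a plain K-armed bandit, assuming the form commonly given for RS, RS_i = (n_i / N)(E_i - ℵ), where n_i is how often arm i was chosen, N is the total number of choices, E_i is the estimated mean reward of arm i, and ℵ is the aspiration (target) level. The function names `rs_select` and `run_rs`, the Bernoulli arms, and the parameter values are hypothetical. This is not RegLinRS: it uses no context, no function approximation, and no special handling of non-stationarity.

```python
import random
from typing import List


def rs_select(counts: List[int], means: List[float], aleph: float) -> int:
    """Return the arm maximizing the (assumed) RS value (n_i / N) * (E_i - aleph)."""
    # Hypothetical initialization: pull each arm once so every estimate exists.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    values = [(n / total) * (m - aleph) for n, m in zip(counts, means)]
    return max(range(len(values)), key=values.__getitem__)


def run_rs(probs: List[float], aleph: float, steps: int, seed: int = 0) -> List[int]:
    """Simulate RS on hypothetical Bernoulli arms and return per-arm pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        arm = rs_select(counts, means, aleph)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-mean update for the chosen arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts


if __name__ == "__main__":
    print(run_rs([0.2, 0.8], aleph=0.5, steps=2000))
```

While every estimate is below ℵ, all RS values are negative and the less-tried arm tends to win, so the agent keeps exploring; once some arm's estimate exceeds ℵ, its value grows with use and the agent exploits it. For example, `rs_select([8, 2], [0.4, 0.3], 0.5)` picks arm 1 even though its estimate is lower, because neither arm meets the target yet.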

© 2023 The Japanese Society for Artificial Intelligence