Target-Oriented Exploration in Non-Stationary Contextual Bandit Problems

伊東 将吾; 水野 桜; 坪谷 朱音; 高橋 達二; 甲野 佑

doi:10.11517/pjsai.JSAI2023.0_3R5GS202

Abstract

Selection algorithms for ad delivery and recommender systems have become an indispensable part of Web services. Because people's tastes and preferences are fluid, these algorithms must be able to follow them in non-stationary environments. We focus on a tendency in human decision making: giving greater importance to achieving a goal than to optimizing. Agents with this target-oriented tendency are expected to make flexible decisions that follow change well, because they explore according to their degree of goal achievement without being overly sensitive to changes in the environment. Risk-sensitive Satisficing (RS) is a meta-policy that incorporates target-oriented decision making. Hanayasu et al. showed that RS has excellent followability in non-stationary environments. However, it has not been verified whether RS retains this followability in non-stationary contextual bandit problems. We used Regional Linear Risk-sensitive Satisficing (RegLinRS), an extension of RS to function approximation, to evaluate followability in such environments, and showed the usefulness of RegLinRS.
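The abstract does not state the RS value function, so the following is only a minimal sketch of target-oriented arm selection. It assumes the form commonly given in the RS literature, where each arm's value is its trial ratio multiplied by the gap between its mean reward and an aspiration level. The names `rs_values`, `select_arm`, and `aspiration` are hypothetical, and the sketch covers neither the contextual extension (RegLinRS) nor any mechanism for non-stationarity.

```python
def rs_values(mean_rewards, counts, aspiration):
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No trials yet: every arm is equally (un)reliable.
        return [0.0 for _ in counts]
    return [(n / total) * (mean - aspiration)
            for mean, n in zip(mean_rewards, counts)]


def select_arm(mean_rewards, counts, aspiration):
    """Pick the arm with the largest assumed RS value (first index on ties)."""
    values = rs_values(mean_rewards, counts, aspiration)
    return max(range(len(values)), key=values.__getitem__)


# Goal not achieved (all means below 0.5): the rarely tried arm is preferred,
# so the agent explores.
unsatisfied_choice = select_arm([0.4, 0.3], [90, 10], aspiration=0.5)

# Goal achieved (means above 0.5): the frequently tried arm is preferred,
# so the agent keeps exploiting it.
satisfied_choice = select_arm([0.6, 0.7], [90, 10], aspiration=0.5)
```

Under this assumed form, whether the agent explores or exploits depends on whether the aspiration level has been met, which is one way to read "explore according to the degree of achievement."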
