Host : The Japanese Society for Artificial Intelligence
Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 38
Location : [in Japanese]
Date : May 28, 2024 - May 31, 2024
Selection algorithms for advertisement delivery and recommendation are an indispensable part of Web services. Contextual bandit algorithms are particularly useful for reflecting human preferences in recommendation tasks, with advantages such as real-time responsiveness and robustness to cold starts. Combining them with reinforcement learning, as in the RLHF tuning of ChatGPT, can allow further adaptation to human preferences. In industrial applications, however, the emphasis is on quickly reaching specific target levels rather than on extensive exploratory adaptation to the environment. We therefore focused on target-oriented behavior, a tendency observed in human decision-making. Regional Linear Risk-sensitive Satisficing (RegLinRS) is a meta-policy that incorporates this tendency. Tsuboya et al. have shown that it performs well in environments with linear rewards, and it can also be expected to perform well in environments with non-linear rewards. We developed Neural Regional Risk-sensitive Satisficing (NeuralRegRS), an extension of RegLinRS to complex function approximation, and tested its performance in environments built from both artificial and real-world datasets.
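To make the idea of target-oriented selection in a contextual bandit concrete, the following is a minimal sketch and not the RegLinRS or NeuralRegRS algorithm itself, whose details are not given in this abstract. It assumes a per-arm linear reward estimate updated by a simple online gradient step, and a hypothetical satisficing rule: exploit the best arm once its estimated reward reaches an aspiration level, and otherwise try the least-tried arm. The class name, the `aspiration` and `lr` parameters, and the exploration rule are all illustrative assumptions.

```python
class SatisficingLinearBandit:
    """Illustrative sketch only: per-arm linear estimates plus a satisficing rule.

    This is NOT RegLinRS; the aspiration-based rule below is an assumption
    chosen to show what "target-oriented" selection can look like.
    """

    def __init__(self, n_arms, dim, aspiration, lr=0.1):
        self.aspiration = aspiration  # target reward level (hypothetical parameter)
        self.lr = lr                  # step size for the online update
        self.weights = [[0.0] * dim for _ in range(n_arms)]
        self.counts = [0] * n_arms    # number of times each arm was updated

    def predict(self, arm, context):
        # Linear reward estimate: dot product of arm weights and context.
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def select(self, context):
        estimates = [self.predict(a, context) for a in range(len(self.weights))]
        best = max(range(len(estimates)), key=estimates.__getitem__)
        if estimates[best] >= self.aspiration:
            return best  # target reached: keep exploiting the best arm
        # Target not reached: explore the least-tried arm (ties go to the lowest index).
        return min(range(len(self.counts)), key=self.counts.__getitem__)

    def update(self, arm, context, reward):
        # One least-mean-squares step toward the observed reward.
        error = reward - self.predict(arm, context)
        for i, x in enumerate(context):
            self.weights[arm][i] += self.lr * error * x
        self.counts[arm] += 1
```

In this sketch, exploration stops as soon as some arm is estimated to meet the aspiration level, which mirrors the emphasis on reaching a target quickly rather than on exhaustively adapting to the environment. A non-linear variant would replace `predict` and `update` with a learned function approximator.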