Host : The Japanese Society for Artificial Intelligence
Name : The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 37
Location : [in Japanese]
Date : June 06, 2023 - June 09, 2023
Reinforcement learning is vulnerable to real-world noise and has difficulty adapting to the gap between simulation and reality. This problem is well known in motion control tasks and is also clearly observed in the contextual bandit problems used in recommendation systems. Contextual bandit problems require a linear approximation of the target feature, but some algorithms that perform well on artificial data may not be effective on noisy real-world data. Humans adapt dynamically to complex real-world environments with limited data sampling by prioritizing trial and error aimed at reaching a certain aspiration level rather than optimization. Risk-sensitive Satisficing (RS) is a target-oriented algorithm that incorporates such human cognitive tendencies. In the contextual bandit problem, RS has been suggested to perform well not only on artificial data but also on real-world data. However, fitting real-world data required specifying, as a parameter, a certain adoption weighting rate for a prior distribution. In this study, we tested the possibility of adapting quickly and flexibly to a wider range of data by introducing a meta-algorithm that dynamically determines the adoption weighting rate.