Target-Oriented Exploration That Dynamically Adapts to a Wide Range of Data

水野 桜; 伊東 将吾; 坪谷 朱音; 高橋 達二; 甲野 佑

doi:10.11517/pjsai.JSAI2023.0_3R5GS204

Abstract

Reinforcement learning is vulnerable to real-world noise and has difficulty adapting to the gap between simulation and reality. This problem is well known in motion control tasks and is also clearly observed in the contextual bandit problems used in recommendation systems. Contextual bandit problems require a linear approximation of the target feature, but some algorithms that perform well on artificial data may not be effective on noisy real-world data. Humans adapt dynamically to complex real-world environments from limited data samples by prioritizing trial and error aimed at reaching a certain aspiration level rather than at optimization. Risk-sensitive Satisficing (RS) is a target-oriented algorithm that incorporates this human cognitive tendency. In the contextual bandit problem, RS has been suggested to perform well not only on artificial data but also on real-world data. However, fitting real-world data required setting the adoption weighting rate for a prior distribution as a parameter. In this study, we tested whether RS can adapt quickly and flexibly to a wider range of data by introducing a meta-algorithm that determines the adoption weighting rate dynamically.
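
To make the idea of aspiration-driven exploration concrete, the following is a minimal sketch of a satisficing arm-selection rule for a plain (non-contextual) bandit. It assumes the value form RS_i = (n_i / N)(E_i - aspiration) that appears in earlier work on RS; that formula is not stated in this abstract. The function names `rs_values` and `select_arm` are hypothetical, and the contextual formulation, the adoption weighting rate for the prior, and the meta-algorithm studied here are not reproduced.

```python
from typing import List


def rs_values(counts: List[int], means: List[float], aspiration: float) -> List[float]:
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration).

    counts[i] is how often arm i was tried, means[i] its average reward.
    This form is an assumption taken from prior non-contextual RS work.
    """
    total = sum(counts)
    if total == 0:
        # No trials yet: every arm is equally (un)attractive.
        return [0.0 for _ in counts]
    return [(n / total) * (e - aspiration) for n, e in zip(counts, means)]


def select_arm(counts: List[int], means: List[float], aspiration: float) -> int:
    """Pick the arm with the largest assumed RS value (first index on ties)."""
    values = rs_values(counts, means, aspiration)
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    # No arm reaches the aspiration level 0.5: the less-tried arm is favored.
    print(select_arm([8, 2], [0.4, 0.3], aspiration=0.5))  # → 1
    # Arm 0 exceeds the aspiration level: it keeps being selected.
    print(select_arm([8, 2], [0.7, 0.3], aspiration=0.5))  # → 0
```

Under this assumed form, arms below the aspiration level get negative values that shrink toward zero the less they have been tried, which drives exploration. Once an arm exceeds the aspiration level, its positive value grows with its trial share, which drives exploitation.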
