Host: The Japanese Society for Artificial Intelligence
Name: The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Number: 32
Location: [in Japanese]
Date: June 05, 2018 - June 08, 2018
As the domains of reinforcement learning become more complicated and realistic, standard optimization algorithms may not work well. In this paper we introduce a simple mathematical model called RS (reference satisficing) that implements a satisficing strategy, which looks for actions whose values exceed an aspiration level, and we apply it to K-armed bandit problems. We show theoretically that if there are actions with values above the aspiration level, RS is guaranteed to find these actions. We also prove that if the aspiration level is set to an "optimal level", so that satisficing effectively amounts to optimizing, the regret (the expected loss) is bounded above by a finite value. We confirm these results by simulation and demonstrate the effectiveness of RS through comparison with other algorithms.
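
As a concrete illustration of the satisficing strategy described above, the following Python sketch runs an RS-style policy on a Bernoulli K-armed bandit. It is an assumption-laden sketch, not the paper's exact formulation: we assume a score in which each arm's surplus of empirical mean over the aspiration level is weighted by the fraction of total pulls spent on that arm, and the names rs_bandit and aspiration are hypothetical. Arms whose empirical means fall below the aspiration level receive negative scores, while untried arms score zero, so the agent keeps exploring until some arm satisfies the aspiration level.

    import random

    def rs_bandit(arms, aspiration, steps, seed=0):
        """Satisficing policy on a K-armed Bernoulli bandit (illustrative sketch).

        arms       -- true success probability of each arm
        aspiration -- aspiration level the agent tries to exceed
        steps      -- total number of pulls
        """
        rng = random.Random(seed)
        k = len(arms)
        counts = [0] * k     # n_i: number of pulls of arm i
        means = [0.0] * k    # empirical mean reward of arm i
        total = 0
        for _ in range(steps):
            total += 1
            # Assumed RS-style score: reliability-weighted surplus of the
            # empirical mean over the aspiration level. Untried arms score 0,
            # so they are preferred to arms already known to fall short.
            scores = [(counts[i] / total) * (means[i] - aspiration)
                      for i in range(k)]
            choice = max(range(k), key=lambda i: scores[i])
            reward = 1.0 if rng.random() < arms[choice] else 0.0
            counts[choice] += 1
            means[choice] += (reward - means[choice]) / counts[choice]
        return counts, means

    # Aspiration 0.65 lies between the best (0.7) and second-best (0.6) arms,
    # so only the optimal arm is satisfactory: satisficing coincides with
    # optimizing, the regime in which the finite regret bound is stated.
    counts, means = rs_bandit([0.3, 0.6, 0.7], aspiration=0.65, steps=10000)
    print("pull counts:", counts)
    print("empirical means:", [round(m, 3) for m in means])

Lowering the aspiration level below the second-best arm would instead let the policy settle on any satisfactory arm, which corresponds to the plain satisficing guarantee stated in the abstract.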