Linear Function Approximation of the Cognitive Satisficing Function: Application to the Contextual Bandit Problem

甲野 佑

doi:10.11517/pjsai.JSAI2019.0_3K4J205

Abstract

Both recommendation and the foraging behavior of animals aim to maximize rewards through trial and error. However, maximizing reward is difficult in the real world, which is extremely complex. Decision-making agents are therefore thought to give priority to whether or not a specific goal is achieved. In addition, they aim to reach the desired level with as little information as possible. This decision-making tendency of intelligent organisms is called "satisficing". This paper focuses on the RS algorithm, which makes choices by satisficing, and devises LinRS, an adaptation of RS to linear function approximation, so that the method applies to a wider range of problems. As a result, RS can now handle the contextual bandit problem, which has applications such as advertisement delivery. LinRS is also compared in simulation with well-known existing selection algorithms. The linear function approximation realized in LinRS is a first step toward applying RS, a fast and efficient search algorithm that emphasizes goal achievement, to deep reinforcement learning.
