Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2Q4-OS-27b-04
Expansion to Gaussian Distributional Rewards in Natural Reinforcement Learning
*Shoma OGAWA, Shuichi ARIMURA, Tatsuji TAKAHASHI, Yu KOHNO
Abstract

Reinforcement learning, a machine learning approach in which an agent learns behavior through interaction with its environment to maximize reward, has recently been actively studied and has made great progress. In particular, bandit algorithms are widely used, for example, in recommender systems including ad serving. However, reward maximization in such fields can be difficult due to the complexity and non-stationarity of human behavior. In such cases, securing a certain level of reward, rather than simply aiming at maximization, can be more important. Algorithms that take this approach also accord with known properties of human preferences, and they show excellent performance when that level is chosen properly. Risk-sensitive Satisficing (RS) is a natural reinforcement learning algorithm that incorporates such cognitive tendencies into exploration and aims to achieve a desired level of performance set as an objective. Although RS performs well with Bernoulli-distributed rewards, such as those used to indicate whether a user clicked on an advertisement or a product, in practical applications the bandit problem often involves continuous-valued rewards such as viewing time. In this study, we examine the performance of RS when applied to the bandit problem with real-valued rewards drawn from a normal distribution, and we provide some considerations.
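To make the setting concrete, the following is a minimal sketch of a satisficing policy of the RS type on a Gaussian-reward bandit, under stated assumptions: the decision value for each arm is taken as the pull count times the deviation of the empirical mean from an aspiration level (a common formulation of RS in the literature), the arm means, reward variance, and aspiration level are illustrative, and the function name `rs_bandit` is hypothetical. It is not the authors' implementation.

```python
import random

def rs_bandit(means, aleph, steps, seed=0):
    """Sketch of a Risk-sensitive Satisficing (RS) style policy on a
    Gaussian bandit. RS value assumed here: RS_i = n_i * (E_i - aleph),
    where n_i is the pull count of arm i, E_i its empirical mean reward,
    and aleph the aspiration level. The agent pulls argmax_i RS_i."""
    rng = random.Random(seed)
    k = len(means)
    counts = [1] * k        # one virtual pull per arm to avoid division by zero
    est = [aleph] * k       # estimates initialized at the aspiration level
    total = 0.0
    for _ in range(steps):
        # Reliability-weighted deviation from the aspiration level
        rs = [counts[i] * (est[i] - aleph) for i in range(k)]
        a = max(range(k), key=lambda i: rs[i])
        r = rng.gauss(means[a], 1.0)          # continuous (normal) reward
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]    # incremental mean update
        total += r
    return total / steps

# Illustrative run: one arm exceeds the aspiration level aleph = 0.8,
# so the policy should settle on it and secure roughly that reward level.
avg = rs_bandit(means=[0.0, 0.5, 1.0], aleph=0.8, steps=5000)
```

With an aspiration level between the arm means, an arm whose mean exceeds aleph accumulates a positive RS value that grows with its pull count, so exploration stops once a satisfactory arm is found; this is the "securing a certain level of reward" behavior described above, rather than pure maximization.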

© 2023 The Japanese Society for Artificial Intelligence