Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
35th (2021)
Session ID : 1G2-GS-2a-03

Risk-sensitive Satisficing policy with approximate estimation of reliability
*Akane MINAMI, Yuki YOSHII, Yu KONO, Tatsuji TAKAHASHI
Abstract

The development of deep reinforcement learning has enabled learning in continuous state-action spaces, with remarkable results such as computers surpassing humans in digital and analog games. However, the problem that it requires a huge number of trials and errors remains unsolved. To reduce the number of exploratory action selections, we focus on an adaptive method called satisficing, which stands in stark contrast to optimization: satisficing quickly searches for an action that satisfies a certain target level. The Risk-sensitive Satisficing (RS) model extends satisficing with "risk attitudes" based on the selection ratio of each action, which represents the uncertainty of the action's value. RS has been shown to learn the optimal action with a small amount of exploration and finitely bounded regret in multi-armed bandit problems when given an optimal target level. Linear RS (LinRS) is a linear function approximation of RS, but how to approximate the selection ratio of each action has not been sufficiently discussed. In this study, we propose StableLinRS, a new way to approximate the selection ratio in LinRS. We also show the usefulness of StableLinRS on contextual bandit problems in comparison with existing methods.
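
As a minimal sketch of how RS selects actions, consider the following Python example on a Bernoulli multi-armed bandit. The concrete form used here, RS(a) = rho_a * (Q(a) - aleph) with rho_a the empirical selection ratio and aleph the target level, is our reading of the model described above rather than a verbatim reproduction of the paper's algorithm; the arm means and the choice of aleph between the best and second-best arm (standing in for the "optimal target level" mentioned above) are illustrative assumptions.

import numpy as np

# Sketch of the Risk-sensitive Satisficing (RS) rule on a Bernoulli bandit.
# RS(a) = rho_a * (Q(a) - aleph), with rho_a the empirical selection ratio
# of action a, is an assumption based on the abstract, not the paper's
# exact formulation.

rng = np.random.default_rng(0)

true_means = np.array([0.2, 0.4, 0.6, 0.8])  # hypothetical arm means
aleph = 0.7           # target level between the best and second-best arm
K = len(true_means)

counts = np.ones(K)   # selection counts (start at 1 to avoid division by zero)
values = np.zeros(K)  # incremental value estimates Q(a)

for t in range(5000):
    ratio = counts / counts.sum()      # selection ratio rho_a of each arm
    rs = ratio * (values - aleph)      # RS value: reliability times satisficing gap
    a = int(np.argmax(rs))             # greedy in RS; no separate exploration term
    reward = float(rng.random() < true_means[a])   # Bernoulli reward draw
    counts[a] += 1
    values[a] += (reward - values[a]) / counts[a]  # incremental mean update

print("arm pull counts:", counts.astype(int))

While no arm satisfies aleph, every RS value is negative, so the rule favors arms with a low selection ratio, which drives exploration; once the estimate of the 0.8 arm exceeds aleph, its RS value turns positive and grows with its selection ratio, so selection concentrates on it without any explicit exploration bonus.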

© 2021 The Japanese Society for Artificial Intelligence