Natural Reinforcement Learning Based on Stochastic Policies

鈴木 匠海; 越川 駿平; 高橋 達二; 甲野 佑

doi:10.11517/pjsai.JSAI2023.0_1B4GS204

Abstract

In recent years, deep reinforcement learning, an artificial intelligence technique that combines reinforcement learning with deep learning, has attracted considerable attention. Deep reinforcement learning has already outperformed humans in games such as Go and Atari video games. However, its application to real-world tasks beyond artificially constrained environments has progressed slowly, which may indicate that other approaches are needed. In this study, we focused on natural reinforcement learning, which sets an aspiration level and evaluates rewards qualitatively against it. Risk-sensitive Satisficing (RS), an algorithm for natural reinforcement learning, has already demonstrated efficient, goal-directed exploration in table-based reinforcement learning. However, the current RS uses a deterministic policy, which makes it difficult to apply to deep reinforcement learning, where policies are typically expressed as probability distributions. In this study, we extended the deterministic policy to a stochastic policy and verified whether the extended algorithm performs as well as the original RS on existing table-based reinforcement learning tasks.
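The contrast between RS's deterministic policy and a stochastic one can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the commonly cited form of the RS value, RS_i = (n_i / N)(E_i - ℵ), where n_i is the number of times action i has been chosen, N is the total number of choices, E_i is the estimated mean reward of action i, and ℵ is the aspiration level. All function names are hypothetical. The softmax variant is swapped in only to show what turning RS's deterministic argmax into a probability distribution over actions could look like; the stochastic policy proposed in this paper may be constructed differently.

```python
import math
import random
from typing import List, Sequence


def rs_values(counts: Sequence[int], means: Sequence[float], aleph: float) -> List[float]:
    """RS value per action, ASSUMING the form RS_i = (n_i / N) * (E_i - aleph)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("try each action at least once before using RS")
    return [(n / total) * (e - aleph) for n, e in zip(counts, means)]


def deterministic_rs_action(counts, means, aleph) -> int:
    """Deterministic policy: always pick the action with the largest RS value."""
    values = rs_values(counts, means, aleph)
    return max(range(len(values)), key=lambda i: values[i])


def softmax_rs_policy(counts, means, aleph, temperature: float = 0.1) -> List[float]:
    """HYPOTHETICAL stochastic variant: a softmax distribution over RS values.

    Illustration only; not necessarily the stochastic policy used in the paper.
    """
    values = rs_values(counts, means, aleph)
    top = max(values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Draw one action index from the given probability distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    counts, means = [30, 10], [0.6, 0.55]
    print(deterministic_rs_action(counts, means, aleph=0.7))  # 1
    print(deterministic_rs_action(counts, means, aleph=0.5))  # 0
    probs = softmax_rs_policy(counts, means, aleph=0.7)
    print(probs, sample_action(probs, random.Random(0)))
```

With ℵ = 0.7, neither action meets the aspiration level, so RS favors the less-tried action 1 (exploration); with ℵ = 0.5, both actions exceed it and RS favors the more-tried action 0 (exploitation). The deterministic policy commits fully to that choice, whereas the softmax sketch only shifts probability mass toward it.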
