Reinforcement Learning Using Probability Distributions for State Values

佐藤 亘; 橘 完太

doi:10.14864/fss.30.0_180

Abstract

Reinforcement learning is a method for learning optimal behavior through trial and error in an unknown environment. When the environment is strongly non-stationary, the agent takes a long time to learn the optimal behavior, and various studies have tried to solve this problem. As far as we know, these methods share a structure with two parts: recognizing environmental change and responding to it. In conventional methods, the agent has a sensor that recognizes environmental change and switches between the optimal behavior and exploratory behavior. In our method, state values are represented as probability distributions and updated by Bayesian updating, so that the choice between the optimal behavior and exploratory behavior can be made according to these probability distributions.
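The abstract does not specify which distribution family, update rule, or action-selection rule the authors use, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each state value is a Gaussian belief with known observation noise (so Bayesian updating reduces to the conjugate Normal-Normal rule), treats the TD target as a noisy observation of the state value, models non-stationarity as random-walk drift that inflates the variance before each update, and uses Thompson sampling to choose between optimal and exploratory behavior. All names (`GaussianValue`, `choose`, `td_step`, `run_demo`) and parameter values (`gamma`, `obs_var`, `drift_var`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianValue:
    """Belief about one state's value, V(s) ~ Normal(mean, var).

    Assumption: Gaussian beliefs with known observation noise; the
    abstract does not say which distribution family the authors use.
    """
    mean: float = 0.0
    var: float = 1.0

    def predict(self, drift_var: float) -> None:
        # Assumed random-walk model of non-stationarity: the true value may
        # have drifted since the last visit, so uncertainty grows first.
        self.var += drift_var

    def update(self, target: float, obs_var: float) -> None:
        # Conjugate Normal-Normal Bayesian update: precisions add and the
        # posterior mean is the precision-weighted average.
        precision = 1.0 / self.var + 1.0 / obs_var
        self.mean = (self.mean / self.var + target / obs_var) / precision
        self.var = 1.0 / precision

    def sample(self, rng: random.Random) -> float:
        # Draw one plausible value from the current belief.
        return rng.gauss(self.mean, self.var ** 0.5)


def choose(candidates, values, rng):
    # Thompson sampling (assumed selection rule): draw one value per
    # candidate state and take the best draw. A confident, high-mean state
    # wins almost every draw (exploitation); an uncertain state sometimes
    # wins (exploration), with no separate change-detection sensor.
    return max(candidates, key=lambda s: values[s].sample(rng))


def td_step(values, s, reward, s_next, gamma=0.9, obs_var=0.5, drift_var=0.01):
    # Treat the TD target r + gamma * E[V(s')] as a noisy observation of V(s).
    target = reward + gamma * values[s_next].mean
    values[s].predict(drift_var)
    values[s].update(target, obs_var)


def run_demo(seed=0, steps=400, change_at=200):
    # Hypothetical toy task: the agent moves to "A" or "B", then the episode
    # ends; the rewarded state swaps at step `change_at`.
    rng = random.Random(seed)
    values = {s: GaussianValue() for s in ("A", "B", "end")}
    reward = {"A": 1.0, "B": 0.0}
    for t in range(steps):
        if t == change_at:
            reward = {"A": 0.0, "B": 1.0}
        s = choose(["A", "B"], values, rng)
        td_step(values, s, reward[s], "end")
    return values


if __name__ == "__main__":
    for name, v in run_demo().items():
        print(f"{name}: mean={v.mean:.3f} var={v.var:.4f}")
```

Running the script prints the final beliefs for the toy task. Because decisions are sampled from the posteriors and the drift term keeps a visited state's variance from collapsing to zero, a low rate of exploration persists without an explicit switch, which lets the agent shift to "B" after the reward moves. Adding drift only to the state being updated is a simplifying choice of this sketch.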
