Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
Reinforcement learning is a method to learn the optimal behavior through trial and error in an unknown environment. If the environment is strongly non-stationary, the agent takes a long time to learn the optimal behavior. There have been various studies in order to solve this problem. As far as we know, these methods have structure which consists of recognition of environmental change and response to environment. In the conventional method, agent has sensor to cognition environmental change and switch the optimal behavior and the exploring behavior. In our method, the optimal behavior and the exploring behavior can be decided according to probability distribution by Bayesian updating state values of probability distribution.