Proceedings of the Fuzzy System Symposium
37th Fuzzy System Symposium
Session ID: MD1-3

Deep Reinforcement Learning Combined with Approximation of Number of State Experiences
*Kyohei Yasunaga, Akira Notsu, Seiki Ubukata, Katsuhiro Honda

Abstract

In the action selection policy of deep reinforcement learning, exploration and exploitation can be balanced efficiently by taking into account the selection frequency of state-action pairs. However, when the similarity of states is learned in parallel, it is difficult to count accurately how many times each state has been visited in the past. In this paper, we propose a new method that estimates the value of each state while balancing exploration and exploitation, by constructing a network that estimates only whether or not a state has been visited in the past, without using any reward. Since the visit count of a state should simply increase as learning progresses, we define such a function. The policy takes into account the mean and variance of a beta distribution constructed from the reward values and these experience estimates. The effectiveness of the proposed method is confirmed by numerical experiments.
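The abstract does not give the exact form of the policy, but the underlying idea lends itself to a short sketch. The Python fragment below is an illustrative assumption, not the authors' implementation: it samples each action's value from a beta distribution whose concentration grows with an estimated visit count, so rarely visited state-action pairs keep high variance and are explored more often. The names q_values and experience, and the mapping from values and counts to beta parameters, are hypothetical.

    # Minimal sketch (assumed, not the paper's method): beta-distribution
    # action selection driven by estimated visit counts and action values.
    import numpy as np

    def select_action(q_values, experience, rng=None):
        """Sample each action's value from a beta distribution shaped by
        its estimated experience count, then pick the best sample.

        q_values   : array of estimated action values, assumed scaled to [0, 1]
        experience : array of estimated visit counts per state-action pair
        """
        if rng is None:
            rng = np.random.default_rng()
        q = np.clip(q_values, 0.0, 1.0)
        n = np.maximum(experience, 0.0)
        # Beta(alpha, beta) with mean near q and concentration growing with
        # the visit count: rarely visited actions keep high variance
        # (exploration), well-visited ones concentrate on their mean
        # (exploitation).
        alpha = 1.0 + n * q
        beta = 1.0 + n * (1.0 - q)
        samples = rng.beta(alpha, beta)
        return int(np.argmax(samples))

    # Example: action 0 looks slightly better but is well explored;
    # action 1 is rarely visited, so it is still sampled fairly often.
    print(select_action(np.array([0.6, 0.5]), np.array([50.0, 2.0])))

Under these assumptions, the mean of each beta distribution approaches the estimated value as the visit count grows, while its variance shrinks, which matches the abstract's use of both mean and variance in the policy.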

© 2021 Japan Society for Fuzzy Theory and Intelligent Informatics