Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
36th (2022)
Session ID : 2O5-GS-5-05

Learning Method in Monte Carlo Softmax Search: Reinforcement Learning of State Evaluation Function by Sampling
*Kanau KUMEKAWA, Hiromasa IWAMOTO, Harukazu IGARASHI, Tooru SUGIMOTO
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

For two-player games such as Shogi, Igarashi et al. proposed in 2018 Monte Carlo (MC) Softmax search, a selective search method, together with a method for learning state evaluation functions. The method computes the gradient vectors of action and state values with respect to the learning parameters efficiently by sampling along the search tree. This makes it possible to use all nodes as training data and to run multiple reinforcement learning methods simultaneously. In this study, we show that the method is not limited to two-player games but can be extended to general agent learning problems. We also propose a search-and-learn method in which tree search and evaluation-function learning are carried out simultaneously. Finally, we apply the proposed method to a simple maze escape problem to verify the algorithm.
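
As a rough illustration of the sampling idea in the abstract, the sketch below descends a small single-agent search tree by choosing each child with a softmax (Boltzmann) probability over child values. It is not the authors' implementation. The nested-list tree encoding, the function names, the temperature parameter, and the choice to back up an inner node's value as the softmax-weighted mean of its children are all assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Boltzmann distribution over a list of child values (assumed form)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def backup_value(node, temperature=1.0):
    """Value of a node in a hypothetical nested-list tree.

    A leaf is a number (its evaluation). An inner node is a list of
    children; its value is taken here, as an assumption, to be the
    softmax-weighted mean of the child values.
    """
    if isinstance(node, (int, float)):
        return float(node)
    child_values = [backup_value(c, temperature) for c in node]
    probs = softmax(child_values, temperature)
    return sum(p * v for p, v in zip(probs, child_values))


def sample_path(node, rng, temperature=1.0):
    """Walk from the root to a leaf, sampling each child by softmax probability."""
    path = []
    while not isinstance(node, (int, float)):
        child_values = [backup_value(c, temperature) for c in node]
        probs = softmax(child_values, temperature)
        i = rng.choices(range(len(node)), weights=probs, k=1)[0]
        path.append(i)
        node = node[i]
    return path, float(node)


# Hypothetical toy tree: leaves are evaluation-function outputs.
tree = [[1.0, 0.0], [0.5, [2.0, -1.0]]]
path, leaf_value = sample_path(tree, random.Random(0))
print(path, leaf_value, backup_value(tree))
```

Every node visited on a sampled path could then serve as a training example, which is the property the abstract highlights. A two-player version would additionally need to alternate the sign or perspective of the values at each depth, which this sketch omits.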

© 2022 The Japanese Society for Artificial Intelligence