Host: The Japanese Society for Artificial Intelligence
Name: The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 36
Location: [in Japanese]
Date: June 14, 2022 - June 17, 2022
MC Softmax search, a selective search method for two-player games such as Shogi, was proposed by Igarashi et al. in 2018 together with a method for learning state evaluation functions. The gradients of action and state values with respect to the learning parameters are computed efficiently by sampling along the search tree. As a result, every node in the tree can serve as training data, and multiple reinforcement learning methods can be run simultaneously. In this study, we show that the method is not limited to two-player games and can be extended to general agent learning problems. We also propose a search-and-learn method in which tree search and evaluation-function learning are carried out simultaneously. Finally, we apply the proposed method to a simple maze escape problem to verify the algorithm.
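The abstract gives no formulas, so the following is only a minimal sketch of the general idea behind softmax-based selective search, under the assumption that child nodes are chosen from a Boltzmann (softmax) distribution over their values and that a node's value is backed up as the expectation under that distribution. The function names (`softmax_policy`, `sample_child`, `softmax_backup`) and the `temperature` parameter are hypothetical and are not taken from the paper. The single-agent form is used here, which matches the maze setting rather than the two-player one.

```python
import math
import random


def softmax_policy(values, temperature=1.0):
    """Return Boltzmann selection probabilities over child values.

    Assumption: higher value means a more attractive child (single-agent case).
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_child(values, temperature=1.0, rng=random):
    """Sample the index of one child according to the softmax policy."""
    probs = softmax_policy(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def softmax_backup(values, temperature=1.0):
    """Back up a node value as the softmax-weighted mean of its child values."""
    probs = softmax_policy(values, temperature)
    return sum(p * v for p, v in zip(probs, values))
```

In this sketch, lowering `temperature` concentrates the probability mass on the best child, so the backup approaches an ordinary max; raising it spreads the search more evenly across children. Because the backed-up value is a smooth function of the child values, it is differentiable with respect to the parameters of an evaluation function that produces them, which is the property that sampling-based gradient estimation relies on.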