Host: The Japanese Society for Artificial Intelligence
Name: The 37th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 37
Location: [in Japanese]
Date: June 06, 2023 - June 09, 2023
We propose a new method for generating training data in self-play deep reinforcement learning, which is widely used in game AI such as AlphaGo Zero and AlphaZero. In general, such self-play learning does not use most of the search results generated during self-play, and few studies so far have tried to make use of them. The proposed method converts each search result into training data by estimating a final win/loss reward and a policy for it. Experiments with various training hyperparameters suggest that the proposed method helps the policy to be learned effectively and stabilizes training.
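The abstract does not say how the reward and policy are estimated, so the following is only a minimal sketch of the general idea of turning search-tree nodes into training examples, under assumptions that are not taken from the paper: the policy target is taken to be the normalized child visit counts, the win/loss estimate is taken to be the node's mean backed-up value, and nodes with too few visits are skipped. The names `Node`, `collect_examples`, and `min_visits` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical search-tree node; not the data structure used in the paper."""
    state: str                 # identifier of the game position
    visit_count: int = 0       # number of simulations that passed through this node
    value_sum: float = 0.0     # sum of backed-up values, from the player to move
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> child


Example = Tuple[str, Dict[str, float], float]  # (state, policy target, value target)


def collect_examples(root: Node, min_visits: int = 10) -> List[Example]:
    """Walk the whole tree and emit one example per sufficiently visited node.

    Assumption: visit-count ratios stand in for the policy, and the mean
    backed-up value stands in for the final win/loss reward.
    """
    examples: List[Example] = []
    stack = [root]
    while stack:
        node = stack.pop()
        child_visits = sum(c.visit_count for c in node.children.values())
        if node.visit_count >= min_visits and child_visits > 0:
            policy = {a: c.visit_count / child_visits
                      for a, c in node.children.items()}
            value = node.value_sum / node.visit_count
            examples.append((node.state, policy, value))
        stack.extend(node.children.values())
    return examples
```

In a standard self-play loop only the root of each search would yield an example; here every node above the `min_visits` threshold contributes one, which is one way the otherwise unused search results could be reused. The threshold is a design choice to filter out estimates based on very few simulations.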