Utilization of Search Results in Self-Play Deep Reinforcement Learning

神子島 一弥; 野田 五十樹; 小山 聡

doi:10.11517/pjsai.JSAI2023.0_2D4GS202

Abstract

We propose a new method for generating training data in self-play deep reinforcement learning, which is widely used in game AI such as AlphaGo Zero and AlphaZero. In general, such self-play learning leaves most of the search results generated during self-play unused, and few studies have so far attempted to exploit them. The proposed method converts these search results into training data by estimating the final win/loss reward and the policy for each of them. Experiments with various training hyperparameters suggest that the proposed method helps the policy to be learned effectively and stabilizes training.
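
The abstract does not state how the final reward and the policy are estimated for each search result, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes an AlphaZero-style search tree in which every node stores a visit count and an accumulated value; the names `Node`, `node_to_example`, `collect_examples`, and `min_visits` are hypothetical. The policy target is taken as the normalized child visit counts and the reward estimate as the node's mean backed-up value, both of which are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node (not taken from the paper)."""
    state: str                      # any hashable/serializable state key
    visit_count: int = 0            # number of simulations through this node
    value_sum: float = 0.0          # sum of backed-up values in [-1, 1]
    children: dict = field(default_factory=dict)  # action -> Node


def node_to_example(node, min_visits=1):
    """Turn one searched node into (state, policy, value), or None.

    Assumption: policy = normalized child visit counts,
    value = mean backed-up value as a stand-in for the final win/loss reward.
    """
    total = sum(child.visit_count for child in node.children.values())
    if node.visit_count < min_visits or total == 0:
        return None  # too little search information to trust
    policy = {a: c.visit_count / total for a, c in node.children.items()}
    value = node.value_sum / node.visit_count
    return node.state, policy, value


def collect_examples(root, min_visits=1):
    """Walk the whole tree, not only the root, and gather examples."""
    examples = []
    stack = [root]
    while stack:
        node = stack.pop()
        example = node_to_example(node, min_visits)
        if example is not None:
            examples.append(example)
        stack.extend(node.children.values())
    return examples
```

Calling `collect_examples(root, min_visits=5)` on a finished search tree would return one training tuple for every sufficiently visited interior node, in addition to the root position that standard self-play already records. The sign convention for `value` (which player's perspective it is expressed in) is left open here and would have to match the training setup.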
