Efficient Learning of Othello Incorporating the Concept of "Matta" (Taking Back a Move)

成田 穂; 木村 大毅

doi:10.11517/pjsai.JSAI2019.0_4O3J701

Abstract

The combination of Monte Carlo Tree Search (MCTS) and deep reinforcement learning, as exemplified by AlphaZero, has achieved remarkable performance, but it requires substantial computational resources and long training times. In this study, we propose a novel MCTS-based algorithm that introduces a "failure rate" to make exploration more efficient and thereby shorten training time. The algorithm makes the agent prioritize exploring the states that are important to winning. Our method outperformed AlphaZero in the first few iterations.
