2025 Volume 37 Issue 1 Pages 501-505
This study applies deep reinforcement learning to the puzzle game Puyo Puyo. Traditional rule-based methods and those utilizing relevance matrices have struggled to construct large chains comparable to those created by top human players. Furthermore, previous studies using deep reinforcement learning have found it difficult to learn complex strategies and have not demonstrated sufficient performance. This study aims to improve the performance of Puyo Puyo AI through deep reinforcement learning, employing parallel actors and prioritized experience replay. Experiments were conducted using a custom-built Puyo Puyo environment to evaluate the proposed method. The results showed that the proposed approach achieved an average maximum chain length of 6.243 and an average score of 33,114, surpassing the performance of previous deep reinforcement learning studies.
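The abstract names prioritized experience replay but gives no implementation details. As a rough illustration only, the following is a minimal sketch of the common proportional variant, in which transitions are sampled with probability proportional to a power of their last absolute TD error. The class name `PrioritizedReplayBuffer`, the hyperparameters `alpha`, `beta`, and `eps`, and their default values are assumptions made for this example and are not taken from the study.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the study's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps                # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_idx = 0             # ring-buffer write position

    def add(self, transition):
        # New transitions receive the current maximum priority so that
        # they are likely to be replayed before their TD error is known.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = priority
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability: P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.items)), weights=probs, k=batch_size)

        # Importance-sampling weights correct the bias from non-uniform
        # sampling; here they are normalized by the largest weight in the batch.
        n = len(self.items)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.items[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities follow the new absolute TD errors.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a setup with parallel actors, each actor would typically call `add` with the transitions it collects, while a single learner calls `sample` and then `update_priorities`. How the study actually divides this work is not stated in the abstract.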