A Proposal of Dynamic Reward Setting for Learning Non-Optimal Policies

岡野 光稀; 西野 順二

doi:10.14864/fss.37.0_661

Abstract

In recent years, game AI research has produced a great deal of work on AI that pursues diverse goals other than the optimal game solution (diverse game AI). While diverse game AI holds considerable promise for practical applications and for understanding human intelligence, it also makes gameplay goals more complex. In this paper, we propose a method that dynamically sets rewards for learning non-optimal strategies, with the aim of making diverse game AI development more efficient. The method updates the reward settings to discourage relearning a strategy that was already learned in the past, thereby increasing the probability of learning a non-optimal strategy. We report experiments on learning paths in a maze map with multiple goals, and discuss the effects and challenges of the proposed method.
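The idea of updating rewards so that a past strategy is not learned again can be illustrated with a minimal sketch. The abstract does not give the actual update rule, learning algorithm, or maze, so everything below is an assumption made for illustration: the 3x7 maze, the constants `GAMMA`, `GOAL_REWARD`, and `PENALTY`, the rule of subtracting a penalty from every cell on a previously learned path, and the use of value iteration as a deterministic stand-in for the learner.

```python
# Illustrative sketch only; the maze, constants, and penalty rule are assumptions,
# not the authors' method.
GAMMA = 0.9        # discount factor (assumed)
GOAL_REWARD = 1.0  # reward for entering either goal (assumed)
PENALTY = 1.0      # subtracted when entering a cell used by a past path (assumed)

# Hypothetical maze: S = start, A and B = goals, # = wall.
MAZE = [
    "A.S....",
    ".#.###.",
    "......B",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def find(ch):
    """Return the (row, col) position of a marker character in the maze."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c >= 0:
            return (r, c)
    raise ValueError(ch)


START = find("S")
GOALS = {find("A"), find("B")}
CELLS = [(r, c) for r, row in enumerate(MAZE)
         for c, ch in enumerate(row) if ch != "#"]
CELL_SET = set(CELLS)


def neighbors(cell):
    """Yield the open cells reachable from `cell` in one move."""
    r, c = cell
    for dr, dc in MOVES:
        nxt = (r + dr, c + dc)
        if nxt in CELL_SET:
            yield nxt


def reward(cell, visited):
    """Reward for entering `cell`, reduced if a past path already used it."""
    base = GOAL_REWARD if cell in GOALS else 0.0
    return base - (PENALTY if cell in visited else 0.0)


def learn_path(visited, sweeps=100, max_steps=50):
    """Value iteration under the current rewards, then a greedy rollout."""
    values = {cell: 0.0 for cell in CELLS}  # goal cells stay at 0 (terminal)

    def q(nxt):
        return reward(nxt, visited) + GAMMA * values[nxt]

    for _ in range(sweeps):
        for cell in CELLS:
            if cell not in GOALS:
                values[cell] = max(q(n) for n in neighbors(cell))

    path = [START]
    while path[-1] not in GOALS and len(path) <= max_steps:
        path.append(max(neighbors(path[-1]), key=q))
    return path


visited = set()  # cells used by previously learned paths
paths = []
for _ in range(2):
    path = learn_path(visited)
    paths.append(path)
    visited.update(path)  # dynamic reward update: penalize this path next time

for i, p in enumerate(paths, start=1):
    print(f"round {i}: {p}")
```

In this sketch the first round reaches the nearer goal A, which is the optimal solution under the initial rewards. Once the cells of that path are penalized, the second round reaches the farther goal B, which is non-optimal under the original rewards.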
