Proceedings of the Fuzzy System Symposium
37th Fuzzy System Symposium
Session ID : WA3-1

A proposal for dynamic reward setting for learning non-optimal policy
*Koki Okano, Junji Nishino

Abstract

In recent years, game AI research has produced a great deal of work on AI that pursues diverse goals other than the optimal game solution (diverse game AI). Diverse game AI has considerable potential for practical applications and for understanding human intelligence, but it also makes gameplay goals more complex. In this paper, we propose a method that dynamically sets rewards for learning non-optimal strategies, with the aim of making the development of diverse game AI more efficient. The method updates the reward settings so as to restrict the learning of strategies that were already learned in the past, thereby increasing the probability of learning a non-optimal strategy. We report experiments on learning paths in a maze map with multiple goals and discuss the effects and challenges of the proposed method.
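
The abstract does not give the update rule, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes a hypothetical one-dimensional corridor maze with two goals, uses plain value iteration as a stand-in for whatever learner the paper actually uses, and lowers the reward of a goal by a made-up penalty once a learned strategy has reached it. All names and constants (N_CELLS, START, GOALS, penalty, learn_diverse_goals) are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical toy maze: a corridor of cells 0..6 with a goal at each end.
N_CELLS = 7
START = 2            # the goal at cell 0 is closer, so it is the "optimal" goal
GOALS = (0, 6)
GAMMA = 0.9          # discount factor (assumed value)
ACTIONS = (-1, +1)   # move left / move right


def q_value(v: List[float], goal_reward: Dict[int, float], s: int, a: int) -> float:
    """One-step lookahead: reward for entering the next cell plus discounted value."""
    nxt = min(max(s + a, 0), N_CELLS - 1)
    return goal_reward.get(nxt, 0.0) + GAMMA * v[nxt]


def solve(goal_reward: Dict[int, float], sweeps: int = 100) -> List[float]:
    """Value iteration under the current reward settings (stand-in for learning)."""
    v = [0.0] * N_CELLS
    for _ in range(sweeps):
        for s in range(N_CELLS):
            if s in GOALS:
                continue  # goals are terminal; their value stays 0
            v[s] = max(q_value(v, goal_reward, s, a) for a in ACTIONS)
    return v


def rollout(v: List[float], goal_reward: Dict[int, float], max_steps: int = 20) -> int:
    """Follow the greedy policy from START and return the cell where it stops."""
    s = START
    for _ in range(max_steps):
        if s in GOALS:
            break
        s += max(ACTIONS, key=lambda a: q_value(v, goal_reward, s, a))
    return s


def learn_diverse_goals(rounds: int = 2, penalty: float = 1.5) -> List[int]:
    """Repeat learning, penalizing each goal that a past strategy already reached."""
    goal_reward = {g: 1.0 for g in GOALS}
    reached = []
    for _ in range(rounds):
        v = solve(goal_reward)
        g = rollout(v, goal_reward)
        reached.append(g)
        if g in goal_reward:
            goal_reward[g] -= penalty  # assumed update rule, not taken from the paper
    return reached


print(learn_diverse_goals())  # prints [0, 6]
```

In this toy setting the first round reaches the nearer goal at cell 0; after its reward is lowered, the second round reaches the farther goal at cell 6, which is non-optimal under the original rewards.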

© 2021 Japan Society for Fuzzy Theory and Intelligent Informatics