Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Technical Papers
Improvements of the Penalty Avoiding Rational Policy Making Algorithm and an Application to the Othello Game
Kazuteru Miyazaki, Sougo Tsuboi, Shigenobu Kobayashi
JOURNAL FREE ACCESS

2002 Volume 17 Issue 5 Pages 548-556

Abstract
The purpose of reinforcement learning is generally to learn an optimal policy. However, in two-player games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on forming a penalty avoiding policy based on the Penalty Avoiding Rational Policy Making algorithm [Miyazaki 01]. Applying it to large-scale problems confronts us with the curse of dimensionality. We introduce several ideas and heuristics to overcome the combinatorial explosion in large-scale problems. First, we propose an algorithm that saves memory by calculating state transitions. Second, we describe how to restrict exploration using two types of knowledge: a KIFU (game record) database and an evaluation function. We show that our learning player can always defeat the well-known Othello program KITTY.
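The abstract does not spell out how a penalty avoiding policy is formed, so the following is only a minimal sketch of one plausible reading: penalties are propagated backward through observed transitions until no new penalty rules appear. It assumes that a rule (a state-action pair) counts as a penalty rule if it was directly penalized or may lead to a penalty state, and that a state counts as a penalty state when every known action in it is a penalty rule. The function name `find_penalty_rules`, its arguments, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def find_penalty_rules(transitions, direct_penalties):
    """Propagate penalties backward through observed transitions.

    transitions: dict mapping (state, action) -> set of observed next states
                 (a set, because the opponent's reply is not deterministic).
    direct_penalties: set of (state, action) pairs that directly drew a penalty.
    Returns (penalty_rules, penalty_states).
    """
    # Collect every known action for each state (assumed data layout).
    actions_in = defaultdict(set)
    for state, action in list(transitions) + list(direct_penalties):
        actions_in[state].add(action)

    penalty_rules = set(direct_penalties)
    penalty_states = set()
    changed = True
    while changed:
        changed = False
        # A state is a penalty state if all of its known actions are penalty rules.
        for state, actions in actions_in.items():
            if state not in penalty_states and all(
                (state, a) in penalty_rules for a in actions
            ):
                penalty_states.add(state)
                changed = True
        # A rule is a penalty rule if it may lead to a penalty state.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules, penalty_states


# Toy example: the only move in s1 is penalized, so s1 is a penalty state
# and the move from s0 that can reach s1 becomes a penalty rule as well.
toy_transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s2", "a"): {"s0"},
}
toy_penalties = {("s1", "a")}
rules, states = find_penalty_rules(toy_transitions, toy_penalties)
```

A penalty avoiding player under these assumptions would then choose, in each state, only among actions that are not in `rules`; in the toy example that leaves move `b` in `s0`.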
© 2002 JSAI (The Japanese Society for Artificial Intelligence)