Transactions of the Japanese Society for Artificial Intelligence (人工知能学会論文誌)
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Paper
Improvement of the Penalty Avoiding Policy Making Algorithm and Its Application to the Othello Game
Kazuteru Miyazaki (宮崎 和光), Sougo Tsuboi (坪井 創吾), Shigenobu Kobayashi (小林 重信)

2002, Volume 17, Issue 5, pp. 548-556

Abstract
In general, the purpose of reinforcement learning is to learn an optimal policy. However, in two-player games such as Othello, it is important to acquire a penalty avoiding policy. In this paper, we focus on the formation of a penalty avoiding policy based on the Penalty Avoiding Rational Policy Making algorithm [Miyazaki 01]. In applying it to large-scale problems, we are confronted with the curse of dimensionality. We introduce several ideas and heuristics to overcome the combinatorial explosion in large-scale problems. First, we propose an algorithm that saves memory by calculating state transitions. Second, we describe how to restrict exploration using two types of knowledge: a KIFU (game record) database and an evaluation function. We show that our learning player can always defeat the well-known Othello program KITTY.
© 2002 JSAI (The Japanese Society for Artificial Intelligence)