2006, Vol. 126, No. 1, pp. 72-82
We propose a method to improve the performance of R-learning, a reinforcement learning algorithm, by using multiple state-action value tables. Unlike Q-learning or Sarsa, R-learning learns a policy that maximizes the undiscounted average reward. The multiple state-action value tables induce substantial exploration when it is needed, which allows R-learning to work well. The efficiency of the proposed method is verified through experiments in a simulated environment.
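For reference, the baseline algorithm the abstract builds on can be sketched as follows. This is a minimal, hypothetical implementation of standard tabular R-learning (Schwartz's average-reward update with a single value table, not the proposed multi-table method); the one-state task, function names, and all hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import random

def r_learning(n_states, n_actions, step, n_steps=2000,
               alpha=0.2, beta=0.05, eps=0.1, seed=0):
    """Tabular R-learning: learns relative action values R(s, a)
    and an average-reward estimate rho, with no discount factor."""
    rng = random.Random(seed)
    R = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy action selection.
        greedy_a = max(range(n_actions), key=lambda x: R[s][x])
        a = rng.randrange(n_actions) if rng.random() < eps else greedy_a
        s2, r = step(s, a, rng)
        best_next = max(R[s2])
        # Undiscounted update: the reward is measured relative to rho.
        R[s][a] += alpha * (r - rho + best_next - R[s][a])
        if a == greedy_a:
            # rho is adjusted only on greedy (non-exploratory) steps.
            rho += beta * (r - rho + best_next - max(R[s]))
        s = s2
    return R, rho

# Hypothetical one-state task: action 1 pays reward 1, action 0 pays 0.
def step(s, a, rng):
    return 0, float(a)

R, rho = r_learning(n_states=1, n_actions=2, step=step)
```

On this toy task the greedy policy settles on the rewarding action and rho approaches the average reward of 1, illustrating how R-learning ranks actions by reward relative to the long-run average rather than by discounted return.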