Abstract
Q-learning is a typical reinforcement learning method. Because Q-learning requires a large amount of time to solve a problem, this study proposes methods to accelerate it. Two approaches based on the iterative methods of dynamic programming are introduced. The first uses Robbins-Monro estimation of the state transition probability model. The second applies iterative methods that replace explicit matrix inversion, such as the Jacobi method, the Gauss-Seidel method, and the SOR method. These approaches allow an appropriate learning factor to be determined. Numerical simulations show that the proposed methods are more efficient than conventional Q-learning.
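
As a rough illustration of the two ingredients named above, the following Python sketch shows a Robbins-Monro style running estimate of transition probabilities and an SOR sweep for a linear system. It is not the algorithm proposed in this study. The function names, the step size 1/n, and the use of a linear system A x = b (for example, A = I - gamma P under a fixed policy) are assumptions made only for illustration.

```python
import numpy as np

def robbins_monro_update(p_hat, counts, s, a, s_next):
    """One stochastic-approximation step for the estimate of P(. | s, a).

    Assumption: step size 1/n, where n is the visit count of (s, a).
    With this choice the estimate equals the empirical frequency.
    """
    counts[s, a] += 1
    step = 1.0 / counts[s, a]
    target = np.zeros(p_hat.shape[2])
    target[s_next] = 1.0  # one-hot encoding of the observed next state
    p_hat[s, a] += step * (target - p_hat[s, a])

def sor_solve(A, b, omega=1.0, iters=200):
    """Solve A x = b by SOR sweeps; omega = 1.0 gives Gauss-Seidel.

    Assumption: A is strictly diagonally dominant, as I - gamma * P is
    for a stochastic matrix P and a discount factor gamma < 1.
    """
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            # Off-diagonal contribution, using entries already updated in this sweep.
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x
```

In this sketch, the estimated model would supply P, and `sor_solve` would then be applied to `np.eye(n) - gamma * P` with the reward vector as the right-hand side, avoiding an explicit inverse.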