Acceleration of Reinforcement Learning by Estimating State Transition Probability Model

Shinji FUJII; Kei SENDA; Syusuke MANO

doi:10.9746/sicetr1965.42.47

Abstract

Q-learning is a typical reinforcement learning method. Because Q-learning requires a huge amount of time to solve a problem, this study proposes methods to accelerate it. Two approaches are introduced, both based on the iterative methods of dynamic programming. The first uses a Robbins-Monro estimate of the state transition probability model. The second applies iterative methods for computing an inverse matrix, e.g., the Jacobi method, the Gauss-Seidel method, and the SOR method. These approaches make it possible to determine an appropriate learning factor. Numerical simulations show that the proposed methods are more efficient than Q-learning.
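
The abstract does not give the update formulas, so the following is only a minimal sketch of the two ingredients it names, written under stated assumptions rather than as the authors' algorithm. It assumes a small finite state space, a fixed policy (so the model reduces to one transition matrix P and one reward vector r), and a step size of 1/n in the Robbins-Monro update. The function names `robbins_monro_update`, `estimate_transition_row`, and `sor_policy_evaluation` are hypothetical.

```python
def robbins_monro_update(p_row, next_state, step):
    """Move the estimated distribution p_row toward the observed next state.

    Applies p[j] <- p[j] + step * (1[j == next_state] - p[j]) to every entry.
    """
    return [p + step * ((1.0 if j == next_state else 0.0) - p)
            for j, p in enumerate(p_row)]


def estimate_transition_row(samples, n_states):
    """Estimate P(. | s, a) from observed next states (assumed step size 1/n)."""
    p_row = [1.0 / n_states] * n_states  # uniform initial guess
    for n, s_next in enumerate(samples, start=1):
        p_row = robbins_monro_update(p_row, s_next, 1.0 / n)
    return p_row


def sor_policy_evaluation(P, r, gamma, omega=1.0, sweeps=200):
    """Solve (I - gamma * P) v = r by SOR sweeps; omega = 1 is Gauss-Seidel."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Off-diagonal terms use entries of v already updated in this sweep.
            off_diag = sum(gamma * P[i][j] * v[j] for j in range(n) if j != i)
            gs_value = (r[i] + off_diag) / (1.0 - gamma * P[i][i])
            # Relaxation: blend the old value with the Gauss-Seidel value.
            v[i] = (1.0 - omega) * v[i] + omega * gs_value
    return v


if __name__ == "__main__":
    # Hypothetical data: next states observed after one state-action pair.
    print(estimate_transition_row([0, 1, 1, 1], n_states=2))
    # Hypothetical two-state chain with a known model.
    P = [[0.5, 0.5], [0.0, 1.0]]
    r = [1.0, 0.0]
    print(sor_policy_evaluation(P, r, gamma=0.9, omega=1.0))
```

With a step size of 1/n the Robbins-Monro estimate reduces to the empirical frequency of each next state. Other step-size schedules are possible, and the choice made here is an assumption of the sketch. Setting `omega` to 1 gives the Gauss-Seidel method, while other values in (0, 2) give SOR.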
