Abstract
Most applications of reinforcement learning rely on off-line simulation with models, because learning in the real environment takes a long time. However, such models contain errors or deviations from the real environment. Hence, the learning results must be modified by on-line learning in the real environment, and it is important to learn these modifications with less computational effort. To this end, this paper proposes a Partial Modification Algorithm (PMA) based on the Sherman-Morrison formula, which enables partial inverse-matrix computations. The PMA efficiently modifies the learned results under variations in the cost, the state transition probability, or the action selection probability. We apply the proposed method to several optimal control problems. Numerical simulations show that the proposed method is more effective than existing methods.
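As an illustration of the rank-1 inverse update underlying the Sherman-Morrison formula, the following is a minimal NumPy sketch; it is not the paper's PMA implementation, only the matrix identity it builds on:

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, via the Sherman-Morrison formula.

    Costs O(n^2) per update instead of the O(n^3) of a full re-inversion,
    provided the denominator 1 + v^T A^{-1} u is nonzero.
    """
    Au = A_inv @ u                      # A^{-1} u
    vA = v @ A_inv                      # v^T A^{-1}
    denom = 1.0 + v @ Au                # 1 + v^T A^{-1} u
    return A_inv - np.outer(Au, vA) / denom

# Example: a rank-1 change u v^T to A, updated without re-inverting.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

A_inv = np.linalg.inv(A)
updated = sherman_morrison_update(A_inv, u, v)
direct = np.linalg.inv(A + np.outer(u, v))
assert np.allclose(updated, direct)
```

This kind of incremental update is what lets a partial modification avoid recomputing the full solution when only one component of the problem changes.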