Abstract
In a typical reinforcement learning problem, a model of the plant, i.e., its state transition probabilities, is estimated, and a policy learned on the estimated plant is then applied to the real plant. If the estimated plant differs from the real plant, the obtained policy may perform poorly on the real plant. In this study, the variation of the real plant is parameterized by interpolating among several estimated plants. We propose a reinforcement learning method based on estimating this variation parameter, and apply the method to a two-dimensional peg-in-hole task. The effectiveness of the proposed method is demonstrated by both numerical and experimental results.