In a typical reinforcement learning problem, a policy learned on an estimated plant is applied to the real plant. If the discrepancy between the two plants is large, however, the learned policy may not be effective. We therefore learn a policy for a variation plant set, whose elements are obtained by adding variations to the estimated plant. Because this set contains infinitely many elements, we discretize it by exploiting the structural relationships among the plants. A policy that is proper for every element of the resulting finite set is shown to be proper for every element of the original infinite plant set as well. Using the structural relationships between the plants and the policy, we establish the properness of the policy obtained as the solution of the relaxation problem over the finite plant set. The effectiveness of the proposed method is demonstrated through numerical examples.
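The following is a minimal sketch of the "finite set implies infinite set" step described above, not the paper's actual algorithm. It assumes a hypothetical scalar plant x⁺ = θx + u with a linear policy u = −kx, and it replaces the paper's structural-relationship argument with a simple Lipschitz margin: the plant model, interval bounds, gain, and grid size are all illustrative assumptions.

```python
import numpy as np

# Assumed scalar plant x+ = theta*x + u and candidate policy u = -k*x.
theta_est = 0.8   # estimated plant parameter (assumption)
delta = 0.3       # size of the variation around the estimate (assumption)
k = 0.8           # candidate feedback gain (assumption)

# Discretize the infinite plant set {theta : |theta - theta_est| <= delta}.
n_grid = 61
thetas = np.linspace(theta_est - delta, theta_est + delta, n_grid)
h = thetas[1] - thetas[0]  # grid spacing

# The closed-loop pole is (theta - k), so the plant is stabilized iff
# |theta - k| < 1.  Since |theta - k| is 1-Lipschitz in theta, a margin of
# h/2 at every grid point certifies stability over the whole interval; this
# stands in for the paper's structural-relationship argument.
margins = 1.0 - np.abs(thetas - k)
proper_on_grid = np.all(margins > h / 2)

print("proper on the whole variation plant set:", proper_on_grid)
```

In this toy setting the check succeeds because the worst-case pole magnitude over the interval exceeds the grid-point value by at most half the grid spacing, so verifying the finitely many discretized plants (with a small slack) covers the original infinite set.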