In this paper, we describe IBP+EC, a Reinforcement Learning (RL) method that combines an Instance-Based Policy (IBP) with optimization by Evolutionary Computation (EC). IBP+EC has attracted attention as a promising direct policy search method for difficult problems that require foresight and situation-dependent switching of the control law. However, the IBP's policy representation is highly redundant, which makes optimization difficult. The main causes of this redundancy are "dependencies between instances" and "inactive instances". In particular, inactive instances are a problem peculiar to the IBP and are difficult to handle with a general-purpose optimization algorithm. In addition, the IBP requires tuning the number of instances, a parameter that significantly affects performance; since the optimal number of instances is rarely known in advance for a general control problem, this tuning is laborious. In this paper, we propose IBP-CMA, an optimization method specialized for IBP optimization, to cope with these difficulties. The IBP-CMA algorithm is based on the (1+1)-CMA-ES, an elitist evolutionary computation method for general continuous optimization. Owing to the nature of the (1+1)-CMA-ES, IBP-CMA handles the dependencies between instances. Moreover, IBP-CMA copes with the above-mentioned IBP-specific problem by combining the (1+1)-CMA-ES with a mechanism that activates inactive instances and adapts the number of instances. Experiments on RL tasks show that IBP-CMA performs well without prior parameter adjustment.
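For reference, the following is a minimal Python sketch of the underlying (1+1)-CMA-ES, assuming common default hyperparameters from the literature. It is an illustration of the elitist base strategy only: it omits the evolution-path covariance update of the full algorithm and, in particular, the IBP-specific instance-activation and instance-count adaptation mechanisms that IBP-CMA adds on top.

```python
import numpy as np

def one_plus_one_cma_es(f, x0, sigma0=0.5, max_evals=2000, seed=0):
    """Simplified (1+1)-CMA-ES sketch (not the paper's IBP-CMA):
    elitist (1+1) selection, 1/5th-success-rule step-size control,
    and a rank-one Cholesky update of the covariance factor."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    x, fx = np.asarray(x0, float), f(x0)
    sigma = sigma0
    A = np.eye(n)                      # Cholesky factor of the covariance C = A A^T
    p_succ = 0.5                       # smoothed success probability
    d = 1.0 + n / 2.0                  # step-size damping
    p_target = 2.0 / 11.0              # target success rate
    c_p = 1.0 / 12.0                   # success-rate smoothing constant
    c_cov = 2.0 / (n ** 2 + 6.0)       # covariance learning rate
    for _ in range(max_evals):
        z = rng.standard_normal(n)
        y = x + sigma * (A @ z)        # sample one offspring from N(x, sigma^2 C)
        fy = f(y)
        success = float(fy <= fx)      # elitist acceptance test
        p_succ = (1.0 - c_p) * p_succ + c_p * success
        sigma *= np.exp((p_succ - p_target) / (d * (1.0 - p_target)))
        if success:
            x, fx = y, fy
            # Rank-one Cholesky update biasing C toward the successful step direction.
            z2 = z @ z
            ca = np.sqrt(1.0 - c_cov)
            scale = (ca / z2) * (np.sqrt(1.0 + c_cov * z2 / (1.0 - c_cov)) - 1.0)
            A = ca * A + scale * np.outer(A @ z, z)
    return x, fx

# Usage: minimize the sphere function in 5 dimensions.
x_best, f_best = one_plus_one_cma_es(lambda x: float(np.sum(np.square(x))), np.ones(5))
```

Because the elitist parent is only replaced by an offspring that is at least as good, and the covariance matrix learns correlations between coordinates, this base strategy is well suited to search distributions with strong dependencies between variables, which is the property IBP-CMA exploits for the dependencies between instances.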