Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
Instance-based Policy Learning by Real-coded Genetic Algorithms and Its Application to Control of Nonholonomic Systems
Atsushi Miyamae, Jun Sakuma, Isao Ono, Shigenobu Kobayashi
JOURNAL FREE ACCESS

2009 Volume 24 Issue 1 Pages 104-115

Abstract

The stabilization control of nonholonomic systems has been extensively studied because it is essential for nonholonomic robot control problems. The difficulty of this problem is that a control policy cannot always be derived theoretically. In this paper, we present a reinforcement learning (RL) method with instance-based policy (IBP) representation, in which control policies for this class of systems are optimized with respect to user-defined cost functions. Direct policy search (DPS) is an approach to RL in which the policy is represented by a parametric model and the model parameters are searched directly by optimization techniques such as genetic algorithms (GAs). In the IBP representation, an instance consists of a state-action pair, and a policy consists of a set of instances. Several DPS methods with IBP have been proposed previously. However, these methods sometimes fail to obtain optimal control policies when the state and action variables are continuous. We therefore present a real-coded GA for DPS with IBP that is specifically designed for continuous domains.
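As an illustration of the IBP representation described above, the following minimal Python sketch stores a policy as a set of state-action instances. The abstract does not state how an action is chosen from the instances; the nearest-neighbor rule used here, and the names `Instance` and `act`, are assumptions made for this example only.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """One instance: a continuous state paired with a continuous action."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]


def act(policy: Sequence[Instance], state: Sequence[float]) -> Tuple[float, ...]:
    """Return the action of the instance nearest to `state`.

    Nearest-neighbor lookup (Euclidean distance) is an assumed rule,
    not necessarily the one used in the paper.
    """
    nearest = min(policy, key=lambda inst: math.dist(inst.state, state))
    return nearest.action


# A toy policy made of two instances in a 2-D state space with a 1-D action.
policy = [
    Instance(state=(0.0, 0.0), action=(1.0,)),
    Instance(state=(1.0, 1.0), action=(-1.0,)),
]

print(act(policy, (0.2, 0.1)))  # (1.0,)
```

Under this assumed rule, the real-valued search variables are the coordinates of every stored state and action, which is why the instance set is optimized directly rather than the weights of a fixed function approximator.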

Optimization of IBP involves three difficulties: high dimensionality, epistasis, and multimodality. Our method is designed to overcome them. Policy search with the IBP representation appears to be a high-dimensional optimization problem; however, the instances that can improve the fitness are often limited to active instances (the instances actually used during evaluation), and the number of active instances is in fact small. We therefore treat the search as a low-dimensional problem by restricting the search variables to active instances. It is well known that functions with epistasis can be optimized efficiently by crossovers that satisfy the inheritance of statistics. For efficient search of IBP, we propose extended crossover-like mutation (extended XLM), which generates a new instance around an existing instance while satisfying the inheritance of statistics. To cope with multimodality, we propose extended CCM for selection. Extended CCM always chooses the individual that survives to the next generation from among the children and the parent that generated them, which is expected to maintain the diversity of the population well. The proposed method, FLIP (Functionally sophisticated Learner for IBP), consists of extended XLM and extended CCM. The effectiveness of FLIP is demonstrated by experiments on nonholonomic control problems: a space robot, a car-like robot, and a parallel-type double inverted pendulum.
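The two structural ideas above, restricting the search to active instances and selecting the survivor from a parent and its own children, can be sketched roughly as follows. This is not the authors' FLIP implementation: the nearest-neighbor notion of "active", the Gaussian perturbation (a simple stand-in for extended XLM, which is not specified in the abstract), the lowest-cost selection rule, and the toy cost function are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

Instance = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (state, action)
Policy = List[Instance]


def active_indices(policy: Policy, visited: Sequence[Sequence[float]]) -> Set[int]:
    """Indices of instances that were nearest to at least one visited state."""
    return {
        min(range(len(policy)), key=lambda i: math.dist(policy[i][0], s))
        for s in visited
    }


def mutate_active(policy: Policy, active: Set[int], sigma: float,
                  rng: random.Random) -> Policy:
    """Perturb the action of one active instance (placeholder for extended XLM)."""
    i = rng.choice(sorted(active))
    state, action = policy[i]
    child = list(policy)
    child[i] = (state, tuple(a + rng.gauss(0.0, sigma) for a in action))
    return child


def select_survivor(parent: Policy, children: Sequence[Policy],
                    cost: Callable[[Policy], float]) -> Policy:
    """Keep the lowest-cost policy among the parent and its own children."""
    return min([parent, *children], key=cost)


# Toy setup: three instances, but only two are ever used during evaluation.
rng = random.Random(0)
policy: Policy = [((0.0, 0.0), (0.0,)), ((1.0, 1.0), (0.0,)), ((5.0, 5.0), (0.0,))]
visited = [(0.1, 0.0), (0.9, 1.0)]
active = active_indices(policy, visited)


def cost(p: Policy) -> float:
    """Hypothetical cost: squared distance of each action from a target of 1.0."""
    return sum((action[0] - 1.0) ** 2 for _, action in p)


children = [mutate_active(policy, active, 0.5, rng) for _ in range(5)]
survivor = select_survivor(policy, children, cost)
```

Because only active instances are perturbed, the instance at state (5.0, 5.0) is never modified in this toy run, and because the parent competes only with its own children, the survivor's cost can never exceed the parent's.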

© 2009 JSAI (The Japanese Society for Artificial Intelligence)