Transaction of the Japanese Society for Evolutionary Computation (進化計算学会論文誌)

Papers

Direct Policy Search in Multimodal Landscapes: A Proposal of the Population-based Policy Gradient Method

宮前惇, 永田裕一, 小野功, 小林重信

2011, Volume 2, Issue 1, pp. 1-11
Published: 2011/10/14
Released: 2011/10/14

DOI: https://doi.org/10.11394/tjpnsec.2.1

Journal, free access

Policy gradient methods for reinforcement learning can readily be applied to problem classes that are difficult for value-function approaches, such as POMDP environments. However, policy gradient methods do not always learn an optimal policy when the task has more than one local optimum. In this paper, we propose a new algorithm based on policy gradient methods that learns an optimal policy more robustly than traditional policy gradient methods. Our method is a multi-point search method that maintains multiple policy parameter sets. Improving from various policy parameters can increase the probability of learning an optimal policy. In addition, importance sampling techniques enable the policy gradient methods to estimate the gradient for all parameter sets from a single pass of experience. Further, we introduce an additional policy that selects the best parameter from the policy parameter sets. This additional policy is itself improved by policy gradient methods to maximize the agent's performance. We develop the algorithm using these techniques and demonstrate it through simulations of a binary tree task and a locomotion task on a crawling robot.
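
The idea of reusing one batch of experience to estimate gradients for several parameter sets can be illustrated with a minimal sketch. It assumes a one-step task with a softmax policy over three discrete actions, which is not the setting of the paper. The function names (`softmax`, `grad_log_pi`, `importance_sampled_gradients`, `exact_gradient`), the reward values, and the candidate parameters are all hypothetical, and the estimator actually used by the authors may differ.

```python
import math
import random


def softmax(theta):
    """Action probabilities of a softmax policy with preferences theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_pi(theta, action):
    """Gradient of log pi_theta(action): indicator(j == action) - pi_j."""
    pi = softmax(theta)
    return [(1.0 if j == action else 0.0) - pi[j] for j in range(len(theta))]


def importance_sampled_gradients(behavior_theta, candidate_thetas, samples):
    """Estimate the policy gradient of every candidate from shared samples.

    samples is a list of (action, reward) pairs drawn with the behavior
    policy; each term is reweighted by pi_candidate(a) / pi_behavior(a).
    """
    pi_b = softmax(behavior_theta)
    grads = []
    for theta in candidate_thetas:
        pi_k = softmax(theta)
        g = [0.0] * len(theta)
        for action, reward in samples:
            w = pi_k[action] / pi_b[action]  # importance weight
            glp = grad_log_pi(theta, action)
            for j in range(len(theta)):
                g[j] += w * reward * glp[j]
        grads.append([x / len(samples) for x in g])
    return grads


def exact_gradient(theta, rewards):
    """Exact gradient of the expected reward, used only as a reference."""
    pi = softmax(theta)
    n = len(theta)
    return [
        sum(pi[a] * rewards[a] * ((1.0 if j == a else 0.0) - pi[j])
            for a in range(n))
        for j in range(n)
    ]


# Hypothetical setup: three actions with fixed rewards, a uniform behavior
# policy, and two candidate parameter sets evaluated from the same samples.
REWARDS = [1.0, 0.0, 0.5]
BEHAVIOR = [0.0, 0.0, 0.0]
CANDIDATES = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]

rng = random.Random(0)
actions = rng.choices(range(3), weights=softmax(BEHAVIOR), k=5000)
batch = [(a, REWARDS[a]) for a in actions]

for theta, est in zip(CANDIDATES,
                      importance_sampled_gradients(BEHAVIOR, CANDIDATES, batch)):
    print(theta, [round(x, 3) for x in est],
          [round(x, 3) for x in exact_gradient(theta, REWARDS)])
```

The single batch is collected once with the behavior policy, yet each candidate receives its own gradient estimate. The estimates become noisier as a candidate's action probabilities move away from those of the behavior policy, because the importance weights grow.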

Adaptive Mutation in SARSA Learning of Genetic Network Programming with Individual Reconstruction

Fengming Ye, Shingo Mabu, Kotaro Hirasawa

2011, Volume 2, Issue 1, pp. 12-28
Published: 2011/10/14
Released: 2011/10/14

DOI: https://doi.org/10.11394/tjpnsec.2.12

Journal, free access

Many classical methods such as the Genetic Algorithm (GA), Genetic Programming (GP), and Evolution Strategies (ES) have contributed significantly to the study of evolutionary computation. Recently, a new approach named Genetic Network Programming (GNP) has been proposed especially for solving complex problems in dynamic environments. It is based on classical evolutionary computation techniques and uses directed graphs as its data structure, which is the unique feature of GNP. Focusing on the distinctive expressive ability of GNP's graph structure, our previous research proposed an approach named GNP with Reconstructed Individuals (GNP-RI), which analyzes the structures of the elite GNP individuals and extracts important information to reconstruct new individuals. Building on GNP-RI, this paper proposes an enhanced architecture that uses SARSA learning to constantly update the utility of the branches and adjusts their mutation rates based on the corresponding Q values. The Q values also help fine-tune the probabilities of the nodes to which the mutated branches may point. In this way, inappropriate branches can be easily located and changed to more desirable ones, gradually reducing genetic weaknesses throughout the evolutionary process of GNP. The enhanced architecture, named Genetic Network Programming with Reconstructed Individuals and SARSA Learning-based Adaptive Mutation (GNP-RISLAM), incorporates individual reconstruction, which reconstructs and enhances the worst individuals using the elite information, and adaptive mutation guided by SARSA learning. The proposed architecture is applied to the tile-world, an excellent benchmark for evaluating evolutionary computation architectures.

Probabilistic Model Building Genetic Network Programming Using Reinforcement Learning

Xianneng Li, Shingo Mabu, Bing Li, Kotaro Hirasawa

2011, Volume 2, Issue 1, pp. 29-40
Published: 2011/10/14
Released: 2011/10/14

DOI: https://doi.org/10.11394/tjpnsec.2.29

Journal, free access

This paper proposes a novel Estimation of Distribution Algorithm (EDA) in which a directed graph network is used to represent the chromosome. In the proposed algorithm, a probabilistic model is constructed from the promising individuals of the current generation using reinforcement learning and is then used to produce the new population. The probabilistic model is built from node connection probabilities, so pairwise interactions can be modeled to identify and recombine building blocks. The proposed algorithm is applied to an agent control problem, namely mobile robot control. The experimental results show the superiority of the proposed algorithm over conventional algorithms in terms of the quality and generalization ability of the solutions.

Probabilistic Model Building Genetic Programming Using Related Clique Estimation

丹治信, 伊庭斉志

2011, Volume 2, Issue 1, pp. 41-52
Published: 2011/10/14
Released: 2011/10/14

DOI: https://doi.org/10.11394/tjpnsec.2.41

Journal, free access

PMBGP (Probabilistic Model Building Genetic Programming) is an extension of GP (Genetic Programming) that uses a probabilistic model to estimate good building blocks instead of conventional crossover and mutation operators. In this paper, we present a novel PMBGP technique for program evolution. Our method uses a set of subgraphs of the PPT (Probabilistic Prototype Tree), called "related cliques", to capture dependency relations. Moreover, unlike previous PMBGP-based methods, the proposed method does not need to estimate or store any probability tables. The proposed method is named PERCE (Program Evolution using Related Clique Estimation). Our experimental results on GP benchmark problems and a robot routing problem show the superior performance of PERCE compared to simple GP and other PMBGP methods. Our method requires less than half the number of fitness evaluations taken by simple GP on the Deceptive RoyalTree and Deceptive Max problems. We discuss the relationship between the probabilistic model used in PERCE and the complexity of the problems.
