Automatic Curriculum Learning by Goal Sampling Based on the State-Action Distribution

山崎 雅史; 可知 巧巳; 増山 岳人

doi:10.1299/jsmermd.2023.1P1-F21

Abstract

Reinforcement learning is often difficult to apply in practice because the search space is vast. To address this issue, Zhang et al. proposed Value Disagreement Sampling (VDS), which sets pseudo-goals according to the degree of disagreement among multiple value functions. However, VDS may set pseudo-goals that do not contribute to learning the task objective. In this paper, we aim to improve learning efficiency by sampling pseudo-goals based on the state-action distribution sampled from the current policy. Simulation results demonstrate that the proposed approach improves learning efficiency, especially in the later stages of learning.
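The VDS baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes that disagreement is measured as the standard deviation of an ensemble's value estimates and that goals are drawn with probability proportional to it. The function name `sample_goal_by_disagreement`, the callable-based ensemble interface, and the uniform fallback are hypothetical choices for illustration. The sketch does not reproduce the proposed state-action-distribution sampling, whose details the abstract does not give.

```python
import numpy as np


def sample_goal_by_disagreement(candidate_goals, value_ensemble, rng):
    """Pick one pseudo-goal, favoring goals on which the ensemble disagrees.

    candidate_goals: sequence of goals (any objects the value functions accept).
    value_ensemble: list of callables goal -> scalar value estimate
                    (hypothetical interface, assumed for this sketch).
    rng: a numpy.random.Generator.
    Returns (chosen_goal, sampling_probabilities).
    """
    # values[k, i] = estimate of ensemble member k for candidate goal i
    values = np.array(
        [[float(v(g)) for g in candidate_goals] for v in value_ensemble]
    )
    # Disagreement per goal: standard deviation across ensemble members
    disagreement = values.std(axis=0)
    total = disagreement.sum()
    if total <= 0.0:
        # Assumption: fall back to uniform sampling when all members agree
        probs = np.full(len(candidate_goals), 1.0 / len(candidate_goals))
    else:
        probs = disagreement / total
    idx = rng.choice(len(candidate_goals), p=probs)
    return candidate_goals[idx], probs
```

Goals on which all ensemble members agree receive zero probability in this sketch, so sampling concentrates on goals whose value is still uncertain.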
