The Proceedings of the JSME Annual Conference on Robotics and Mechatronics (ROBOMECH)
Online ISSN: 2424-3124
Session ID: 1P1-F21

Automatic Curriculum Learning via Goal Sampling Based on the State-Action Distribution
*山崎 雅史, 可知 巧巳, 増山 岳人
Abstract

Implementing reinforcement learning is often challenging because of the vast search space. To address this issue, Zhang et al. proposed Value Disagreement Sampling (VDS), which sets pseudo-goals based on the degree of disagreement among multiple value functions. However, VDS may not set pseudo-goals that contribute to learning the task objective. In this paper, we aim to improve learning efficiency by sampling pseudo-goals based on the state-action distribution induced by the current policy. Simulation results demonstrate the effectiveness of the proposed approach in improving learning efficiency, especially during the later stages of learning.

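To make the two goal-sampling schemes contrasted in the abstract concrete, the sketch below illustrates them in Python. It is only an illustration inferred from the abstract, not the authors' implementation; the value ensemble, policy, and Gymnasium-style environment interface are assumed placeholders.

import numpy as np

# Minimal sketch (not the authors' code): VDS-style goal sampling versus
# drawing pseudo-goals from states visited by the current policy.
# `value_ensemble`, `policy`, and the Gymnasium-style `env` are hypothetical
# placeholders for this illustration.

def vds_goal_probs(candidate_goals, value_ensemble, start_state):
    """Score candidate goals by the disagreement (std) of an ensemble of
    goal-conditioned value functions, then normalize into probabilities."""
    values = np.stack([v(start_state, candidate_goals) for v in value_ensemble])
    scores = values.std(axis=0)                  # high std = high disagreement
    return scores / (scores.sum() + 1e-8)

def sample_goal_vds(candidate_goals, value_ensemble, start_state, rng):
    """Sample one pseudo-goal with probability proportional to disagreement."""
    probs = vds_goal_probs(candidate_goals, value_ensemble, start_state)
    return candidate_goals[rng.choice(len(candidate_goals), p=probs)]

def sample_goal_from_policy_distribution(policy, env, rng, num_rollouts=5):
    """Sketch of the idea described in the abstract: draw a pseudo-goal from
    states reached under the current policy, i.e. from an empirical estimate
    of its state(-action) distribution."""
    visited = []
    for _ in range(num_rollouts):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(obs)
            obs, _, terminated, truncated, _ = env.step(action)
            visited.append(obs)
            done = terminated or truncated
    return visited[rng.integers(len(visited))]

In this reading, VDS directs practice toward goals whose value is most uncertain, whereas the distribution-based variant keeps pseudo-goals within the region the current policy can actually reach, which is consistent with the reported gains in the later stages of learning.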
© 2023 The Japan Society of Mechanical Engineers