Automatic Curriculum Learning by Goal Sampling Based on the State-Action Distribution

山崎 雅史; 可知 巧巳; 増山 岳人

doi:10.1299/jsmermd.2023.1P1-F21

Abstract

Reinforcement learning is often difficult to apply because of the vast search space. To address this issue, Zhang et al. proposed Value Disagreement Sampling (VDS), which sets pseudo-goals based on the degree of disagreement among multiple value functions. However, VDS may set pseudo-goals that do not contribute to learning the task objective. In this paper, we aim to improve learning efficiency by sampling pseudo-goals based on the state-action distribution sampled from the current policy. Simulation results demonstrate that the proposed approach improves learning efficiency, especially during the later stages of learning.
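The disagreement-based sampling that the abstract attributes to VDS can be sketched as follows. This is a minimal illustration, not the authors' implementation and not the method proposed in this paper: it assumes a finite set of candidate goals, an ensemble of value estimates given as plain lists, and the standard deviation across the ensemble as the disagreement measure. The function names `disagreement_weights` and `sample_pseudo_goal` are hypothetical.

```python
import random
import statistics


def disagreement_weights(value_estimates):
    """Turn ensemble disagreement into sampling probabilities.

    value_estimates[k][i] is the k-th value function's estimate for
    candidate goal i (an assumed data layout for this sketch).
    """
    n_goals = len(value_estimates[0])
    # Disagreement per goal: population standard deviation across the ensemble.
    stds = [
        statistics.pstdev([member[i] for member in value_estimates])
        for i in range(n_goals)
    ]
    total = sum(stds)
    if total == 0.0:
        # No disagreement anywhere: fall back to uniform sampling.
        return [1.0 / n_goals] * n_goals
    return [s / total for s in stds]


def sample_pseudo_goal(candidate_goals, value_estimates, rng=random):
    """Draw one candidate goal with probability proportional to disagreement."""
    weights = disagreement_weights(value_estimates)
    return rng.choices(candidate_goals, weights=weights, k=1)[0]
```

With three value functions that only disagree on the second of three candidate goals, all sampling weight falls on that goal, so it is always selected as the pseudo-goal. Goals on which the ensemble agrees (already mastered or still unreachable) receive little or no weight.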
