Model-Based Reinforcement Learning for Learning Deterministic Policies

内部 英治

doi:10.1299/jsmermd.2022.2P1-B09

Abstract

Reinforcement learning (RL) is a trial-and-error process in which a robot interacts with its environment using a stochastic policy, typically obtained by adding Gaussian noise to a deterministic policy. Because such a stochastic policy does not produce smooth behaviors, a straightforward application of RL to robot control tasks is often problematic. To overcome this issue, we propose a model-based RL method that learns a deterministic policy. First, we formulate an RL algorithm with entropy regularization of the model; in this formulation, the robot explores the environment based on the simulated environmental uncertainty. We use the stochastic value gradient method to optimize this formulation. Then, we derive a model learning algorithm inspired by density ratio estimation. We evaluate the proposed method on three benchmark tasks from the DeepMind Control Suite, and the experimental results show that it produces smooth behaviors and outperforms the baseline methods.
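The central idea of the abstract is a deterministic policy whose exploration comes from the model's uncertainty, optimized with the stochastic value gradient (SVG) method. The sketch below illustrates only that idea under strong simplifying assumptions: a hand-written 1-D linear-Gaussian model (the constants `A`, `B`, `SIGMA`, `R`, and `HORIZON` are hypothetical) stands in for the learned model, the cost is quadratic, and neither the model entropy regularization nor the density-ratio-based model learning from the paper is implemented. Because the policy `u = -k*s` has no noise of its own, all randomness comes from the model, and the gradient with respect to `k` is propagated through each sampled rollout (the pathwise, or reparameterization, gradient on which SVG relies).

```python
import random

# Hypothetical 1-D linear-Gaussian model s' = A*s + B*u + SIGMA*eps.
# It stands in for a learned stochastic model; these values are assumptions.
A, B, SIGMA = 1.0, 0.5, 0.1
R = 0.1          # assumed action-cost weight
HORIZON = 20     # assumed rollout length


def rollout_and_grad(k, s0, eps_seq):
    """Return (J, dJ/dk) along one reparameterized model rollout.

    The policy u = -k*s is deterministic; the only randomness is the model
    noise eps_seq, so dJ/dk is a pathwise gradient (forward-mode here).
    """
    s, ds = s0, 0.0              # state and its derivative w.r.t. k
    J, dJ = 0.0, 0.0
    for eps in eps_seq:
        u = -k * s
        du = -s - k * ds         # d(-k*s)/dk by the product rule
        J -= s * s + R * u * u   # return = negative quadratic cost
        dJ -= 2.0 * s * ds + 2.0 * R * u * du
        # Tuple assignment: both right-hand sides use the old s and ds.
        s, ds = A * s + B * u + SIGMA * eps, A * ds + B * du
    return J, dJ


def estimate(k, n_samples, rng, s0=1.0):
    """Monte Carlo average of the return and its pathwise gradient."""
    J_sum = g_sum = 0.0
    for _ in range(n_samples):
        eps_seq = [rng.gauss(0.0, 1.0) for _ in range(HORIZON)]
        J, g = rollout_and_grad(k, s0, eps_seq)
        J_sum += J
        g_sum += g
    return J_sum / n_samples, g_sum / n_samples


def train(k=0.0, lr=0.5, max_step=0.1, iters=200, n_samples=32, seed=0):
    """Gradient ascent on the policy gain k with a clipped step size."""
    rng = random.Random(seed)
    for _ in range(iters):
        _, g = estimate(k, n_samples, rng)
        # Clipping keeps large early gradients from destabilizing the loop.
        k += max(-max_step, min(max_step, lr * g))
    return k


if __name__ == "__main__":
    print(f"learned gain k = {train():.3f}")
```

For these assumed constants, `train()` should settle near 1.5, close to the infinite-horizon LQR gain of about 1.53, which gives an easy sanity check. The step clipping is a choice made for this sketch, not something the abstract specifies.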
