Hybrid Policy Gradient for Deep Reinforcement Learning

Praveen singh THAKUR; Masaru SOGABE; Katsuyoshi SAKAMOTO; Koichi YAMAGUCHI; Dinesh Bahadur MALLA; Shinji YOKOGAWA; Tomah SOGABE

doi:10.11517/pjsai.JSAI2018.0_3Pin130

32nd (2018)

Session ID: 3Pin1-30


Conference information

Organizer: The Japanese Society for Artificial Intelligence

Conference name: The 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018

Edition: 32

Venue: Shiroyama Hotel Kagoshima, Kagoshima City, Kagoshima Prefecture

Dates: 2018/06/05 - 2018/06/08


*Praveen singh THAKUR, Masaru SOGABE, Katsuyoshi SAKAMOTO, Koichi YAMAGUCHI, Dinesh Bahadur MALLA, Shinji YOKOGAWA, Tomah SOGABE


Proceedings and abstracts, free access


Abstract

In this paper, we propose an alternative way of updating the actor (policy) in the Deep Deterministic Policy Gradient (DDPG) algorithm, aiming at stable learning and faster convergence in reinforcement learning tasks with continuous actions. In the proposed Hybrid-DDPG (H-DDPG for short), the actor is updated as in DDPG at one time step, and at another time step the policy parameters are moved based on the TD-error of the critic. In one of 5 trial runs on the RoboschoolInvertedPendulumSwingup-v1 environment, the reward obtained at the early stage of training was higher with H-DDPG than with DDPG. In the hybrid update, the policy gradients are weighted by the TD-error. This 1) yields a higher reward than DDPG and 2) pushes the policy parameters in a direction that makes actions with higher reward more likely to occur than others. This implies that if the policy finds good rewards while exploring at the early stages, it may converge quickly; otherwise, it may converge slowly. In the remaining trial runs, however, H-DDPG performed the same as DDPG.
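The alternating actor update described in the abstract can be sketched roughly as follows. This is only one possible reading, not the authors' implementation: the abstract does not give the exact update rule, so the sketch assumes that the hybrid step simply scales the DDPG actor gradient by the one-step TD-error. The linear actor, the toy quadratic critic, the discount factor, the learning rate, and all function names are hypothetical stand-ins chosen to keep the example self-contained.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed value, not given in the abstract)


def actor(W, s):
    """Deterministic linear policy mu(s) = W @ s (stand-in for the actor network)."""
    return W @ s


def critic(V, s, a):
    """Toy critic Q(s, a) = -||a - V @ s||^2 (stand-in for the critic network)."""
    return -float(np.sum((a - V @ s) ** 2))


def dq_da(V, s, a):
    """Gradient of the toy critic with respect to the action."""
    return -2.0 * (a - V @ s)


def td_error(W, V, s, a, r, s_next):
    """One-step TD-error: r + gamma * Q(s', mu(s')) - Q(s, a)."""
    return r + GAMMA * critic(V, s_next, actor(W, s_next)) - critic(V, s, a)


def ddpg_actor_grad(W, V, s):
    """DDPG actor gradient: dQ/da at a = mu(s), chained through dmu/dW."""
    a = actor(W, s)
    return np.outer(dq_da(V, s, a), s)


def hybrid_actor_update(W, V, transition, step, lr=0.1):
    """Alternate between a plain DDPG step and a TD-error-weighted step.

    Assumption: even steps use the DDPG gradient unchanged, odd steps
    multiply the same gradient by the TD-error of the sampled transition.
    """
    s, a, r, s_next = transition
    grad = ddpg_actor_grad(W, V, s)
    if step % 2 == 1:
        grad = td_error(W, V, s, a, r, s_next) * grad
    return W + lr * grad  # gradient ascent on the critic's value


# Usage on a single hand-made transition (s, a, r, s').
W0 = np.zeros((1, 2))
V0 = np.array([[1.0, 0.0]])
sample = (np.array([1.0, 0.0]), np.array([0.0]), 1.0, np.array([0.0, 1.0]))
W_ddpg = hybrid_actor_update(W0, V0, sample, step=0)
W_hybrid = hybrid_actor_update(W0, V0, sample, step=1)
```

Under this reading, a transition with a large positive TD-error produces a proportionally larger actor step, while a TD-error near zero leaves the policy almost unchanged. That is consistent with the abstract's remark that convergence speed depends on whether good rewards are found early.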

* Corresponding author

